
Ben Ayers-Glassey

09/27/2022, 2:48 AM
Hello, I have seen an interesting (to me) error: my code attempts to get the value of a secret, basically like this:
secret = prefect.client.secrets.Secret(SFTP_PASSWORD_SECRET_NAME)
return secret.get()
...and it fails with a
prefect.exceptions.ClientError
from the GraphQL API.
Task 'download_shapefile_from_sftp[1684]': Exception encountered during task execution!
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/client/secrets.py", line 140, in get
    value = secrets[self.name]
KeyError: 'ZESTY_REGRID_SFTP_PASSWORD'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 880, in get_task_run_state
    value = prefect.utilities.executors.run_task_with_timeout(
  File "/usr/local/lib/python3.8/site-packages/prefect/utilities/executors.py", line 468, in run_task_with_timeout
    return task.run(*args, **kwargs)  # type: ignore
  File "/home/zesty_bag/repos/zestyai/data-ingestion/flows/parcels_regrid_sftp/parcels_regrid_sftp.py", line 288, in download_shapefile_from_sftp
  File "/project/regrid.py", line 49, in get_sftp_client
    password = get_sftp_password()
  File "/project/regrid.py", line 29, in get_sftp_password
    return secret.get()
  File "/usr/local/lib/python3.8/site-packages/prefect/client/secrets.py", line 164, in get
    raise exc
  File "/usr/local/lib/python3.8/site-packages/prefect/client/secrets.py", line 148, in get
    result = self.client.graphql(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 464, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['secret_value'], 'message': 'An unknown error occurred.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
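(Aside: if these errors are transient, one workaround is to retry the fetch with backoff. A minimal sketch, not the Prefect API: `ClientError` here is a stand-in for `prefect.exceptions.ClientError`, and `get_secret_with_retry` is a hypothetical helper wrapping whatever callable does the actual `secret.get()`.)

```python
import time


class ClientError(Exception):
    """Stand-in for prefect.exceptions.ClientError."""


def get_secret_with_retry(fetch, retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on ClientError.

    Re-raises the last ClientError if all attempts fail.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except ClientError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

In the flow, `fetch` would be something like `lambda: secret.get()`.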
This is a mapped task with thousands of child tasks, and it has been using a lot of memory, so perhaps that's the reason somehow. But that seems strange, because these errors were happening long before the machine ran out of memory entirely. FWIW, I've seen a variant, where one task logged an almost-identical stack trace, except the error contained "API_ERROR" instead of "INTERNAL_SERVER_ERROR":
prefect.exceptions.ClientError: [{'path': ['secret_value'], 'message': 'Unable to complete operation. An internal API error occurred.', 'extensions': {'code': 'API_ERROR'}}]
Any ideas for why this might be happening?.. One thought I had was API throttling, like maybe my mapped task is generating so many child tasks (3,500 or so) which are all attempting to grab the same secret, so the GraphQL "secret_value" endpoint is throttling me?.. But I'm using a LocalDaskExecutor limited to 5 workers, and these tasks are taking at least a few seconds each, so it's not like I'm hammering the API many times every second, or anything. 🤔
Ah!.. in fact, I see this has also happened with another flow (which doesn't use much memory at all)! We're using VertexAgent, so each flow run is spun off as a separate Vertex AI job, meaning the memory-intensive flow can't have been affecting the other's memory usage. So I think it must be something like throttling (or... an unrelated temporary API issue?..).
Wow, actually I see this same issue with getting a secret's value popping up last Saturday. So this is just looking like... an intermittent (?) issue with getting secrets from Prefect Cloud??

Bianca Hoch

09/27/2022, 1:55 PM
Hey Ben! You're spot on in your assessment. We have an ongoing investigation into the issues around retrieving secrets. Here is our status page, which we are updating in real time during this process.

Ben Ayers-Glassey

09/27/2022, 9:45 PM
Ah phew, so it's not just us! Ok, I'll check the status page. Thank you!