# prefect-ui
a
Seeing an internal server error I haven't seen before when running my flow, related to a missing UUID. I'm running this using a local agent with DaskCloudProvider on ECS.
Failed to retrieve task state with error: ClientError([{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/prefect/engine/cloud/task_runner.py", line 154, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.8/dist-packages/prefect/client/client.py", line 1399, in get_task_run_info
    result = self.graphql(mutation)  # type: Any
  File "/usr/local/lib/python3.8/dist-packages/prefect/client/client.py", line 319, in graphql
    raise ClientError(result["errors"])
prefect.utilities.exceptions.ClientError: [{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}]
k
Hey @Andrew Hannigan! Is your flow code too large to share?
a
It is, unfortunately
But it seems to be an error generated on the Prefect Cloud side, since it's coming from the GraphQL client
k
A quick search in Slack showed this error came up when someone defined their flow twice in the script they used to register
Could you check that everything looks right there?
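For reference, a minimal registration script that defines the flow exactly once might look roughly like the sketch below; the task, flow name, and project name are placeholders, not details from this thread.

from prefect import Flow, task

@task
def say_hello():
    print("hello")

# Define the flow exactly once; a second `with Flow(...)` block in the same
# registration script is the kind of duplication that has caused confusing
# errors for others.
with Flow("example-flow") as flow:
    say_hello()

if __name__ == "__main__":
    # Registers the flow with Prefect Cloud and prints a single flow URL
    flow.register(project_name="example-project")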
a
Definitely only getting one flow URL back after the script runs
k
Or just post the Flow block? No need for all the tasks.
a
Let me try deleting all the archived flows and re-registering
Hm, this time when re-registering I got this error:
prefect.utilities.exceptions.ClientError: [{'path': ['create_flow_from_compressed_string'], 'message': 'Unable to complete operation. An internal API error occurred.', 'extensions': {'code': 'API_ERROR'}}]
k
But you were able to register it previously?
a
Yes
I did 15 minutes ago successfully
Okay tried again and it worked
Rerunning flow now
Flow is running fine now 🤷‍♀️
k
Well glad it’s running…I suppose just ping again if a similar issue comes up. 😅
a
The only change was removing the archived flows. It's a pretty large flow, so maybe that had something to do with it
Will do - thanks!
k
That could well be the case
a
@Kevin Kho Hmm, it's happening again
Failed to retrieve task state with error: ClientError([{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/prefect/engine/cloud/task_runner.py", line 154, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.8/dist-packages/prefect/client/client.py", line 1399, in get_task_run_info
    result = self.graphql(mutation)  # type: Any
  File "/usr/local/lib/python3.8/dist-packages/prefect/client/client.py", line 319, in graphql
    raise ClientError(result["errors"])
prefect.utilities.exceptions.ClientError: [{'path': ['get_or_create_task_run_info'], 'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Expected type UUID!, found ""; Could not parse UUID: ', 'locations': [{'line': 2, 'column': 101}], 'path': None}}}]
Any chance we could get more color on the cause of the INTERNAL_SERVER_ERROR?
k
Yeah, this is beyond me, so I'll ask the team. I may have to get back to you tomorrow on this.
z
Hey @Andrew Hannigan, a couple pieces of info might be helpful in tracking this down.
• What version of Prefect Core are you using?
• Are you using Cloud or Server?
• How are you registering the flow? (e.g. flow.register(), CLI command, etc.)
• How are you running the flow?
a
Hi @Zach Angell:
• 0.14.17
• Cloud
• flow.register()
• Running via Prefect Cloud on a local agent. The local agent spins up a Dask cluster using the ECSCluster class from DaskCloudProvider. All of it runs on ECS.
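For reference, a DaskExecutor setup along those lines (local agent plus an ephemeral ECSCluster from dask_cloudprovider) might look roughly like the sketch below; the flow name, worker count, and image are placeholders, and ECSCluster usually needs additional AWS-specific kwargs not shown here.

from prefect import Flow
from prefect.executors import DaskExecutor

with Flow("ecs-dask-flow") as flow:
    ...  # tasks omitted

# Spin up an ephemeral Dask cluster on ECS for each flow run.
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.ECSCluster",
    cluster_kwargs={"n_workers": 4, "image": "my-prefect-image"},
)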
z
Thanks for the additional info! I'll do some digging today. Are you still seeing intermittent issues?
A few more questions:
• Is the flow running on a schedule? If so, how is the schedule set?
• What storage are you using for the flow?
a
@Zach Angell The flow was triggered manually when the error occurred, and I am using S3 storage
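For reference, S3 storage in Prefect 0.14 is typically attached roughly as in the sketch below; the bucket name is a placeholder, not the one used here.

from prefect import Flow
from prefect.storage import S3

with Flow("ecs-dask-flow") as flow:
    ...  # tasks omitted

# Store the serialized flow in S3 so the agent can pull it at run time.
flow.storage = S3(bucket="my-flow-bucket")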
@Zach Angell Just following up here to see if we've made any progress on this issue. If I can provide any other info, please let me know.
z
@Andrew Hannigan we're still looking into this issue on our side, I'll keep you up to date. If you do see the error again, could you DM the link to the flow run?
a
Yes will do
z
Thank you!
a
@Zach Angell I have a run that returned the same error again: “brown-koel”
z
Quick update, it seems like one of the following may be happening:
• this line doesn't have any task runs to iterate over: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/engine/cloud/flow_runner.py#L395
• OR context is being wiped somewhere in between the main client and the Dask worker that runs the task
Any chance you're doing context manipulation in your flow?
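For reference, "context manipulation" here would mean overriding values in prefect.context rather than just reading them, roughly as in the sketch below; the key name is illustrative only.

import prefect
from prefect import Flow, task

@task
def report():
    # Reading a value is just a getter and should be harmless.
    print(prefect.context.get("my_custom_key"))

with Flow("context-example") as flow:
    report()

# Entering prefect.context like this to override values is the kind of
# manipulation that could interfere with the run metadata Prefect injects.
with prefect.context(my_custom_key="overridden"):
    flow.run()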
a
I'm not doing any context manipulation in the flow. These are the only lines in my code that touch the context, and they are just getters:
timestamp = prefect.context.get('scheduled_start_time')
flow_run_id = prefect.context.get('flow_run_id')
I had pinned dask_cloudprovider explicitly to 2021.1.1, but my dask version was set to latest. I wonder if dask_cloudprovider expecting an older version of dask is part of the problem; would that be consistent with what you are seeing?
Also, the issue goes away when I use a local Dask cluster
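For reference, one quick way to check the version-mismatch theory is to print the installed versions inside the flow's execution image, along the lines of the sketch below (assuming dask, distributed, and dask_cloudprovider are all importable there).

import dask
import distributed
import dask_cloudprovider

# If dask_cloudprovider is pinned (e.g. 2021.1.1) while dask/distributed float
# to latest, the client image and the workers can end up with mismatched versions.
print("dask:", dask.__version__)
print("distributed:", distributed.__version__)
print("dask_cloudprovider:", dask_cloudprovider.__version__)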
z
Hmmm that might have something to do with it. Can you replicate somewhat consistently with a remote dask cluster? And, to confirm, you've never seen the issue with a local dask cluster?
a
Nope, never with a local Dask cluster
z
Got it. That definitely helps narrow things down. I'll do more testing today, specifically with remote Dask clusters. Any chance you could share part or all of your DaskExecutor configuration for the flow? e.g.
flow.executor = DaskExecutor(
    cluster_class="dask_cloudprovider.aws.FargateCluster",
    cluster_kwargs={"n_workers": 4, "image": "my-prefect-image"},
)