Miremad Aghili

03/17/2023, 2:33 PM
Hi guys, I am trying to set up my Prefect flow to use temporary Dask clusters in GCP, but it fails most of the time. The latest error is:
prefect.exceptions.ClientError: [{'path': ['set_task_run_states'], 'message': 'State update failed for task run ID e18b41ed-7ce8-449f-bda2-6de3956eb555: provided a running state but associated flow run 8484f5dc-acef-4ac5-ae08-9bf6d72edcf2 is not in a running state.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
Could you guys help me understand what causes this issue?

Bianca Hoch

03/17/2023, 5:49 PM
Hey Miremad, I think what happened here is that the flow entered a Failed state and the process that was running this task attempted to keep running it after the fact. Looking at the logs and timestamps, I see the following. The flow fails with this message:
2023-03-17T04:38:14.975414+00:00 Unexpected error: <HttpError 404 when requesting <https://compute.googleapis.com/compute/v1/projects/chemix-ai-qa/zones/us-west1-a/instances/dask-68d8ddad-worker-e3ad8d71?alt=json> returned "The resource 'projects/chemix-ai-qa/zones/us-west1-a/instances/dask-68d8ddad-worker-e3ad8d71' was not found". Details: "[{'message': "The resource 'projects/chemix-ai-qa/zones/us-west1-a/instances/dask-68d8ddad-worker-e3ad8d71' was not found", 'domain': 'global', 'reason': 'notFound'}]">
Then, the task from the message you shared throws this error:
2023-03-17T04:38:16.668754+00:00 Task 'train_and_predict_on_fold[3]': Finished task run for task with final state: 'ClientFailed'
Followed by
2023-03-17T04:38:16.648918+00:00 Failed to set task state with error: ClientError([{'path': ['set_task_run_states'], 'message': 'State update failed for task run ID e18b41ed-7ce8-449f-bda2-6de3956eb555: provided a running state but associated flow run 8484f5dc-acef-4ac5-ae08-9bf6d72edcf2 is not in a running state.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine/cloud/task_runner.py", line 91, in call_runner_target_handlers
    state = self.client.set_task_run_state(
  File "/usr/local/lib/python3.10/site-packages/prefect/client/client.py", line 1622, in set_task_run_state
    result = self.graphql(
  File "/usr/local/lib/python3.10/site-packages/prefect/client/client.py", line 465, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['set_task_run_states'], 'message': 'State update failed for task run ID e18b41ed-7ce8-449f-bda2-6de3956eb555: provided a running state but associated flow run 8484f5dc-acef-4ac5-ae08-9bf6d72edcf2 is not in a running state.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
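To make the ordering in those logs concrete, here is a toy sketch of the rule the error message describes: a task run cannot be moved to Running once its flow run has left Running. This is not Prefect's actual server code. `ToyFlowRun`, `StateUpdateError`, and `replay` are hypothetical names invented for illustration, and the state names are plain strings.

```python
from dataclasses import dataclass, field


class StateUpdateError(Exception):
    """Hypothetical error for a rejected task run state update."""


@dataclass
class ToyFlowRun:
    """Toy stand-in for a flow run and its task runs (not a Prefect class)."""

    state: str = "Running"
    task_states: dict = field(default_factory=dict)

    def set_task_run_state(self, task_run_id: str, new_state: str) -> None:
        # Assumed rule, taken from the wording of the error message:
        # a task run may only enter Running while the flow run is Running.
        if new_state == "Running" and self.state != "Running":
            raise StateUpdateError(
                f"State update failed for task run ID {task_run_id}: "
                "provided a running state but associated flow run "
                "is not in a running state."
            )
        self.task_states[task_run_id] = new_state


def replay() -> list:
    """Replay the sequence from the logs and record what happens."""
    events = []
    flow_run = ToyFlowRun()

    # While the flow run is Running, the task can start normally.
    flow_run.set_task_run_state("task-3", "Running")
    events.append("task-3 -> Running accepted")

    # The flow run fails (in the logs: the 404 on the missing worker VM).
    flow_run.state = "Failed"
    events.append("flow run -> Failed")

    # A process still holding the task tries to report Running again.
    try:
        flow_run.set_task_run_state("task-3", "Running")
    except StateUpdateError as exc:
        events.append(f"rejected: {exc}")
    return events


if __name__ == "__main__":
    for event in replay():
        print(event)
```

Under these assumptions the ClientError is a symptom rather than the root cause: the update is rejected only because the flow run had already failed.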
Miremad Aghili

03/17/2023, 5:51 PM
Hi Bianca, thanks for your help. But why would the flow enter a failed state because a worker was not found? Shouldn't it be keeping track of which workers are alive and which are dead?