# ask-community
j
Hey 👋 We just hit this internal server error on one of our scheduled runs. We have hit restart, but wondering whether there is any further context that can be provided?
Failed to set task state with error: ClientError([{'path': ['set_task_run_states'], 'message': 'An unknown error occurred.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}])
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 91, in call_runner_target_handlers
    state = self.client.set_task_run_state(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1917, in set_task_run_state
    result = self.graphql(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 569, in graphql
    raise ClientError(result["errors"])
prefect.exceptions.ClientError: [{'path': ['set_task_run_states'], 'message': 'An unknown error occurred.', 'extensions': {'code': 'INTERNAL_SERVER_ERROR'}}]
I've put the task and flow run URLs in the 🧵.
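For reference, the failing call is Prefect 1.x's Client.graphql issuing the set_task_run_states mutation, so the 500 is coming back from the Cloud API rather than from our task code. If it helps anyone hitting the same thing, here's a rough sketch of retrying a Cloud GraphQL call when it hits a transient server error; the query, the task run id, and the retry settings are purely illustrative, not something our flow actually runs:

import time

from prefect import Client
from prefect.exceptions import ClientError

def graphql_with_retry(client, query, attempts=3, delay=5.0):
    # Retry a Prefect Cloud GraphQL call a few times on transient server errors.
    for attempt in range(1, attempts + 1):
        try:
            return client.graphql(query)
        except ClientError:
            if attempt == attempts:
                raise
            time.sleep(delay)  # back off before retrying the 500

# Illustrative usage: look up a task run's current state by id.
client = Client()
result = graphql_with_retry(
    client,
    'query { task_run(where: {id: {_eq: "<task-run-id>"}}) { id state } }',
)

In our case the engine makes the set_task_run_states call itself, so a wrapper like this wouldn't fix the scheduled run; it's just to show what the client is doing when it hits the 500.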
k
Is this a big Flow in terms of number of tasks? Did the restart work?
j
Very big flow; it will likely take a couple of hours for us to hit the problem tasks from the last run. Although I'm assuming it's not those tasks that were the problem, but rather something on the Prefect Cloud side?
d
For context, yeah it's a massive flow, circa 1400 statically-defined tasks and ~2700 edges. We've had issues with registering the large static DAG in the past, but rarely Cloud-side 500 errors. Wondering if you can see anything from your side in the logs?
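(By "statically defined" I mean everything is declared up front in the Flow context at build time. A toy sketch with made-up task names, repeated on the order of 1400 times in the real flow:

from prefect import Flow, task

@task
def extract(source):
    # placeholder body; the real flow has ~1400 tasks along these lines
    return [source]

@task
def load(rows):
    print(f"loaded {len(rows)} rows")

with Flow("big-static-dag") as flow:
    # each task call here adds a task and an edge to the DAG at build time
    rows = extract("table_a")
    load(rows)

Nothing is generated dynamically at runtime, which is why the DAG is already this large at registration time.)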
k
I remember that, but that should only affect the registration side, and I think we're past it. The only logs I can see are the same ones you see. I do see the error. I'll need to find some other team members to dig deeper. Will get back to you.
🙏 1
Wait sorry it still says running. Did just one task fail but the Flow continued (apart from that error)?
j
I have cancelled the problem flow run and started a new one. I had restarted the problem run from a given node in the DAG, but it didn't behave as expected (that node stayed pending while downstream nodes were running), so I cancelled it altogether.
k
Ok gotcha
Will DM