# prefect-community

will milner

05/25/2022, 5:18 PM
Is there any reason why flow runs would suddenly not be able to be scheduled? I didn't make any updates to my server or agent, and since Thursday I haven't been able to run any flows. I'm using a Kubernetes agent. I have no idea why this started happening or how to go about fixing it.

Kevin Kho

05/25/2022, 5:20 PM
Hi Will, could we move the tracebacks to the thread to keep the main channel cleaner? You are on Prefect Cloud, right? (Not Server)

will milner

05/25/2022, 5:20 PM
Inspecting the Kubernetes agent, I see:
```
ERROR:agent:Error attempting to set flow run state for 16ab885e-6b95-43cd-9214-38e13f18fde5: [{'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5'}}}]
```
Inspecting the GraphQL logs, I do see this:
```
State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

GraphQL request:2:3
1 | mutation ($input: set_flow_run_states_input!) {
2 |   set_flow_run_states(input: $input) {
  |   ^
3 |     states {
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
    return await result
  File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
    result = await result
  File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
    *[check_size_and_set_state(state_input) for state_input in input["states"]]
  File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
    agent_id=agent_id,
  File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
    raise ValueError(f"State update failed for flow run ID {flow_run_id}")
ValueError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 674, in await_completed
    return await completed
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 659, in await_result
    return_type, field_nodes, info, path, await result
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 733, in complete_value
    raise result
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
    return await result
  File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
    result = await result
  File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
    *[check_size_and_set_state(state_input) for state_input in input["states"]]
  File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
    agent_id=agent_id,
  File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
    raise ValueError(f"State update failed for flow run ID {flow_run_id}")
graphql.error.graphql_error.GraphQLError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

GraphQL request:2:3
1 | mutation ($input: set_flow_run_states_input!) {
2 |   set_flow_run_states(input: $input) {
  |   ^
3 |     states {
```
I am on Server.
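(For anyone hitting a similar error, one way to see what Server currently thinks the flow run's state is, before digging into the agent, is to query the GraphQL API directly. A minimal sketch, assuming the default Apollo endpoint at http://localhost:4200/graphql and the Hasura-generated `flow_run_by_pk` field; adjust both for your deployment.)
```python
# Sketch: ask Prefect Server for a flow run's current state.
# Assumptions: GraphQL served at the default http://localhost:4200/graphql,
# and the Hasura-generated flow_run_by_pk field is available.
import requests

GRAPHQL_URL = "http://localhost:4200/graphql"  # assumption: default Server endpoint
FLOW_RUN_ID = "16ab885e-6b95-43cd-9214-38e13f18fde5"

query = """
query ($id: uuid!) {
  flow_run_by_pk(id: $id) {
    id
    state
    state_message
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"id": FLOW_RUN_ID}},
)
resp.raise_for_status()
print(resp.json())
```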

Kevin Kho

05/25/2022, 5:22 PM
Thanks for moving it. If I had to guess, your API pod is down or your DB is full, or something like that. If this is happening for all flows, that's likely the case. Does starting an agent also lead to errors?
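(A quick sanity check along those lines, not from the thread, just a sketch: hit the Server GraphQL endpoint with a trivial introspection query. If the API pod is down or can't reach its database, this will fail or come back with errors. The URL below is an assumption based on the default Server setup.)
```python
# Sketch: confirm the Server GraphQL API answers at all.
# Assumption: default Apollo endpoint at http://localhost:4200/graphql.
import requests

GRAPHQL_URL = "http://localhost:4200/graphql"

try:
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": "query { __schema { queryType { name } } }"},
        timeout=10,
    )
    resp.raise_for_status()
    print("API reachable:", resp.json())
except requests.RequestException as exc:
    print("API unreachable:", exc)
```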

will milner

05/25/2022, 5:22 PM
I'm able to start a new Docker agent; I can try starting another Kubernetes agent.
Ah, I found the issue. There were too many stuck pods on my cluster, which prevented more pods from being scheduled. After I removed the stuck pods, new ones are able to start now.
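(For reference, one way to spot and clean up stuck pods like that is with the official `kubernetes` Python client. A sketch only, assuming a hypothetical namespace called `prefect`; swap in your own namespace and double-check what you're deleting first.)
```python
# Sketch: find pods that are not Running/Succeeded (e.g. stuck Pending or Failed)
# and delete them so the scheduler can place new ones.
# Assumptions: `kubernetes` client installed, namespace "prefect" is hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()
namespace = "prefect"  # assumption: replace with the namespace your agent uses

for pod in v1.list_namespaced_pod(namespace).items:
    if pod.status.phase not in ("Running", "Succeeded"):
        print(f"Deleting stuck pod {pod.metadata.name} (phase={pod.status.phase})")
        v1.delete_namespaced_pod(pod.metadata.name, namespace)
```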

Kevin Kho

05/25/2022, 5:25 PM
Ah, OK, glad you figured it out.