# prefect-community

will milner

05/25/2022, 5:18 PM
Is there any reason why flow runs would suddenly not be able to be scheduled? I didn't make any updates to my server or agent, and since Thursday I haven't been able to run any flows. I'm using a Kubernetes agent. I have no idea why this started happening or how to go about fixing it.

Kevin Kho

05/25/2022, 5:20 PM
Hi Will, could we move the tracebacks to the thread to keep the main channel cleaner? You are on Prefect Cloud, right? (Not Server)

will milner

05/25/2022, 5:20 PM
Inspecting the Kubernetes agent, I see:
```
ERROR:agent:Error attempting to set flow run state for 16ab885e-6b95-43cd-9214-38e13f18fde5: [{'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5'}}}]
```
Inspecting the GraphQL logs, I do see this:
```
State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

GraphQL request:2:3
1 | mutation ($input: set_flow_run_states_input!) {
2 |   set_flow_run_states(input: $input) {
  |   ^
3 |     states {
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
    return await result
  File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
    result = await result
  File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
    *[check_size_and_set_state(state_input) for state_input in input["states"]]
  File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
    agent_id=agent_id,
  File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
    raise ValueError(f"State update failed for flow run ID {flow_run_id}")
ValueError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 674, in await_completed
    return await completed
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 659, in await_result
    return_type, field_nodes, info, path, await result
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 733, in complete_value
    raise result
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
    return await result
  File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
    result = await result
  File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
    *[check_size_and_set_state(state_input) for state_input in input["states"]]
  File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
    agent_id=agent_id,
  File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
    raise ValueError(f"State update failed for flow run ID {flow_run_id}")
graphql.error.graphql_error.GraphQLError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5

GraphQL request:2:3
1 | mutation ($input: set_flow_run_states_input!) {
2 |   set_flow_run_states(input: $input) {
  |   ^
3 |     states {
```
I am on Server.
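(For anyone hitting a similar error, one way to see what Server currently thinks the flow run's state is, before digging into the agent, is to query the GraphQL API directly. A minimal sketch, assuming the default Apollo endpoint at http://localhost:4200/graphql and the Hasura-generated `flow_run_by_pk` field; adjust both for your deployment.)
```python
# Sketch: ask Prefect Server for a flow run's current state.
# Assumptions: GraphQL served at the default http://localhost:4200/graphql,
# and the Hasura-generated flow_run_by_pk field is available.
import requests

GRAPHQL_URL = "http://localhost:4200/graphql"  # assumption: default Server endpoint
FLOW_RUN_ID = "16ab885e-6b95-43cd-9214-38e13f18fde5"

query = """
query ($id: uuid!) {
  flow_run_by_pk(id: $id) {
    id
    state
    state_message
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": query, "variables": {"id": FLOW_RUN_ID}},
)
resp.raise_for_status()
print(resp.json())
```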

Kevin Kho

05/25/2022, 5:22 PM
Thanks for moving it. If I had to guess, your API pod is down or your DB is full, or something like that. If this is happening for all flows, that's likely the case. Does starting an agent also lead to errors?
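(A quick sanity check along those lines, not from the thread, just a sketch: hit the Server GraphQL endpoint with a trivial introspection query. If the API pod is down or can't reach its database, this will fail or come back with errors. The URL below is an assumption based on the default Server setup.)
```python
# Sketch: confirm the Server GraphQL API answers at all.
# Assumption: default Apollo endpoint at http://localhost:4200/graphql.
import requests

GRAPHQL_URL = "http://localhost:4200/graphql"

try:
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": "query { __schema { queryType { name } } }"},
        timeout=10,
    )
    resp.raise_for_status()
    print("API reachable:", resp.json())
except requests.RequestException as exc:
    print("API unreachable:", exc)
```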

will milner

05/25/2022, 5:22 PM
I'm able to start a new Docker agent; I can try starting another Kubernetes agent.
Ah, I found the issue. There were too many stuck pods on my cluster, which prevented more pods from being scheduled. After I removed the stuck pods, new ones are able to start now.
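(For reference, one way to spot and clean up stuck pods like that is with the official `kubernetes` Python client. A sketch only, assuming a hypothetical namespace called `prefect`; swap in your own namespace and double-check what you're deleting first.)
```python
# Sketch: find pods that are not Running/Succeeded (e.g. stuck Pending or Failed)
# and delete them so the scheduler can place new ones.
# Assumptions: `kubernetes` client installed, namespace "prefect" is hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()
namespace = "prefect"  # assumption: replace with the namespace your agent uses

for pod in v1.list_namespaced_pod(namespace).items:
    if pod.status.phase not in ("Running", "Succeeded"):
        print(f"Deleting stuck pod {pod.metadata.name} (phase={pod.status.phase})")
        v1.delete_namespaced_pod(pod.metadata.name, namespace)
```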

Kevin Kho

05/25/2022, 5:25 PM
Ah, OK, glad you figured it out.