Thread
#prefect-community
    will milner

    4 months ago
    Is there any reason why flow runs would suddenly stop being scheduled? I didn't make any updates at all to my server or agent, and since Thursday I haven't been able to run any flows. I'm using a Kubernetes agent. I have no idea why this started happening or how to go about fixing it.
    Kevin Kho

    4 months ago
    Hi Will, could we move the tracebacks to the thread to keep the main channel cleaner? You are on Prefect Cloud, right? (Not Server)
    will milner

    4 months ago
    Inspecting the Kubernetes agent logs, I see:
    ERROR:agent:Error attempting to set flow run state for 16ab885e-6b95-43cd-9214-38e13f18fde5: [{'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5'}}}]
    Inspecting the GraphQL logs, I see this:
    State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5
    
    GraphQL request:2:3
    1 | mutation ($input: set_flow_run_states_input!) {
    2 |   set_flow_run_states(input: $input) {
      |   ^
    3 |     states {
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
        return await result
      File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
        result = await result
      File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
        *[check_size_and_set_state(state_input) for state_input in input["states"]]
      File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
        agent_id=agent_id,
      File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
        raise ValueError(f"State update failed for flow run ID {flow_run_id}")
    ValueError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 674, in await_completed
        return await completed
      File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 659, in await_result
        return_type, field_nodes, info, path, await result
      File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 733, in complete_value
        raise result
      File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
        return await result
      File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
        result = await result
      File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
        *[check_size_and_set_state(state_input) for state_input in input["states"]]
      File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
        agent_id=agent_id,
      File "/prefect-server/src/prefect_server/api/states.py", line 53, in set_flow_run_state
        raise ValueError(f"State update failed for flow run ID {flow_run_id}")
    graphql.error.graphql_error.GraphQLError: State update failed for flow run ID 16ab885e-6b95-43cd-9214-38e13f18fde5
    
    GraphQL request:2:3
    1 | mutation ($input: set_flow_run_states_input!) {
    2 |   set_flow_run_states(input: $input) {
      |   ^
    3 |     states {
    I am on Server.
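
    For reference, a minimal sketch of how logs like the ones above could be pulled with the official Kubernetes Python client; the namespace and label selectors are assumptions and will differ per deployment (plain kubectl logs on the agent and GraphQL pods gives the same output):

    # Hypothetical log inspection; NAMESPACE and the label selectors are assumptions, not Prefect defaults.
    from kubernetes import client, config

    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()

    NAMESPACE = "default"  # assumption: where the agent and Prefect Server pods run

    # Dump recent logs from the agent and GraphQL pods.
    for selector in ("app=prefect-agent", "app.kubernetes.io/name=graphql"):
        for pod in v1.list_namespaced_pod(NAMESPACE, label_selector=selector).items:
            print(f"--- {pod.metadata.name} ---")
            print(v1.read_namespaced_pod_log(pod.metadata.name, NAMESPACE, tail_lines=50))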
    Kevin Kho

    4 months ago
    Thanks for moving it. If I had to guess, your API pod is down, your DB is full, or something like that. If this is happening for all flows, I think that's likely the case. Does starting an agent lead to errors also?
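
    A quick way to check that guess, sketched with the Kubernetes Python client; the "prefect-server" namespace below is an assumption (kubectl get pods -n <namespace> shows the same thing):

    # Hypothetical health check for the Prefect Server pods; the namespace is an assumption.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    for pod in v1.list_namespaced_pod("prefect-server").items:
        ready = all(cs.ready for cs in (pod.status.container_statuses or []))
        print(f"{pod.metadata.name:40s} phase={pod.status.phase} ready={ready}")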
    will milner

    4 months ago
    I'm able to start a new Docker agent; I can try starting another Kubernetes agent.
    Ah, I found the issue. There were too many pods stuck on my cluster, which was preventing new pods from being scheduled. After I removed the stuck pods, new ones are able to start now.
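
    A rough sketch of the kind of cleanup Will describes, again with the Kubernetes Python client; the namespace and the criteria for "stuck" (here, pods in the Failed phase, which includes Evicted pods) are assumptions, and kubectl delete pod works just as well:

    # Hypothetical cleanup of stuck flow-run pods; NAMESPACE and the "stuck" criteria are assumptions.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    NAMESPACE = "default"  # assumption: where the Kubernetes agent launches flow-run pods

    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        if pod.status.phase == "Failed":  # finished/evicted pods still sitting on the cluster
            print(f"deleting {pod.metadata.name} (phase={pod.status.phase})")
            v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)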
    Kevin Kho

    4 months ago
    Ah, OK, glad you figured it out.