# prefect-community
m
Hello, I’ve got a bit of an issue that I would love some feedback on. A bug was initially reported in our application where the GraphQL query to Prefect for a Flow, given the project and flow name, was returning None. Using the Interactive API, I found that the project and flow name for that Flow exist and the query returned a result. This code has not been touched in months and only recently started failing. Digging further, I noticed our ECS agents have not queried for flows to execute in 3 days, which is when the bug was reported. I restarted the Fargate Services where our ECS agents live, and the agents are still not querying for flows to execute, despite running successfully and ‘waiting for flow runs’. Any idea what’s going on or how to proceed with debugging further?
k
Yeah do you by chance have more than 725 late runs? The scheduler will freeze at that number
m
I’m only seeing around 20 in the Agents view in the UI.
k
It should be displayed in the main dashboard as one of the cards below
m
I only have 17 late runs in that dashboard view.
I’ll clear them?
k
Nah that shouldn’t cause an issue
m
ok interesting.
Do you need more info, or screencaps from my AWS ECS console or the Prefect UI?
k
Yeah I guess I’m curious if the behavior is on 1 flow or multiple. You can also try doing something like:
prefect run --name flow_name --project project-name --watch
to see if an agent picks it up? There will be a warning raised if there are no agents with matching labels
m
Multiple flows, launched from a webhook. I’ll give this a shot.
k
Could you verify the labels are matching too?
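A rough way to reason about the label check, as a minimal sketch: it assumes the Prefect 1.x rule that an agent only picks up a flow run whose labels are all present on the agent (a subset check). The function names and the example labels are hypothetical and are not part of the Prefect API.

```python
def agent_can_pick_up(flow_run_labels, agent_labels):
    """Return True if every label on the flow run is also on the agent.

    Assumption: label matching is a subset check, as described for
    Prefect 1.x agents. This helper is illustrative, not a Prefect call.
    """
    return set(flow_run_labels) <= set(agent_labels)


def unmatched_labels(flow_run_labels, agent_labels):
    """List the flow run labels that the agent is missing."""
    return sorted(set(flow_run_labels) - set(agent_labels))


# Hypothetical labels, for illustration only.
flow_labels = ["ecs", "qa"]
agent_labels = ["ecs"]

if not agent_can_pick_up(flow_labels, agent_labels):
    print("Agent is missing labels:", unmatched_labels(flow_labels, agent_labels))
```

Comparing the labels shown on the flow run page with the labels shown for each agent in the Agents tab against this rule should show whether a mismatch explains why nothing is picked up.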
m
What do you think? This is the error, but at the same time I don’t see the agent executing… kind of confused.
Watching flow run execution...
└── 14:40:51 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 14:40:54 | INFO    | Entered state <Submitted>: Submitted for execution
└── 14:41:22 | INFO    | Downloading flow from <s3://raptormaps-prefect-flows/leo-qa/2022-05-31t21-13-30-615824-00-00>
└── 14:41:24 | INFO    | Flow successfully downloaded. ETag: "a406a2539dad5330f88c0e40994a5e55", LastModified: 2022-05-31T21:13:34+00:00, VersionId: None
└── 14:41:24 | INFO    | Entered state <Failed>: Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n  TypeError("\'NoneType\' object is not callable")\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n  - cloudpickle: (flow built with \'1.6.0\', currently running with \'2.0.0\')\n  - python: (flow built with \'3.6.8\', currently running with \'3.7.12\')')
Flow run failed!
k
This error should be separate from the agent picking stuff up, but we can fix this one first
Could you register with 3.7 and cloudpickle 2.0?
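The mismatch reported in the error can be checked mechanically. The sketch below compares the build and execution versions quoted in the log above. The helper names are hypothetical, and comparing only the major.minor part is an assumption about what matters for unpickling, not a rule taken from Prefect or cloudpickle.

```python
def major_minor(version):
    """Reduce a version string such as '3.6.8' to the tuple (3, 6)."""
    return tuple(int(part) for part in version.split(".")[:2])


def find_mismatches(build, runtime):
    """Report packages whose major.minor version differs between environments."""
    mismatches = []
    for package in sorted(build):
        built = build[package]
        running = runtime.get(package)
        if running is None:
            mismatches.append(f"{package}: built with {built}, not found at runtime")
        elif major_minor(built) != major_minor(running):
            mismatches.append(f"{package}: built with {built}, running with {running}")
    return mismatches


# Versions copied from the FlowStorageError message in the log.
build_env = {"cloudpickle": "1.6.0", "python": "3.6.8"}
runtime_env = {"cloudpickle": "2.0.0", "python": "3.7.12"}

for line in find_mismatches(build_env, runtime_env):
    print(line)
```

Either side can be changed to remove the mismatch: register the flow from an environment that matches the execution image, or run the flow on an image that matches the environment it was registered from.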
m
okay will do.
k
Is this another agent that picked it up?
m
Our entire repo is on 3.6… could this be an issue, since we don’t separate out dependencies?
I don’t see which agent could’ve picked this up, which is weird.
k
Ah no. then you should specify the 3.6 image instead
ECSRun(image="prefecthq/prefect:latest-python3.6")
something like that. You can view which one picked it up by hunting for it in the Agents tab
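To keep the execution image in step with the Python version used at registration, the image tag can be derived rather than typed by hand. This is only a sketch: the tag pattern is taken from the `latest-python3.6` example above, the helper `prefect_image_for` is hypothetical, and whether a tag exists for a given version should be checked on the image registry.

```python
import sys


def prefect_image_for(python_version=None, prefect_tag="latest"):
    """Build an image name like 'prefecthq/prefect:latest-python3.6'.

    Assumption: tags follow the '<prefect_tag>-python<major>.<minor>'
    pattern shown in the thread. Defaults to the interpreter running
    this code, which is normally the one used to register the flow.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = python_version
    return f"prefecthq/prefect:{prefect_tag}-python{major}.{minor}"


image = prefect_image_for((3, 6))
print(image)

# With Prefect 1.x installed, the image would then be passed to the run config:
#   from prefect.run_configs import ECSRun
#   flow.run_config = ECSRun(image=image)
```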