
Matthew Millendorf

06/16/2022, 3:51 PM
Hello, I’ve got an issue that I would love some feedback on. A bug was initially reported in our application: the GraphQL query to Prefect for a Flow, given the project and flow name, was returning None. Using the Interactive API, I confirmed that a Flow with that project and flow name exists, and the query returned a result. This code has not been touched in months and only recently started failing. Digging further, I noticed that our ECS agents have not queried for flows to execute in 3 days, which is when the bug was reported. I restarted the Fargate services where our ECS agents live, but the agents are still not querying for flows to execute, despite running successfully and showing ‘waiting for flow runs’. Any idea what’s going on, or how to debug further?
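For reference, a lookup like the one described above could be sketched as follows. This is a minimal sketch assuming Prefect 1.x, where `prefect.Client().graphql` is available and the Hasura-style `flow` table can be filtered by name and project; the helper names `build_flow_lookup_query` and `find_flow` are hypothetical and are not the application's actual code.

```python
import json


def build_flow_lookup_query(project_name: str, flow_name: str) -> str:
    """Build a GraphQL query for the non-archived versions of a flow in a project."""
    # json.dumps yields a double-quoted, escaped literal, which is also a
    # valid GraphQL string for ordinary project and flow names.
    project = json.dumps(project_name)
    flow = json.dumps(flow_name)
    return (
        "query {\n"
        "  flow(\n"
        f"    where: {{name: {{_eq: {flow}}}, project: {{name: {{_eq: {project}}}}}, archived: {{_eq: false}}}}\n"
        "    order_by: {version: desc}\n"
        "  ) {\n"
        "    id\n"
        "    name\n"
        "    version\n"
        "  }\n"
        "}"
    )


def find_flow(project_name: str, flow_name: str):
    """Return the newest matching flow record, or None when nothing matches."""
    # Imported lazily so the query builder can be used without Prefect installed.
    from prefect import Client  # assumption: Prefect 1.x

    result = Client().graphql(build_flow_lookup_query(project_name, flow_name))
    flows = result.data.flow
    return flows[0] if flows else None
```

Printing the built query and pasting it into the Interactive API is a quick way to check whether the application and the UI are really sending the same filter.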

Kevin Kho

06/16/2022, 5:02 PM
Yeah, do you by chance have more than 725 late runs? The scheduler will freeze at that number.

Matthew Millendorf

06/16/2022, 6:28 PM
I’m only seeing around 20 in the Agents view in the UI.

Kevin Kho

06/16/2022, 6:29 PM
It should be displayed in the main dashboard as one of the cards below

Matthew Millendorf

06/16/2022, 6:31 PM
I only have 17 late runs in that dashboard view.
Should I clear them?

Kevin Kho

06/16/2022, 6:32 PM
Nah that shouldn’t cause an issue

Matthew Millendorf

06/16/2022, 6:32 PM
OK, interesting.
Do you need more info, or screencaps from my AWS ECS console or the Prefect UI?

Kevin Kho

06/16/2022, 6:35 PM
Yeah I guess I’m curious if the behavior is on 1 flow or multiple. You can also try doing something like:
prefect run --name flow_name --project project-name --watch
to see if an agent picks it up? There will be a warning raised if there are no agents with matching labels

Matthew Millendorf

06/16/2022, 6:36 PM
Multiple flows, launched from a webhook. I’ll give this a shot.

Kevin Kho

06/16/2022, 6:42 PM
Could you verify the labels are matching too?
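As a rough illustration of what "matching" means here: in Prefect 1.x an agent only picks up a flow run when every label on the run is also set on the agent, so the agent's labels must be a superset. The sketch below assumes you have copied both label lists by hand from the UI; the function names and the example labels are hypothetical.

```python
from typing import Iterable, List


def agent_can_pick_up(agent_labels: Iterable[str], flow_run_labels: Iterable[str]) -> bool:
    """True when every label on the flow run is also present on the agent."""
    return set(flow_run_labels) <= set(agent_labels)


def missing_labels(agent_labels: Iterable[str], flow_run_labels: Iterable[str]) -> List[str]:
    """Labels required by the flow run that the agent does not carry."""
    return sorted(set(flow_run_labels) - set(agent_labels))


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    agent = ["ecs", "qa"]
    run = ["ecs", "prod"]
    print(agent_can_pick_up(agent, run))  # False
    print(missing_labels(agent, run))     # ['prod']
```

A flow run with no labels passes this check for any agent, while an extra label on the run is enough to stop every agent that lacks it.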

Matthew Millendorf

06/16/2022, 6:43 PM
What do you think? This is the error, but at the same time I don’t see the agent executing… kind of confused.
Watching flow run execution...
└── 14:40:51 | INFO    | Entered state <Scheduled>: Flow run scheduled.
└── 14:40:54 | INFO    | Entered state <Submitted>: Submitted for execution
└── 14:41:22 | INFO    | Downloading flow from <s3://raptormaps-prefect-flows/leo-qa/2022-05-31t21-13-30-615824-00-00>
└── 14:41:24 | INFO    | Flow successfully downloaded. ETag: "a406a2539dad5330f88c0e40994a5e55", LastModified: 2022-05-31T21:13:34+00:00, VersionId: None
└── 14:41:24 | INFO    | Entered state <Failed>: Failed to load and execute Flow's environment: FlowStorageError('An error occurred while unpickling the flow:\n  TypeError("\'NoneType\' object is not callable")\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n  - cloudpickle: (flow built with \'1.6.0\', currently running with \'2.0.0\')\n  - python: (flow built with \'3.6.8\', currently running with \'3.7.12\')')
Flow run failed!
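The version mismatches named in that `FlowStorageError` can be pulled out and compared against the environment you register from. This is a minimal sketch; the regular expression is an assumption based only on the single message shown in the log above, and the helper names are hypothetical.

```python
import platform
import re

# Matches lines such as:
#   - cloudpickle: (flow built with '1.6.0', currently running with '2.0.0')
# and also tolerates the backslash-escaped quotes (\') seen in the log output.
MISMATCH = re.compile(
    r"-\s*(?P<package>[\w.-]+):\s*\(flow built with\s*\\?'(?P<built>[^'\\]+)\\?',\s*"
    r"currently running with\s*\\?'(?P<running>[^'\\]+)\\?'\)"
)


def parse_version_mismatches(message: str) -> dict:
    """Map each package named in the error to a (built_with, running_with) pair."""
    return {m["package"]: (m["built"], m["running"]) for m in MISMATCH.finditer(message)}


def current_versions() -> dict:
    """Report the Python and cloudpickle versions of the current environment."""
    versions = {"python": platform.python_version(), "cloudpickle": None}
    try:
        import cloudpickle  # only present where Prefect's dependencies are installed

        versions["cloudpickle"] = cloudpickle.__version__
    except ImportError:
        pass
    return versions
```

Running `current_versions()` in both the registration environment and the execution image shows which side needs to change.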

Kevin Kho

06/16/2022, 7:01 PM
This error should be separate from the agent picking stuff up, but we can fix this one first
Could you register with 3.7 and cloudpickle 2.0?

Matthew Millendorf

06/16/2022, 7:03 PM
okay will do.

Kevin Kho

06/16/2022, 7:03 PM
Is this another agent that picked it up?

Matthew Millendorf

06/16/2022, 7:03 PM
Our entire repo is on 3.6… could this be an issue, given that we don’t separate out dependencies?
I don’t see which agent could’ve picked this up, which is weird.

Kevin Kho

06/16/2022, 7:03 PM
Ah no. Then you should specify the 3.6 image instead:
ECSRun(image="prefecthq/prefect:latest-python3.6")
Something like that. You can see which agent picked it up by looking for it in the Agents tab.
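One way to keep the execution image in step with the Python version the flow was built with is sketched below. It assumes Prefect 1.x (`prefect.run_configs.ECSRun`) and that image tags follow the `latest-python3.6` pattern used in the message above; whether a given tag is actually published should be checked on the registry. The helper names are hypothetical.

```python
def prefect_image_for(python_version: str, prefect_tag: str = "latest") -> str:
    """Build an image name such as 'prefecthq/prefect:latest-python3.6'."""
    # Only major.minor is used; the patch version is dropped.
    major, minor = python_version.split(".")[:2]
    return f"prefecthq/prefect:{prefect_tag}-python{major}.{minor}"


def attach_run_config(flow, build_python_version: str):
    """Set an ECSRun on the flow whose image matches the build Python version."""
    # Imported lazily so the tag helper works without Prefect installed.
    from prefect.run_configs import ECSRun  # assumption: Prefect 1.x

    flow.run_config = ECSRun(image=prefect_image_for(build_python_version))
    return flow
```

Passing `platform.python_version()` from the registration environment as `build_python_version` keeps the two sides aligned, and the flow has to be re-registered for the new run config to take effect.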