# prefect-cloud
j
Hi all, we have two process work pools in Prefect Cloud to run timing-sensitive flows. They stopped executing new flows yesterday around 01:00 UTC. I somehow managed to unblock one of them yesterday, but the second one is still stuck. I can see thousands of late flow runs in the pool. The pool has status Ready. Same for the queue. And the worker appears as Online. I tried restarting workers, moving flows to on-demand ECS push pools, and increasing CPU/memory resources, but no luck. I also tried to replicate this locally by running:
PREFECT_LOGGING_LEVEL=DEBUG prefect worker start --pool $POOL_NAME --work-queue default --type process --with-healthcheck --install-policy never --prefetch-seconds 30
which does not discover any flow runs:
Worker 'ProcessWorker 2f013cba-dfd4-4492-8887-dd8205a1da23' started!
13:42:11.306 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Worker synchronized with the Prefect API server.
13:42:16.310 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of 'get_and_submit_flow_runs'
13:42:16.311 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Querying for flow runs scheduled before 2024-08-05T13:42:46.311643+00:00
13:42:16.313 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of 'sync_with_backend'
13:42:16.314 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of 'check_for_cancelled_flow_runs'
13:42:16.314 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Checking for cancelled flow runs...
13:42:16.464 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Discovered 0 scheduled_flow_runs
13:42:16.705 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Worker synchronized with the Prefect API server.
13:42:25.569 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of 'get_and_submit_flow_runs'
13:42:25.570 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Querying for flow runs scheduled before 2024-08-05T13:42:55.570226+00:00
13:42:25.723 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Discovered 0 scheduled_flow_runs
13:42:37.262 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of 'get_and_submit_flow_runs'
13:42:37.263 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Querying for flow runs scheduled before 2024-08-05T13:43:07.263616+00:00
13:42:37.419 | DEBUG   | prefect.worker.process.processworker 2f013cba-dfd4-4492-8887-dd8205a1da23 - Discovered 0 scheduled_flow_runs
Any idea what might be going on here?
workers are using
prefect==2.19.8
j
hey, it looks like you have a concurrency limit of 50 on that work pool. Both pending and running flow runs count as in-progress and take up concurrency slots. If all the concurrency slots are consumed, new runs won't be released to the worker. Can you check whether you have any in-progress runs and potentially remove them?
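to make the slot accounting concrete, here's a toy model (stdlib-only, not Prefect's actual implementation): PENDING and RUNNING runs both occupy slots, so stuck runs can starve newly scheduled ones even though the pool shows Ready and the worker is Online:

```python
# Toy model of work-pool concurrency-slot accounting. This is an
# illustration, NOT Prefect's real code: PENDING and RUNNING runs both
# hold slots, so stuck runs block newly scheduled ones from release.
from dataclasses import dataclass


@dataclass
class FlowRun:
    id: str
    state: str  # "SCHEDULED", "PENDING", "RUNNING", "COMPLETED"


def available_slots(runs: list[FlowRun], limit: int) -> int:
    """Slots left after counting every in-progress (PENDING/RUNNING) run."""
    in_progress = sum(r.state in ("PENDING", "RUNNING") for r in runs)
    return max(limit - in_progress, 0)


def releasable(runs: list[FlowRun], limit: int) -> list[FlowRun]:
    """Scheduled runs the server would hand to a worker on this poll."""
    slots = available_slots(runs, limit)
    return [r for r in runs if r.state == "SCHEDULED"][:slots]


# 50 stuck RUNNING runs against a limit of 50: nothing is released,
# so newly scheduled runs pile up as "Late" while the worker logs
# "Discovered 0 scheduled_flow_runs".
stuck = [FlowRun(f"old-{i}", "RUNNING") for i in range(50)]
new = [FlowRun(f"new-{i}", "SCHEDULED") for i in range(3)]
print(len(releasable(stuck + new, limit=50)))  # 0
print(len(releasable(new, limit=50)))          # 3 once stuck runs are cleared
```

this matches what your worker logs show: the worker is polling fine, but the server releases nothing while the slots are full.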
j
no running flows, at least not in the past 30 days
actually, i've tried removing the concurrency limit and I suddenly see this in our logs
i suppose there were old flow runs stuck in the running state. is it possible to clean our database? I don't see a way to yank them from the UI, as we're limited to the 30d interval
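for reference, the query we'd want is something like this (illustrative, stdlib-only sketch; the field names are assumptions, not the Prefect API schema): in-progress runs whose start time predates the 30-day window the UI exposes:

```python
# Illustrative sketch of the cleanup query: find runs still in
# PENDING/RUNNING that started before the UI's 30-day window.
# Dict field names ("state", "start_time") are assumptions for this
# example, not the Prefect API schema.
from datetime import datetime, timedelta, timezone


def stale_in_progress(runs, now=None, max_age_days=30):
    """Runs still holding concurrency slots that predate the 30d window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        r for r in runs
        if r["state"] in ("PENDING", "RUNNING") and r["start_time"] < cutoff
    ]


now = datetime(2024, 8, 5, tzinfo=timezone.utc)
runs = [
    {"id": "a", "state": "RUNNING", "start_time": now - timedelta(days=90)},
    {"id": "b", "state": "RUNNING", "start_time": now - timedelta(days=2)},
    {"id": "c", "state": "COMPLETED", "start_time": now - timedelta(days=90)},
]
print([r["id"] for r in stale_in_progress(runs, now=now)])  # ['a']
```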
j
I'm glad things are running at least, let me take a look and see if I can find any old PENDING/RUNNING runs that might explain this
j
thanks for looking into that. let me know if you need anything from my end