José Agretti
09/30/2025, 3:14 PM
Marvin
09/30/2025, 3:14 PM
Marvin
09/30/2025, 3:15 PM
LATE means the run's scheduled start_time has passed but no worker has picked it up yet. This can be brief (e.g., worker poll or startup delay) or persistent.
- How it resolves: As soon as a worker picks it up, it will move to Pending/Running as normal.
- Impact on retries/cancellation: None by default. LATE doesn’t trigger retries and it isn’t auto-cancelled. In Prefect Cloud you can create an Automation on “Flow run late” to notify or cancel after some time.
- Common causes:
  - No workers online or the work pool is paused
  - Worker can’t match the work pool/queue or labels
  - Concurrency limits holding runs
  - Slow infrastructure start-up or backlog (e.g., cold starts on Vertex AI)
  - Very tight schedules vs worker poll intervals
Docs: States in Prefect
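To make the "tight schedules vs worker poll intervals" cause concrete, here is a minimal sketch of the arithmetic, assuming a worker that polls at a fixed interval and claims a run on the first poll at or after its scheduled time. The function names and the `LATE_AFTER_SECONDS` threshold are hypothetical and chosen for illustration; they are not Prefect APIs or confirmed Prefect defaults.

```python
import math

# Assumption: illustrative threshold (seconds) after which an unclaimed
# run would be flagged late. Not a confirmed Prefect default.
LATE_AFTER_SECONDS = 15


def pickup_delay(scheduled_at, poll_interval, first_poll_at=0):
    """Seconds between the scheduled time and the first poll at or after it.

    Polls are assumed to happen at first_poll_at, first_poll_at + poll_interval,
    first_poll_at + 2 * poll_interval, and so on.
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    if scheduled_at <= first_poll_at:
        # The very first poll already covers this run.
        return first_poll_at - scheduled_at
    polls_needed = math.ceil((scheduled_at - first_poll_at) / poll_interval)
    next_poll = first_poll_at + polls_needed * poll_interval
    return next_poll - scheduled_at


def is_flagged_late(scheduled_at, poll_interval, late_after=LATE_AFTER_SECONDS):
    """True if the run waits longer than late_after before a poll can claim it."""
    return pickup_delay(scheduled_at, poll_interval) > late_after
```

For example, under these assumptions a run scheduled at second 1 with a 30-second poll interval waits 29 seconds and would be flagged, while one scheduled at second 25 waits only 5 seconds and would not. Real workers also add submission and infrastructure time, which this sketch leaves out.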
If you’re seeing lots of LATE runs in your Vertex AI work pool, I can help you check:
- Work pool status (not paused) and concurrency limits
- Worker logs for claim attempts
- Poll interval and startup time of your Vertex AI base image
- That the deployment targets the correct work pool and queue
José Agretti
10/01/2025, 12:13 PM
AwaitingConcurrencySlot and some are in Late state? Besides the 50 running
Marvin
10/01/2025, 12:14 PM
Marvin
10/01/2025, 12:20 PM