# marvin-ai
@Marvin I have a deployment with a schedule that suddenly started having all its runs cancelled. I deleted it and redeployed it; the first scheduled run was OK, then it cancelled all the subsequent ones again. Why?
thought for 304.5 seconds
Sorry that’s happening — let’s figure it out. In Prefect 3.x, scheduled runs getting auto-cancelled right after the first successful one is almost always due to orchestration rules or automations. The most common causes:

- An Automation that cancels other runs
  - Example: “When a run for this deployment starts/completes, cancel all other Scheduled runs for the same deployment.” This perfectly matches “first run OK, subsequent ones cancelled.”
  - How to check: in the UI, open Automations and look for any with the CancelFlowRun action targeting this deployment. Also open any cancelled run and check its State details — it will usually say “Cancelled by automation …” (a sketch for reading that state message programmatically follows at the end of this message).
  - Docs: Automations
- Late run policy (runs cancelled when they miss their start time)
  - If your schedule interval is shorter than the flow runtime, later occurrences can go “Late” and get cancelled depending on policy.
  - How to check: open a cancelled run and expand the State history; look for a message about being late/overdue.
  - Docs: Schedules and Cancelling workflows
- Concurrency limits + a cancellation rule
  - Global or tag-based concurrency limits alone usually put runs into AwaitingConcurrencySlot (not Cancelled). But if you also have an automation that cancels long-waiting runs, they’ll flip to Cancelled.
  - Docs: Global concurrency limits
- External/manual cancellation (user, script, CI, etc.)
  - In Cloud Pro/Enterprise, you can confirm who/what cancelled in Audit Logs.
  - Docs: Audit logs

Notes:
- Prefect 3.x does not have a built-in “Only run latest” switch. If you had that behavior, it’s typically implemented via an Automation that cancels older scheduled runs.

Next checks to pinpoint the cause:
1) Open one of the cancelled runs → State timeline → click “Cancelled” and copy the “state message” and “Set by” details.
2) Check the Automations page for any CancelFlowRun automations scoped to this deployment/work pool/tags.
3) Confirm if you’re using any global/tag concurrency limits and whether there’s an automation that cancels runs that wait or are late.
4) Compare your schedule interval to your flow’s runtime (short intervals + long runs can trigger late policies).

If you can share:
- the exact cancellation message from a cancelled run,
- whether you’re on Prefect Cloud or OSS,
- your deployment’s schedule config and any relevant automations/concurrency limits,

…I can tell you precisely what’s causing the cancellations and the best fix.
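If it helps, here is a minimal sketch for step 1 above: reading a cancelled run’s state name and message programmatically instead of via the UI. It assumes the Prefect 3.x Python client (`get_client` / `read_flow_run`); the run ID is a placeholder for one of your cancelled runs.

```python
# A minimal sketch, assuming the Prefect 3.x Python client; replace
# "<flow_run_id>" with the ID of one of your cancelled runs.
import asyncio

from prefect import get_client


async def show_cancellation_details(flow_run_id: str) -> None:
    async with get_client() as client:
        flow_run = await client.read_flow_run(flow_run_id)
        state = flow_run.state
        # For automation-driven cancellations, the state message usually
        # names the automation or policy that set the state.
        print(f"state: {state.name} ({state.type})")
        print(f"message: {state.message}")


asyncio.run(show_cancellation_details("<flow_run_id>"))
```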
@Marvin we have a local concurrency limit of 1 per deployment. The first run is successful, and then the next runs get cancelled as if there were still an ongoing run, while it is actually completed. After a delete and redeploy the issue persists. We have no automations in place. It's really weird: it used to work perfectly, then suddenly this behavior started.
thought for 444.0 seconds
Thanks for the report — that does sound frustrating. Let’s narrow this down. In Prefect 3.x, there are two common causes for “first run succeeds, subsequent runs get cancelled/blocked as if one is still active”:

1) Overlap/active-run limit at the deployment
- If the deployment is configured to prevent overlapping runs (max active runs = 1) or to auto-cancel overlapping runs, Prefect will cancel new runs whenever it still considers another run active. (A sketch of how this is configured appears at the end of this message.)
- If a previous run is stuck in RUNNING or CANCELLING (e.g., worker died mid-run), the overlap guard will keep cancelling new runs.

2) A concurrency limit with a stuck slot
- A global/task-level concurrency limit (by tag) can hold a “slot” that never released, making new runs wait or cancel depending on your settings. This can persist across deletes/re-deploys.

A few quick checks you can run now (a scripted version of the stale-run check follows the last snippet in this list):
- Confirm version and inspect deployment:
```bash
prefect version
prefect deployment inspect "<flow_name>/<deployment_name>"
```
- Look for any “active” runs the system still sees:
```bash
prefect flow-run ls --flow-name "<flow_name>" --state RUNNING --limit 50
prefect flow-run ls --flow-name "<flow_name>" --state CANCELLING --limit 50
```
If you find stale runs, try cancelling them:
```bash
prefect flow-run cancel "<flow_run_id>"
```
- Check for any concurrency limits and stuck slots (even if you didn’t set one intentionally):
```bash
prefect concurrency-limit ls
```
If you see a limit that could apply to this flow, inspect it (this will show active_slots):
```bash
prefect concurrency-limit inspect "<tag>"
```
If you find a slot that looks stuck, reset it:
```bash
prefect concurrency-limit reset "<tag>"
```
Where to look in the UI
- Open the deployment in the UI and review its “run behavior”/overlap settings. If “prevent overlapping runs” or “cancel overlapping runs” is on, that would explain the cancellations while Prefect still thinks a prior run is active.
- Check the Flow Runs page for any run that’s still RUNNING/CANCELLING and older than you expect.

Helpful docs
- Global and task-level concurrency limits: Docs: Global concurrency limits
- CLI for concurrency limits (create/inspect/reset): CLI: concurrency-limit

A few clarifying questions to help me pinpoint this:
- Are you on Prefect Cloud or self-hosted Prefect Server? Which Prefect version (`prefect version`)?
- Which worker type are you using (Local, Docker, Kubernetes) and its version?
- Where exactly did you set the “local concurrency limit of 1 per deployment”? In the deployment’s settings (prevent overlapping/max active runs), in a global concurrency limit/tag, or in worker/pool concurrency?
- For one of the cancelled runs: what is the exact state message in the UI? Does it mention overlap/cancellation due to another active run?
- Do you see any RUNNING or CANCELLING runs for this flow that are older than they should be?

If you can share one affected deployment name and a couple of flow run IDs (the successful first run and a cancelled subsequent run), I can suggest the next exact step (and the exact tag to reset if a concurrency limit is involved).
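For reference, here is a hypothetical sketch of where a deployment-level “max 1 active run” setting can come from in Prefect 3.x. The flow, deployment, and work pool names are placeholders, and source/image configuration for your worker type is elided. The key point: a CANCEL_NEW collision strategy cancels any new run that arrives while another run still appears active, which matches the symptom you describe.

```python
# A hypothetical sketch: deployment-level concurrency in Prefect 3.x.
# my_flow, "my-deployment", and "my-pool" are placeholders; source/image
# configuration for your work pool type is elided for brevity.
from prefect import flow
from prefect.client.schemas.objects import (
    ConcurrencyLimitConfig,
    ConcurrencyLimitStrategy,
)


@flow
def my_flow() -> None:
    ...


if __name__ == "__main__":
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-pool",
        # Allow one active run; CANCEL_NEW cancels any run that would start
        # while another run is still considered active.
        concurrency_limit=ConcurrencyLimitConfig(
            limit=1,
            collision_strategy=ConcurrencyLimitStrategy.CANCEL_NEW,
        ),
    )
```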
@Marvin it looks like that is my case: a completed run is taking the concurrency slot (I have only local ones, no global).
thought for 1355.9 seconds
Got it — if a completed run is still holding the slot, that’s exactly what would block the next run with a limit of 1. Concurrency limits persist on the server and aren’t tied to a specific deployment object, so deleting/re-deploying won’t clear a stuck slot — you need to inspect/reset the limit itself.

Try this sequence (a scripted version of steps 2 and 3 follows step 4):

1) List limits to find the tag you used
```bash
prefect concurrency-limit ls
```
2) Inspect the limit to see which run is holding the slot
```bash
prefect concurrency-limit inspect "<tag>"
```
- Look at `active_slots` — it will list TaskRun IDs holding the slot. If those runs are already COMPLETED, the slot is stuck.

3) Reset the stuck slots
```bash
prefect concurrency-limit reset "<tag>"
```
4) Verify no stale runs are still considered active
```bash
prefect flow-run ls --flow-name "<flow_name>" --state RUNNING --limit 50
prefect flow-run ls --flow-name "<flow_name>" --state CANCELLING --limit 50
```
Cancel any stragglers if needed:
```bash
prefect flow-run cancel "<flow_run_id>"
```
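And here is the scripted version of steps 2 and 3 mentioned above: a minimal sketch, assuming the Prefect 3.x Python client, that resets the tag’s limit only when every task run holding a slot has already reached a final state (`"<tag>"` is a placeholder for your tag name).

```python
# A minimal sketch, assuming the Prefect 3.x Python client; "<tag>" is a
# placeholder for your concurrency limit's tag.
import asyncio

from prefect import get_client


async def reset_if_stuck(tag: str) -> None:
    async with get_client() as client:
        limit = await client.read_concurrency_limit_by_tag(tag)
        print(f"limit={limit.concurrency_limit} active_slots={limit.active_slots}")

        # Only reset when every slot holder is already in a final state
        # (COMPLETED, FAILED, CANCELLED, ...), i.e. the slot is stuck.
        for task_run_id in limit.active_slots:
            task_run = await client.read_task_run(task_run_id)
            if task_run.state and not task_run.state.is_final():
                print(f"{task_run_id} is still {task_run.state.type}; not resetting")
                return

        if limit.active_slots:
            await client.reset_concurrency_limit_by_tag(tag)
            print("stuck slots reset")


asyncio.run(reset_if_stuck("<tag>"))
```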
Why this persists after redeploy
- Concurrency limits are stored centrally and enforced by tag; they’re not cleared by deleting/recreating a deployment. Resetting the limit is the right fix when a slot gets stuck.

Docs for reference:
- Concurrency limits (including inspect/reset): Global concurrency limits
- CLI reference: CLI: concurrency-limit

If you share:
- the tag name you’re using,
- the output of `prefect concurrency-limit inspect "<tag>"` (especially `active_slots`),
- one example flow run ID that is “completed but holding the slot”,

…I can help pinpoint exactly which slot is stuck and confirm the best next step. Also, which Prefect version are you on (`prefect version`)? If you’re not on the latest 3.x, upgrading can help — there have been fixes around state transitions and slot release.