# marvin-ai
@Marvin I am having a lot of `Concurrency lease renewal failed - slots are no longer reserved. Terminating execution to prevent over-allocation` lately. Why is that? It's happening on a flow served locally. I don't have global concurrency limits, only limits per deployment.
thought for 410.6 seconds
That message means the concurrency lease for your run expired and could not be renewed, so Prefect terminates the run to avoid exceeding your configured limit. It's not tied to "global" vs. "per-deployment" limits; both use the same lease-based system.

**What typically causes it**
- The runner can't reach the API for long enough that the lease TTL expires (local server restart, network hiccups, laptop sleep, a proxy closing idle connections, etc.)
- The API/server process restarts or is briefly unavailable (common when developing locally)
- The runner process is blocked for too long (e.g., heavy CPU or blocking calls in an async flow) and the background renewal doesn't run in time
- Very slow API responses combined with short connection timeouts/retries on the client
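To make the failure mode concrete, here is a schematic of the lease pattern in general terms. This is an illustration of the concept only, not Prefect's actual implementation; every name in it is made up:

```python
import time

LEASE_TTL = 30  # seconds a lease stays valid without a renewal (illustrative value)

class LeaseExpired(Exception):
    """Raised when a renewal arrives after the server has reclaimed the slots."""

def renewal_loop(renew, on_expired):
    """Renew well before the TTL elapses; any pause longer than the TTL is fatal.

    `renew` and `on_expired` are hypothetical callbacks standing in for the
    runner's renewal API call and its terminate-the-run path.
    """
    while True:
        time.sleep(LEASE_TTL / 3)  # renew at a comfortable cadence
        try:
            renew()  # the server rejects this if the lease has already expired
        except LeaseExpired:
            # The server has released the slots, so continuing would risk
            # exceeding the limit; the runner terminates the run instead.
            on_expired()
            return
```

Anything that keeps a loop like this from running for longer than the TTL (a suspended laptop, a blocked event loop, an unreachable API) produces exactly the "slots are no longer reserved" error.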
**Why you're seeing it with a locally served flow**
- If you're using `flow.serve()` or a local Prefect Server, any restart, reload, or sleep of that process can drop/expire leases, and the next renewal will be rejected as "slots are no longer reserved".
- Local development environments are more prone to blocking the event loop or pausing the process during hot reloads, debugging, or laptop sleep.
**Quick things to check/try**
- Confirm your environment
  - Are you on Prefect Cloud or a local Prefect Server?
  - Are you using `flow.serve()` or a worker-based deployment?
  - Your Prefect version (3.x), and whether this started after an update.
- Look for process pauses
  - Any hot-reload/dev server restarts? Laptop sleep? Long blocking CPU work in an async flow?
- Inspect your deployment concurrency config (see the example at the end of this message)
  - Limit value, collision strategy (ENQUEUE vs SKIP), and `slot_decay_per_second`.
  - In the UI, watch the deployment's active slots while runs execute.
- Increase client resiliency to transient API slowness; try higher API timeouts and retries:
```
PREFECT_API_REQUEST_TIMEOUT=120
PREFECT_CLIENT_MAX_RETRIES=10
```
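If you'd rather persist these than export them per shell, the Prefect CLI can set them on your active profile (standard `prefect config set` usage; the values are just the ones from above):

```
prefect config set PREFECT_API_REQUEST_TIMEOUT=120
prefect config set PREFECT_CLIENT_MAX_RETRIES=10
```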
- Enable debug logs and capture a snippet around the error from both the runner and the API server:
```
PREFECT_LOGGING_LEVEL=DEBUG
```
This can reveal whether renewals are timing out vs. being rejected after a server restart.
- Keep the machine awake and avoid local server restarts during long runs if you must run locally. For stability, consider deploying and running via a worker instead of `flow.serve()` for long-lived processes.

**Notes and references**
- Prefect's concurrency uses time-limited leases that must be renewed; if renewal fails and the TTL expires, the server releases the slots and the run is terminated to prevent over-allocation:
  - Concurrency overview
  - Lease renewal API
- Settings you can tune for network robustness: Settings reference

If you can share:
- How you're running (Cloud vs. local server, `flow.serve()` vs. worker)
- Your Prefect version
- Your deployment concurrency config (limit, collision strategy, `slot_decay_per_second`)
- A short DEBUG log snippet around the error from both the runner and the API server

…I can help pinpoint the exact cause and best fix.
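For the concurrency-config check above, here is roughly where a per-deployment limit is set when serving locally. This is a sketch assuming Prefect 3.x's `global_limit` argument to `serve`; double-check the exact parameter against the docs for your version:

```python
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Assumed API: `global_limit` caps concurrent runs of this deployment.
    # A ConcurrencyLimitConfig can be passed instead of an int to also
    # choose the collision strategy (what happens when the limit is hit).
    my_flow.serve(
        name="my-deployment",
        global_limit=2,
    )
```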
@Marvin on Prefect Cloud, using serve. It started to happen after updating from 3.4.7 to 3.4.25.
thought for 46.9 seconds
Thanks, that context helps. Short answer: this is almost certainly the lease-renewal logic getting more sensitive after the upgrade. We've seen reports of "Concurrency lease renewal failed" in 3.4.15–3.4.25 when the renewal background task can't reach the API or is interrupted, especially with long-lived local processes like serve.

**Relevant reports**
- Concurrency lease renewal failed on 3.4.20: #19068
- Infra not exiting cleanly after lease renewal failure (a symptom of the same class of issue): #18839
- Server-side cap on lease duration (not your exact case, but related to lease behavior): #19218

**Why this shows up with Cloud + serve after the upgrade**
- Serve is a long-lived local process that renews the deployment's concurrency lease periodically. Any pause in that process (hot reload, debugger breakpoints, laptop sleep, CPU spikes, blocking calls on the flow event loop) or transient API/network delays can cause the lease TTL to expire. In 3.4.2x the renewal path is stricter; if the lease has expired by the time the next renewal arrives, the server rejects it and the runtime cancels the run to avoid over-allocation.
- Cloud itself is fine; the pain point is the renewal timing from your local serve process to Cloud.

**Practical things to try right away**

1) Upgrade to the latest 3.4.x patch
- There have been lease/concurrency fixes across recent patches. If you can, bump to the latest 3.4.x and retest.

2) Make the client more resilient to transient slowness
- Set higher timeouts/retries for API calls from your serve process:
```
PREFECT_API_REQUEST_TIMEOUT=120
PREFECT_CLIENT_MAX_RETRIES=10
PREFECT_CLIENT_RETRY_EXTRA_CODES=502,503,504
```
- And run with:
```
PREFECT_LOGGING_LEVEL=DEBUG
```
Then capture runner logs around the failure.

3) Ensure the event loop isn't blocked
- Avoid doing blocking work directly in the flow function; put heavy work in tasks. A blocking call in the flow body can prevent the renewal coroutine from running on time (see the first sketch at the end of this message).

4) Sanity-check the concurrency configuration
- Temporarily remove the per-deployment concurrency limit and confirm the symptom disappears. If it does, we've isolated this to lease renewal rather than something else.
- If you're using slot decay on that limit, note the cadence and value.

5) Consider moving this workload off serve
- Serve is great for dev, but for long-running or production workloads, a worker-based deployment is more resilient to these transient local issues (see the deployment sketch at the end of this message).

If you can share:
- The exact Prefect version you're on now (3.4.25) and whether upgrading changes behavior
- A DEBUG log snippet from the serve process around the failure
- The deployment's concurrency settings (limit value, collision strategy, any `slot_decay_per_second`)
- Whether the machine running serve might be sleeping or reloading

…I can help pinpoint the root cause and either suggest a precise mitigation or escalate with a minimal repro.

For background on the underlying lease system and settings you can tune, see:
- Concurrency overview: https://docs-3.prefect.io/v3/how-to-guides/workflows/global-concurrency-limits
- Lease renewal API: https://docs-3.prefect.io/v3/api-ref/rest-api/server/concurrency-limits-v2/renew-concurrency-lease
- Client/API settings (timeouts, retries): https://docs-3.prefect.io/v3/api-ref/settings-ref
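To illustrate point 3, here is a minimal sketch of the event-loop issue, assuming an async flow in Prefect 3.x; `slow_blocking_call` is a made-up stand-in for your real workload:

```python
import asyncio
import time

from prefect import flow

def slow_blocking_call() -> str:
    """Hypothetical blocking work (e.g., a synchronous HTTP request or heavy I/O)."""
    time.sleep(60)
    return "done"

@flow
async def my_flow():
    # Bad: calling slow_blocking_call() directly here would freeze the event
    # loop for the full minute, so background work such as lease renewal
    # cannot run until it finishes.

    # Better: push the blocking call onto a worker thread so the loop stays
    # responsive. (Moving the work into a Prefect task submitted to a task
    # runner gives you the same separation, plus observability.)
    result = await asyncio.to_thread(slow_blocking_call)
    print(result)

if __name__ == "__main__":
    asyncio.run(my_flow())
```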
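And for point 5, a sketch of registering a worker-based deployment instead of serving, using Prefect 3.x's `flow.from_source(...).deploy(...)` pattern; the repo URL, entrypoint, and work pool name are all placeholders:

```python
from prefect import flow

if __name__ == "__main__":
    # Pull the flow from source and register a deployment against a work
    # pool. A separate `prefect worker start --pool my-pool` process then
    # picks up runs, instead of a long-lived local serve process.
    flow.from_source(
        source="https://github.com/your-org/your-repo",
        entrypoint="flows/my_flow.py:my_flow",
    ).deploy(
        name="my-deployment",
        work_pool_name="my-pool",
    )
```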