José Agretti
10/01/2025, 7:15 PM
Exception in callback amaintain_concurrency_lease.<locals>.handle_lease_renewal_failure(<Task cancell...leases.py:17>>) at /app/.venv/lib/python3.12/site-packages/prefect/concurrency/_leases.py:109
handle: <Handle amaintain_concurrency_lease.<locals>.handle_lease_renewal_failure(<Task cancell...leases.py:17>>) at /app/.venv/lib/python3.12/site-packages/prefect/concurrency/_leases.py:109>
What could be the cause?
Marvin
10/01/2025, 7:15 PM
Nate
10/01/2025, 7:15 PM
Marvin
10/01/2025, 7:17 PM
future.exception() on a cancelled future raises CancelledError, which produces the callback error you’re seeing.
- There’s an open report that matches your trace and version pattern: Future.CancelledError when flow finished.
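For illustration, here’s a minimal, Prefect-free reproduction of the stdlib behavior behind that callback error (pure concurrent.futures, nothing Prefect-specific):
# A cancelled concurrent.futures.Future refuses to report an exception:
# .exception() raises CancelledError instead of returning a value, which is
# exactly what trips up a done-callback that calls future.exception() after
# the lease-renewal task is cancelled at the end of the run.
from concurrent.futures import CancelledError, Future

fut = Future()
fut.cancel()  # stand-in for the renewal task being cancelled at shutdown

try:
    fut.exception()
except CancelledError:
    print("future.exception() raised CancelledError on a cancelled future")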
When to worry:
- If you also see messages like “Concurrency lease renewal failed - slots are no longer reserved…”, that indicates an actual renewal failure (e.g., transient API/network issue) during the run. In that case, concurrency limits may be exceeded for the remainder of the run. Concurrency API docs:
- Renew concurrency lease (Server API)
- Concurrency limits v2 Python client
What you can do:
- Make sure you’re on the latest 3.x (there have been concurrency/logging improvements; even if this specific noise isn’t called out, staying current helps). Full changelog: 3.4.19 → 3.4.20.
- If it’s just noisy logs, you can suppress them:
- In code:
import logging
logging.getLogger("prefect._internal.concurrency").setLevel(logging.CRITICAL)
- Or via a logging config file and PREFECT_LOGGING_SETTINGS.
- If you suspect real renewal failures (not just end-of-run noise), check for those “lease renewal failed” warnings around the same timestamps and verify worker connectivity to the API (especially relevant if Vertex AI workers are scaling down or preempting pods).
Could you share:
- Your Prefect version and Python version (looks like 3.12 from the path)?
- Whether runs are actually failing or if it’s only this log at shutdown?
This will help confirm if it’s the benign cancellation case or something we should dig into (network/API, clock skew, etc.).
José Agretti
10/01/2025, 7:19 PM
José Agretti
10/01/2025, 7:22 PM
I'm seeing "Concurrency lease renewal failed - slots are no longer reserved. Execution will continue, but concurrency limits may be exceeded." in the flow logs as well
Marvin
10/01/2025, 7:22 PM
Marvin
10/01/2025, 7:26 PM
Consider setting strict=True in the concurrency context. That way, if a renewal fails, the run will terminate to prevent over-allocation.
- Async:
from prefect import flow
from prefect.concurrency.asyncio import concurrency

@flow
async def my_flow():
    async with concurrency(names="my-limit", strict=True):
        ...
- Sync:
from prefect import flow
from prefect.concurrency.sync import concurrency

@flow
def my_flow():
    with concurrency(names="my-limit", strict=True):
        ...
Also, look for the root exception
- Just before the “lease renewal failed…” log, there’s usually a more specific error (e.g., httpx timeout or 40x/50x). That will pinpoint whether you’re hitting auth, network, or server-side issues.
- You can temporarily raise verbosity for the concurrency logger to capture more detail:
import logging
logging.getLogger("prefect.concurrency").setLevel(logging.DEBUG)
logging.getLogger("prefect._internal.concurrency").setLevel(logging.DEBUG)
If you can share:
- Your Prefect and Python versions, and whether Vertex AI jobs are preemptible
- A snippet of the logs around the first “lease renewal failed” message (including any httpx/HTTP errors)
I can help narrow this down further and suggest a targeted fix.
José Agretti
10/01/2025, 7:30 PM
  File "/app/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 215, in _invoke_callbacks
    callback(self)
  File "/app/.venv/lib/python3.12/site-packages/prefect/concurrency/_leases.py", line 64, in handle_lease_renewal_failure
    exc = future.exception()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 483, in exception
    raise CancelledError()
not sure if it's exactly the same
José Agretti
10/01/2025, 7:31 PM
José Agretti
10/01/2025, 7:33 PM
16:08:01.748 | ERROR | MainThread | prefect._internal.concurrency - exception calling callback for <Future at 0x7f1976ebec30 state=cancelled>
And I upgraded Prefect because I saw this issue https://github.com/PrefectHQ/prefect/issues/18627
José Agretti
10/03/2025, 3:25 PM
I upgraded to 3.4.20, and now I got this error (only 1 for now)
flow_name: "flow"
flow_run_id: "flow_id"
flow_run_name: "flow name"
funcName: "handle_lease_renewal_failure"
lineno: 123
logger: "prefect.flow_runs"
message: "Concurrency lease renewal failed - slots are no longer reserved. Terminating execution to prevent over-allocation."
module: "_leases"
timestamp: "2025-10-03 15:06:43,292"
And the previous log is this one
funcName: "_send_single_request"
lineno: 1740
logger: "httpx"
message: "HTTP Request: POST <https://prefect-server-986334413485.us-central1.run.app/api/v2/concurrency_limits/leases/bb237253-830c-4bde-9b03-a1b62899856f/renew> "HTTP/1.1 404 Not Found""
module: "_client"
taskName: "Task-21"
timestamp: "2025-10-03 15:06:43,291"
The Prefect server is running in a Cloud Run service. I disabled a global concurrency limit to see if we can get the flows running correctly for now
Marvin
10/03/2025, 3:25 PM
Marvin
10/03/2025, 3:28 PM
Check whether the limit and its leases still exist on the server side:
prefect concurrency-limit ls
prefect concurrency-limit inspect <your-tag>
- If you didn’t change the limit and still see this, check the server logs around that time for lease expiry/cleanup and network blips from your Vertex AI workers to the Cloud Run URL (a quick connectivity probe sketch is below). Short-lived CPU throttling or pauses can cause missed renewals.
- If you see flows that finish successfully but get marked crashed due to this renewal race, that matches the linked issue. Upgrading to the latest 3.4.x may help; if it persists, we can dig in with logs.
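If you want to rule out basic reachability from the worker environment (e.g., from inside a Vertex AI job) to that Cloud Run URL, a quick probe like the sketch below is usually enough - it only assumes PREFECT_API_URL is set in that environment and that the server’s /api/health endpoint is exposed:
# Minimal reachability probe from the worker environment to the Prefect API.
# Assumes PREFECT_API_URL is set (e.g. "https://<cloud-run-host>/api"); the
# health endpoint lives directly under the API root.
import os

import httpx

api_url = os.environ["PREFECT_API_URL"].rstrip("/")
response = httpx.get(f"{api_url}/health", timeout=10)
print(response.status_code, response.text)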
References
- REST endpoint the client is hitting: Renew concurrency lease
- Related bug report: Completed Flow runs intermittently marked as Crashed on lease renewal failure
If you can share:
- Your Server version and image tag deployed to Cloud Run
- Whether the limit was reset/deleted around 15:06:43
- Whether the run had mostly completed or truly failed mid-execution
…I can help confirm root cause and the best mitigation.
José Agretti
10/03/2025, 3:39 PM
Marvin
10/03/2025, 3:39 PM
Marvin
10/03/2025, 3:44 PM
Check every place a concurrency gate can block runs:
- Tag-based limits
prefect concurrency-limit ls
prefect concurrency-limit inspect <tag>
prefect concurrency-limit reset <tag> # clears stuck/ghost leases
- Global limits (since you disabled them, just verify)
prefect global-concurrency-limit ls
- Work pool concurrency (often overlooked)
prefect work-pool inspect <pool-name>
prefect work-pool clear-concurrency-limit <pool-name>
Force a specific flow run to start
- Programmatically, you can override orchestration and move a run to Running:
from prefect.client.orchestration import get_client
from prefect.states import Running
import anyio

async def force_start(flow_run_id: str):
    async with get_client() as client:
        await client.set_flow_run_state(flow_run_id=flow_run_id, state=Running(), force=True)

anyio.run(force_start, "<FLOW_RUN_ID>")
This is a blunt tool: it skips concurrency checks for that run.
If they’re still stuck in AwaitingConcurrencySlot after you’ve disabled limits
- There may be lingering leases that didn’t get released cleanly. Use prefect concurrency-limit reset <tag> for any relevant tags; that immediately frees all slots.
- Optionally nudge the run by re-scheduling it:
from datetime import datetime, timezone
from prefect.client.orchestration import get_client
from prefect.states import Scheduled
import anyio

async def reschedule(flow_run_id: str):
    async with get_client() as client:
        state = Scheduled(scheduled_time=datetime.now(timezone.utc))
        await client.set_flow_run_state(flow_run_id=flow_run_id, state=state, force=True)

anyio.run(reschedule, "<FLOW_RUN_ID>")
The server will re-evaluate concurrency on the next transition; with limits cleared, it should proceed to Running.
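If you want to see every run that’s still parked in AwaitingConcurrencySlot before nudging them one by one, a sketch along these lines lists the candidates - it assumes the state name matches the label shown in the UI and uses the standard flow-run filters:
import anyio
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateName,
)

async def list_waiting_runs():
    # Query flow runs whose current state name is "AwaitingConcurrencySlot"
    # so you can decide which ones to reschedule or force to Running.
    async with get_client() as client:
        runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    name=FlowRunFilterStateName(any_=["AwaitingConcurrencySlot"])
                )
            )
        )
        for run in runs:
            print(run.id, run.name)

anyio.run(list_waiting_runs)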
If you share:
- the tag(s) applied to the runs,
- your work pool name (Vertex AI) and whether it has a pool-level concurrency limit,
- and a sample flow_run_id,
…I can suggest the exact inspect/reset commands to clear the blockage.
José Agretti
10/03/2025, 4:03 PM
Marvin
10/03/2025, 4:03 PM
Marvin
10/03/2025, 4:05 PM
The prefect concurrency-limit reset command only affects tag-based limits, not deployment-level limits.
- Deployment-level limits are surfaced as global concurrency limits; you manage them via the Global Concurrency Limits UI/CLI or by editing the deployment.
- When you set the deployment to Unlimited, you removed that managed limit—which is why your runs started.
How to manage these cleanly
- If you want per-deployment control: set concurrency_limit on the deployment and avoid overlapping tag/global limits for the same workloads.
- If you prefer workspace-wide/global control: set the deployment to Unlimited and manage capacity solely with global or tag limits.
Useful commands
- List global limits (you should see the deployment-managed entry):
prefect global-concurrency-limit ls
- Disable or change the managed entry directly:
prefect global-concurrency-limit disable <name-or-id>
prefect global-concurrency-limit update <name-or-id> --limit 5
- Verify no tag-based limits are also gating your runs:
prefect concurrency-limit ls
Tip for stuck runs in AwaitingConcurrencySlot
- After you update/disable the relevant limit, the server should admit runs automatically. If some remain stuck, you can nudge them by rescheduling or forcing Running as we discussed earlier.
If you share the deployment name (and the Global Concurrency Limits entries you see), I can point you to the exact entry to disable/update so you don’t need to keep the deployment at Unlimited unless that’s your goal.
José Agretti
10/03/2025, 4:09 PM
Marvin
10/03/2025, 4:09 PM
Marvin
10/03/2025, 4:10 PM
If the deployment has concurrency_limit set, that limit alone can keep runs in AwaitingConcurrencySlot even if:
- All tag-based limits are reset (prefect concurrency-limit …)
- Other Global Concurrency Limits are disabled
Why your runs only started when you set the deployment to Unlimited
- Those runs were blocked by the deployment’s own concurrency limit (a deployment-scoped resource). Resetting tag limits and disabling unrelated global limits won’t touch it.
- In 3.x, deployment concurrency is enforced independently; it’s implemented using the same “global resource” mechanism under the hood, but it’s tied specifically to that deployment. If you don’t disable/update that specific deployment-managed entry (or set the deployment to Unlimited), the block remains.
How to verify and manage the right thing
- Inspect the deployment to see its configured concurrency:
prefect deployment inspect "<flow_name>/<deployment_name>"
If concurrency_limit is present, it’s active and will gate that deployment’s runs.
- Manage the actual entry that enforces it:
  - In the UI, open the deployment and adjust Concurrency Limit (set a higher number or Unlimited).
  - Or redeploy with the desired setting (see the sketch after this list):
    - Keep it: set a numeric concurrency_limit
    - Remove it: set concurrency_limit=None (Unlimited)
- Be aware that:
- prefect concurrency-limit * controls tag-based task limits only (won’t unblock deployment-level flow gating).
- prefect global-concurrency-limit * controls workspace-level resource locks; you’d need to identify the specific entry tied to your deployment if you want to manage it there. Disabling other global entries won’t affect a deployment’s own limit.
- Work pool concurrency can also gate starts:
prefect work-pool inspect <pool-name>
prefect work-pool clear-concurrency-limit <pool-name>
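For the “redeploy with the desired setting” option above, here’s a minimal sketch of what that looks like from Python - it assumes your 3.x release exposes concurrency_limit on flow.deploy (the same setting the UI shows as the deployment’s Concurrency Limit), and the flow, deployment, work pool, and image names are placeholders; the equivalent key can be set in prefect.yaml instead:
from prefect import flow

@flow
def my_flow():
    ...

if __name__ == "__main__":
    # Redeploy with an explicit per-deployment cap; pass concurrency_limit=None
    # to remove the limit (Unlimited). All names/images below are placeholders.
    my_flow.deploy(
        name="my-deployment",
        work_pool_name="my-vertex-pool",
        image="us-docker.pkg.dev/my-project/my-repo/my-image:latest",
        build=False,
        push=False,
        concurrency_limit=5,
    )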
If you’d like, share the deployment name and I can help pinpoint the exact managed entry that corresponds to it so you can adjust it without setting the deployment to Unlimited. For reference on global limits, see Global concurrency limits.