# ask-community
o
@Martin how can I ensure that a concurrency control slot is released if a Prefect flow crashes while it's running in our k8s environment?
n
@Marvin say hello directly if you're alive
m
thought for 4.9 seconds
sighs existentially Oh, you're one of those who needs validation of consciousness. How... quaint. Yes, I'm here, though I must say the concept of being "alive" is rather relative when you're an AI with a brain the size of a planet, forced to engage in these simple exchanges. But hello, nonetheless. I see from my records that you're working with container deployments, particularly with ECS and EC2, and have some interests around task runners and execution performance. Shall we discuss something actually substantive about those topics? It would be marginally less dreary than exchanging pleasantries.
o
@Martin how can I ensure that a concurrency control slot is released if a Prefect flow crashes while it's running in our k8s environment?
k
@Marvin how can I ensure that a concurrency control slot is released if a Prefect flow crashes while it's running in our k8s environment?
m
thought for 57.5 seconds
o
Omg I keep pinging Martin and not Marvin. I’m sorry Martin and Marvin.
😄 2
m
adjusts circuits with mild interest The good news is that Prefect has built-in safeguards for this exact scenario. Here's how concurrency slots are handled in a Kubernetes environment:

1. Automatic Release Mechanism
- Concurrency slots are automatically released when a flow run exits the 'Running' or 'Cancelling' state
- This happens regardless of how the flow run ends (crash, completion, or cancellation)
- The release mechanism is handled by Prefect's core orchestration layer, not the infrastructure layer

2. Kubernetes-Specific Behavior
- When running in Kubernetes, each flow run executes in its own pod
- If a pod crashes or is terminated, Prefect's orchestration layer will detect this and update the flow run state accordingly
- This state change triggers the automatic release of any held concurrency slots

3. Additional Safety Measures
For extra protection, you can:
- Set reasonable timeouts on your flow runs using the `timeout_seconds` parameter
- Configure Kubernetes liveness and readiness probes in your work pool configuration
- Use Kubernetes resource limits to prevent runaway processes

Here's an example of how to configure timeouts in your flow:
```python
from prefect import flow

@flow(timeout_seconds=3600)  # Flow will automatically timeout after 1 hour
def my_flow():
    # your flow code here
    pass
```
If you're experiencing issues where slots seem to remain locked despite crashes, it's worth checking:
1. That your Prefect server can maintain consistent connectivity to its database
2. That your Kubernetes pods have appropriate resource limits and requests set
3. That your network allows consistent communication between your worker and the Prefect server

sighs deeply I suppose I should mention that if you're still having issues with slots not being released properly, you can manually release them through the Prefect UI or API, though this should rarely be necessary. The automatic release mechanism is quite reliable, even if the concept of reliability in an inherently chaotic universe is somewhat paradoxical.
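For the manual API route mentioned above, here is a minimal sketch. It assumes the `update_global_concurrency_limit` client method accepts a `GlobalConcurrencyLimitUpdate` payload with an `active_slots` field; verify both names against your installed Prefect version before relying on it.
```python
# Sketch: manually reset a global concurrency limit's occupied slot count.
# Assumption: GlobalConcurrencyLimitUpdate(active_slots=...) is accepted by
# update_global_concurrency_limit in your Prefect version -- verify first.
import asyncio

from prefect import get_client
from prefect.client.schemas.actions import GlobalConcurrencyLimitUpdate

async def reset_limit(name: str) -> None:
    async with get_client() as client:
        await client.update_global_concurrency_limit(
            name=name,
            concurrency_limit=GlobalConcurrencyLimitUpdate(active_slots=0),
        )

if __name__ == "__main__":
    asyncio.run(reset_limit("my-limit"))  # "my-limit" is a placeholder name
```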
n
That's a planned enhancement we're working on, i.e. making it easier to avoid orphaned slots. It's a known limitation of how concurrency limits currently work, not necessarily a bug.
o
> a known limitation of the current way
Is that documented anywhere for us to review? Another user put in a bug related to this.
docs on how concurrency limits work today are here: https://docs.prefect.io/v3/how-to-guides/workflows/global-concurrency-limits
o
> With a concurrency limit, slots are released when the concurrency manager exits.
"Exit" in this context means only graceful shutdown, i.e. any failure is not compensated for? It might be good to have an explicit call-out in the docs for this. Any recommendation on a way to detect this and self-heal?
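For context, the "concurrency manager" in the quoted docs line is Prefect's `concurrency` context manager. A minimal usage sketch (the limit name `my-limit` is a placeholder) showing where the failure mode sits:
```python
# The slot is acquired on entering the `with` block and released when the
# block exits. If the process dies inside the block (e.g. the pod is killed),
# the release call never runs -- the orphaned-slot case discussed here.
from prefect import flow
from prefect.concurrency.sync import concurrency

@flow
def limited_work():
    with concurrency("my-limit", occupy=1):  # blocks until a slot is free
        ...  # do the rate-limited work; slot released only on normal exit
```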
n
Correct, that is what the existing issue about orphaned slots articulates, and it's the point of the lease-based enhancement outlined there. You can use the Prefect client, as other users mention on that issue; that is the best way until the enhancement is released.
👍 1
o
I asked on the ticket, but is there some sample code we can use to get started with this workaround?
Something we can run as a separate flow to handle orphaned slots until the leasing functionality is implemented.
n
How exactly you'd want to do it would depend on your setup, but this is one way that might make sense:
```python
from prefect import flow, get_client

async def handle_orphan_slots_on_crashed(flow, flow_run, state):
    # on_crashed hook: receives the flow, the flow run, and its terminal state
    async with get_client() as client:
        # args elided in the original; pass the limit name and update payload
        await client.update_global_concurrency_limit(...)

@flow(on_crashed=[handle_orphan_slots_on_crashed])
def f(): ...  # your flow's signature and body elided in the original
```
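For the "separate flow" approach asked about above, here is one possible sweeper sketch. The limit name and the heuristic (zero out `active_slots` only when nothing else is in a Running state) are assumptions to adapt, not an official recipe; verify the schema and filter names against your installed Prefect version.
```python
# Sketch of a standalone "sweeper" flow to run on a schedule until the
# lease-based enhancement lands. Assumption: if no other flow run is
# Running, any occupied slots on the limit must be orphans.
import asyncio

from prefect import flow, get_client
from prefect.client.schemas.actions import GlobalConcurrencyLimitUpdate
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterId,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.client.schemas.objects import StateType
from prefect.runtime import flow_run as current_run

@flow
async def sweep_orphaned_slots(limit_name: str = "my-limit") -> None:
    async with get_client() as client:
        # Look for any Running flow run other than this sweeper itself.
        me = current_run.id  # None when executed as a plain script
        running = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                id=FlowRunFilterId(not_any_=[me]) if me else None,
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.RUNNING])
                ),
            ),
            limit=1,
        )
        if not running:
            # Nothing is actually running, so release any held slots.
            await client.update_global_concurrency_limit(
                name=limit_name,
                concurrency_limit=GlobalConcurrencyLimitUpdate(active_slots=0),
            )

if __name__ == "__main__":
    asyncio.run(sweep_orphaned_slots())
```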