We've had all our production workers start boot-looping in the past hour. Each one first reports that a flow run did not pass checks and will not be submitted for execution (deployment not found), followed by a RuntimeError: "this borrower is already holding one of this CapacityLimiter's tokens".
I'm crossing my fingers that someone has seen this CapacityLimiter issue before.
Nate
01/20/2025, 3:30 PM
hmm, are you using .serve someplace? Specifically, it has a limit kwarg that uses tokens for subprocess concurrency. I would guess there are some zombie runs holding concurrency tokens someplace unexpected, but without more info it's hard to say.
Samuel Hinton
01/21/2025, 4:15 AM
@Nate I expanded on what was happening in this GitHub issue: https://github.com/PrefectHQ/prefect/issues/10632
But to clarify here: we're not using serve. It seems that renaming a deployment doesn't cascade to the flow runs that were already scheduled for it, and that somehow causes the worker to not release its concurrency token and then crash.