Jordan Fisher
11/06/2024, 9:30 PM
"Aborted submission of flow run ... Server sent an abort signal: this run is in a PENDING state and cannot transition to a PENDING state". Presumably this happens when multiple workers race to claim a run.
This is normally fine, but the worker whose submission was aborted thinks it is running a flow and is now stuck reporting "Flow run limit reached; 1 flow runs in progress".
Is this a bug? I'm using Prefect 3.0.10 on Windows.
Marvin
11/06/2024, 9:31 PM
"Flow run limit reached; 1 flow runs in progress".
Here's what I've found regarding similar cases:
1. There are situations where flow runs remain in a "Running" state even after the agent is interrupted or crashes, which can cause them to become stuck. This issue has been discussed on GitHub, indicating that the flow remains in its state because the transition to a new state did not occur properly when the worker was interrupted (see GitHub Issue 7239).
2. Another similar instance describes an agent or worker crashing due to resource constraints and flow runs getting stuck in a PENDING state due to failed state transitions (GitHub Issue 12875).
In your scenario, it looks like the worker whose submission was aborted still believed it had claimed the flow run, so the run it was counting never transitioned properly out of the PENDING state. This can indeed resemble a bug, particularly in how state management is handled when multiple workers are involved.
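To make the suspected failure mode concrete, here is a minimal toy simulation in plain Python. It does not use Prefect's API, and it is only a guess at the mechanism: the `ToyServer` and `ToyWorker` classes, the `release_on_abort` flag, and the idea that the worker reserves a slot before proposing the PENDING state are all assumptions made for illustration, not a description of Prefect's actual internals.

```python
class ToyServer:
    """Hypothetical stand-in for the API: only the first claim of a run succeeds."""

    def __init__(self):
        self._claimed = set()

    def propose_pending(self, run_id):
        # A second proposal for the same run is rejected, similar in spirit
        # to the "abort signal" in the log message above.
        if run_id in self._claimed:
            return False
        self._claimed.add(run_id)
        return True


class ToyWorker:
    """Hypothetical worker with a local limit on concurrent flow runs."""

    def __init__(self, limit, release_on_abort):
        self.limit = limit
        self.in_progress = 0
        self.release_on_abort = release_on_abort

    def try_submit(self, server, run_id):
        if self.in_progress >= self.limit:
            return "limit reached"
        self.in_progress += 1  # assumption: slot is reserved before claiming
        if server.propose_pending(run_id):
            return "submitted"
        if self.release_on_abort:
            self.in_progress -= 1  # give the slot back after an abort
        return "aborted"


def simulate(release_on_abort):
    """Two workers go for run-1; the loser then tries to pick up run-2."""
    server = ToyServer()
    winner = ToyWorker(limit=1, release_on_abort=release_on_abort)
    loser = ToyWorker(limit=1, release_on_abort=release_on_abort)
    winner.try_submit(server, "run-1")
    first = loser.try_submit(server, "run-1")
    second = loser.try_submit(server, "run-2")
    return first, second


print(simulate(release_on_abort=False))
print(simulate(release_on_abort=True))
```

In this sketch, a worker that never releases its slot after an abort reports "limit reached" for every later run even though it is running nothing, which matches the symptom described above. Whether that is what actually happens inside the Prefect worker would need to be confirmed against the worker source or by the Prefect team.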
Although there isn't a direct fix mentioned in the documentation or issues, you might want to:
- Ensure that you are on the latest version of Prefect 3.x, as newer releases may have bug fixes.
- Consider setting up some logging or monitoring on your workers to capture more details when these issues occur.
- You could also try scaling the resources of your workers or reviewing the concurrency configurations to see if any adjustments could stabilize the environment.
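As one possible take on the monitoring suggestion, here is a rough sketch of a log check in plain Python. It only matches the two message fragments quoted in this thread; the `looks_stuck` helper, the `threshold` parameter, and the assumption that worker logs are available as an iterable of text lines are all hypothetical, so treat it as a starting point and not a tested tool.

```python
ABORT_MARKER = "Aborted submission of flow run"
LIMIT_MARKER = "Flow run limit reached"


def looks_stuck(lines, threshold=3):
    """Return True if the most recent abort is followed by at least
    `threshold` "limit reached" messages.

    `lines` is any iterable of log lines (an assumption about how the
    worker logs are collected).
    """
    seen_abort = False
    limit_hits = 0
    for line in lines:
        if ABORT_MARKER in line:
            # Restart the count at each new abort.
            seen_abort = True
            limit_hits = 0
        elif seen_abort and LIMIT_MARKER in line:
            limit_hits += 1
    return seen_abort and limit_hits >= threshold


sample = [
    "Aborted submission of flow run ... Server sent an abort signal",
    "Flow run limit reached; 1 flow runs in progress.",
    "Flow run limit reached; 1 flow runs in progress.",
    "Flow run limit reached; 1 flow runs in progress.",
]
print(looks_stuck(sample))
```

A check like this could run periodically over the worker's log file and trigger an alert or a worker restart. The threshold of 3 is arbitrary and would need tuning to the worker's polling interval.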
For ongoing support, you may want to report this as an issue to the Prefect team if it persists with the current version you're using.
If you're curious, you can also check out the related documentation on configuring concurrency limits and on how workers operate. Just think of it as an epic adventure in documentation... well, maybe not that epic, but hopefully useful!
Jordan Fisher
11/08/2024, 12:42 PM