Bouke Krom

09/08/2021, 9:04 AM
Hey there! After contributing the config option to schedule/run more than 10 runs per flow at the same time, we ran into the next issue (we were warned, I know 😬). When triggering more than ~25 runs, some of them (about 5 to 7, it seems) never complete (they have been running for over 24 hours now). We're using LocalRun with a single LocalAgent. These runs are very lightweight: load a parameter, make a query with it and fail (intentionally). The states of the tasks are all either (Trigger)Failed or Pending, so no actual work is being done; it seems the state of the run just didn't propagate. Clicking Cancel in the UI sets the state to Cancelling, but that takes ages, and the run is finally killed by the Zombiekiller. I'm going to try and randomize the schedules a bit to see if that relieves the issue. Do you have any ideas about where Prefect might lose track of these flow runs?
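A minimal sketch of the "randomize the schedules" idea, so that ~25 runs don't all hit the agent and API in the same instant. It assumes you compute the start times yourself and feed them to whatever creates the runs; `jittered_start_times` is a hypothetical helper written with the standard library only, not a Prefect API.

```python
import random
from datetime import datetime, timedelta, timezone


def jittered_start_times(base, count, max_jitter_seconds, seed=None):
    """Return `count` start times, each offset from `base` by a random
    whole number of seconds in [0, max_jitter_seconds], sorted ascending.

    Hypothetical helper: spreads runs over a window instead of starting
    them all at exactly `base`.
    """
    rng = random.Random(seed)  # seedable for reproducible schedules
    times = [
        base + timedelta(seconds=rng.randint(0, max_jitter_seconds))
        for _ in range(count)
    ]
    return sorted(times)


if __name__ == "__main__":
    base = datetime(2021, 9, 8, 9, 0, tzinfo=timezone.utc)
    # Spread 25 runs over a 5-minute window.
    for t in jittered_start_times(base, 25, 300, seed=42):
        print(t.isoformat())
```

Whether spreading the start times actually avoids the stuck runs is exactly what the experiment would test; this only shows one way to generate the staggered times.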

Kevin Kho

09/08/2021, 3:39 PM
Hey @Bouke Krom, the behavior Nicholas described here might be what you are running into.

Bouke Krom

09/08/2021, 6:43 PM
Hmm, I doubt it. The flows are scheduled and triggered just fine. The agent even runs the first couple of tasks, but one of the transitions between tasks (not always the same one) never happens.

Kevin Kho

09/08/2021, 7:48 PM
Do you think it could be the API not being able to handle the simultaneous requests, then?

Bouke Krom

09/09/2021, 7:53 AM
Could be; I'm not familiar enough with Prefect internals to say. The flows I cancelled yesterday via the UI are still in the 'Cancelling' state.