Hey there! After contributing the config option to schedule/run more than 10 runs per flow at the same time, we ran into the next issue (we have been warned, I know 😬). When triggering more than ~25 runs, some of them (about 5 to 7, it seems) never complete (they have been running for over 24 hours now). We are using LocalRun with a single LocalAgent.
These runs are very lightweight: load a parameter, make a query with it and fail (intended). The states of the tasks are all either (Trigger)Failed or Pending, so no actual work is being done. It seems the state of the run just did not propagate. Clicking Cancel in the UI sets the state to Cancelling, which takes ages, and the run is finally killed by the Zombie Killer.
I'm going to try and randomize the schedules a bit to see if that relieves the issue. Do you have any ideas about where Prefect might lose track of these flow runs?
Kevin Kho
09/08/2021, 3:39 PM
Hey @Bouke Krom, the behavior Nicholas described here might be what you are running into.
Bouke Krom
09/08/2021, 6:43 PM
Hmm, I doubt it. The flows are scheduled and triggered all right. The first couple of tasks even get run by the agent, but one of the transitions between tasks (not a consistent one) does not happen.
Kevin Kho
09/08/2021, 7:48 PM
Do you think it could then be the API not being able to handle the simultaneous requests?
Bouke Krom
09/09/2021, 7:53 AM
Could be, I'm not familiar enough with Prefect internals for that. The flows I cancelled yesterday via the UI are still in 'cancelling' state.