Hey there! After contributing the config option to schedule/run more than 10 runs per flow at the same time, we've run into the next issue (we were warned, I know 😬). When triggering more than ~25 runs, some of them (about 5 to 7, it seems) never complete; they've been running for over 24 hours now. We're using LocalRun with a single LocalAgent.
These runs are very lightweight: load a parameter, make a query with it, and fail (intended). The task states are all either (Trigger)Failed or Pending, so no actual work is being done; it seems the state of the run just did not propagate. Clicking Cancel in the UI sets the state to Cancelling, but that takes ages and the run is finally killed by the Zombie Killer.
I'm going to try and randomize the schedules a bit to see if that relieves the issue. Do you have any ideas about where Prefect might lose track of these flow runs?
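For reference, a minimal sketch of what "randomizing the schedules a bit" could look like, in plain Python with no Prefect calls. The helper name `jittered_start_times` and its parameters are made up for illustration; the idea is just to spread the start times over a small window instead of firing all runs at the same instant.

```python
import random
from datetime import datetime, timedelta


def jittered_start_times(base, count, max_jitter_seconds, seed=None):
    """Return `count` start times, each offset from `base` by a random
    delay between 0 and `max_jitter_seconds` seconds.

    Hypothetical helper, not part of Prefect.
    """
    rng = random.Random(seed)  # seeded so the spread is reproducible
    return [
        base + timedelta(seconds=rng.uniform(0, max_jitter_seconds))
        for _ in range(count)
    ]


# Example: spread 30 runs over a 2-minute window.
base = datetime(2021, 1, 1, 12, 0, 0)
starts = jittered_start_times(base, count=30, max_jitter_seconds=120, seed=42)
```

Each datetime would then be used as the scheduled start time of one flow run, by whatever mechanism creates the runs.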