# prefect-community
We have a flow that spawns hundreds of mapped runs (we have a chain of 6-7 mapped tasks), and in Cloud we're consistently seeing 1 or 2 of them get stuck in a Pending state downstream from a failure. This behaviour started fairly recently.
1. Any ideas on how to address this?
2. Can I get a hint on how to kill the pending task runs using GraphQL? I can find the task runs (the flow run status is Cancelled, with task runs in a Pending state), but I'm not clear on how to turn that into a mutation.
What is your executor? Dask? Do you have retries on the task that fails? Does it get stuck after everything else has run? As for the mutation: once you have the IDs of those task runs, you would set their state to Failed.
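For reference, here is a minimal sketch of how such a request could be assembled. It assumes the Prefect 1.x Cloud GraphQL API exposes a `set_task_run_states` mutation that takes a `set_task_run_states_input` variable (this matches what the Prefect 1.x Python client sends, but check it against your Cloud schema). The helper name `build_fail_payload` and the default message are hypothetical, and the task run IDs are placeholders. Some backend versions may also expect a `version` field on each entry.

```python
import json

# Assumed mutation shape for Prefect 1.x Cloud; verify it in the
# Interactive API before relying on it.
SET_TASK_RUN_STATES = """
mutation($input: set_task_run_states_input!) {
  set_task_run_states(input: $input) {
    states { status message }
  }
}
"""


def build_fail_payload(task_run_ids, message="Manually failed: stuck in Pending"):
    """Build a GraphQL request body that marks each task run as Failed.

    `build_fail_payload` is a hypothetical helper, not part of Prefect.
    """
    states = [
        {
            "task_run_id": task_run_id,
            # Serialized state: a type name plus a human-readable message.
            "state": {"type": "Failed", "message": message},
        }
        for task_run_id in task_run_ids
    ]
    return {"query": SET_TASK_RUN_STATES, "variables": {"input": {"states": states}}}


if __name__ == "__main__":
    # Placeholder IDs; substitute the IDs returned by your task run query.
    payload = build_fail_payload(["<task-run-id-1>", "<task-run-id-2>"])
    print(json.dumps(payload, indent=2))
```

The resulting body can be pasted into the Interactive API or sent with any GraphQL client, for example through `prefect.Client().graphql(...)`.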
Dask executor. The pseudo-flow structure is roughly A → B → C → D → E, where A pulls a list and B, C, D, and E are each mapped over 100 elements or so. The pattern we usually see is that Task B fails for 2 of the inputs, and the downstream runs of C, D, and E for those inputs then stay in Pending, doing nothing, for days.
What is your Dask setup? Do you know if you are using processes or threads?
For this context we are using threads
Are you open to trying processes? It might be more stable
I don't see a reason why not.
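A minimal sketch of what switching from threads to processes could look like in Prefect 1.x. The thread does not say which Dask-based executor is in use, so both common options are shown. The worker counts are illustrative assumptions, and `flow` stands for your own flow object.

```python
from prefect.executors import DaskExecutor, LocalDaskExecutor

# Option 1: LocalDaskExecutor, switching the scheduler from "threads" to "processes".
local_executor = LocalDaskExecutor(scheduler="processes", num_workers=4)

# Option 2: DaskExecutor backed by a temporary local cluster; the
# cluster_kwargs are passed through to distributed.LocalCluster.
cluster_executor = DaskExecutor(
    cluster_kwargs={"processes": True, "n_workers": 4, "threads_per_worker": 1}
)

# Attach one of them to the flow before registering or running it, e.g.:
# flow.executor = local_executor
```

With processes, task inputs and outputs have to be picklable, which is worth checking before changing the executor on a production flow.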