# prefect-community
j
We have a flow that spawns hundreds of mapped runs (a chain of 6-7 mapped tasks), and in Cloud we’re consistently seeing 1 or 2 get stuck in a Pending state downstream of a failure. This behaviour started fairly recently. 1. Any ideas on how to address this? 2. Can I get a hint on how to kill the pending task runs using GraphQL? I can find the task runs (the flow run state is Cancelled and its task runs are in a Pending state), but I’m not clear on how to turn that into a mutation.
k
What is your executor? Dask? Do you have retries on the one that fails? Does it get stuck after everything else has run? The mutation would be set_task_run_state to Failed once you get the ids of those tasks.
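A minimal sketch of that suggestion, assuming the Prefect 1.x Python client (Client.graphql and Client.set_task_run_state). The query filter is an assumption about the Cloud schema (Hasura-style where clauses on task_run and its flow_run relationship), and the helper names find_stuck_task_run_ids and fail_task_runs are hypothetical, so check the query in the Interactive API before running it:

```python
"""Sketch: mark stuck Pending task runs as Failed (Prefect 1.x Cloud assumed)."""

# Assumed Hasura-style filter: Pending task runs whose flow run is Cancelled.
STUCK_TASK_RUNS_QUERY = """
query {
  task_run(
    where: {
      state: { _eq: "Pending" }
      flow_run: { state: { _eq: "Cancelled" } }
    }
  ) {
    id
  }
}
"""


def find_stuck_task_run_ids(client):
    """Return the IDs of task runs matched by the query above."""
    result = client.graphql(STUCK_TASK_RUNS_QUERY)
    return [row["id"] for row in result["data"]["task_run"]]


def fail_task_runs(client, task_run_ids, failed_state):
    """Set each task run to the given state and return how many were updated."""
    for task_run_id in task_run_ids:
        client.set_task_run_state(task_run_id=task_run_id, state=failed_state)
    return len(task_run_ids)


if __name__ == "__main__":
    # Imported here so the helpers above can be exercised without Prefect.
    from prefect import Client
    from prefect.engine.state import Failed

    client = Client()
    ids = find_stuck_task_run_ids(client)
    count = fail_task_runs(
        client, ids, Failed("Stuck in Pending after upstream failure")
    )
    print(f"Marked {count} task run(s) as Failed")
```

The helpers take the client as an argument, so the query can be dry-run first (for example, printing the IDs) before any state is changed.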
j
Dask executor. The pseudo-flow structure is:
A->B->C->D->E
Where A pulls a list, and B, C, D and E are each mapped over 100 or so elements. The usual pattern is that task B fails for 2 of the inputs, and the downstream C, D and E runs for those inputs then stay in Pending, doing nothing, for days.
k
What is your Dask setup? Do you know if you are using processes or threads?
j
For this context we are using threads
k
Are you open to trying processes? It might be more stable
j
I don't see a reason why not.
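For reference, a sketch of what the switch to processes might look like. It assumes Prefect 1.x with the local Dask executor (LocalDaskExecutor from prefect.executors, which accepts scheduler="threads" or "processes"); the thread does not say which Dask executor is in use, and the worker count and the helper name make_process_executor are placeholders:

```python
# Assumed settings: run mapped tasks in separate processes instead of threads.
EXECUTOR_SETTINGS = {"scheduler": "processes", "num_workers": 4}


def make_process_executor():
    """Build a process-based local Dask executor (Prefect 1.x import path)."""
    from prefect.executors import LocalDaskExecutor

    return LocalDaskExecutor(**EXECUTOR_SETTINGS)


if __name__ == "__main__":
    executor = make_process_executor()
    # Attach to an existing flow object, e.g. flow.executor = executor
    print(type(executor).__name__)
```

With a distributed DaskExecutor the equivalent change would go through the cluster settings rather than a scheduler argument.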