# ask-community
t
Hi all -- a fun one. I have noticed some of my flows failing with this error:
Task run '511abff8-faa3-4efc-94e6-f4be435db16e' received abort during orchestration: This run cannot transition to the RUNNING state from the RUNNING state. Task run is in RUNNING state
As far as I can tell it is inconsistent (the workflow sometimes works, sometimes does not). I am running a `DaskTaskRunner` backed by a `SLURMCluster`. The stage that is crashing is trying to read in a set of large-ish files, and I believe the GIL is not being released while the data is being accessed. At the moment there is a high load on the disk I/O, and my best guess is that the dask nanny is somehow failing a health check, and Prefect in turn is raising a round-about error like this. Any ideas that make more sense?
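For reference, a rough sketch of the kind of setup described above, assuming the `prefect_dask` integration and `dask_jobqueue`. The resource values and the helper names `build_cluster_kwargs` and `make_task_runner` are placeholders for illustration, not the actual configuration used here.

```python
# Hypothetical sketch: a DaskTaskRunner backed by a SLURMCluster.
# All resource values below are placeholders, not real settings.


def build_cluster_kwargs(cores=8, memory="32GB", walltime="01:00:00"):
    """Return keyword arguments to pass to dask_jobqueue.SLURMCluster."""
    return {
        "cores": cores,        # CPU cores requested per SLURM job
        "memory": memory,      # memory requested per SLURM job
        "walltime": walltime,  # wall-clock limit per SLURM job
    }


def make_task_runner():
    """Build the task runner; requires prefect-dask and dask-jobqueue."""
    # Imported lazily so build_cluster_kwargs works without Prefect installed.
    from prefect_dask import DaskTaskRunner

    return DaskTaskRunner(
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs=build_cluster_kwargs(),
    )
```

The runner would then be passed to a flow via `@flow(task_runner=make_task_runner())`.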
y
Please look at this question and the reply I’ve made: