Tim Galvin

02/16/2023, 6:30 AM
Hi all -- a fun one. I have noticed some of my flows failing with this error:
Task run '511abff8-faa3-4efc-94e6-f4be435db16e' received abort during orchestration: This run cannot transition to the RUNNING state from the RUNNING state. Task run is in RUNNING state
As far as I can tell it is inconsistent (the workflow sometimes works, sometimes does not). I am running a `DaskTaskRunner` backed by a `SLURMCluster`. The stage that is crashing is trying to read in a set of large-ish files, and I believe the GIL is not being released while the data is being accessed. At the moment there is a high load on the disk I/O, and my best guess is that the Dask nanny is somehow failing a health check, and Prefect in turn is surfacing a roundabout error like this. Any ideas that make more sense?
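For reference, a minimal sketch of the kind of setup described, with Dask's liveness timeouts relaxed as one way to test the health-check guess. The resource values, the timeout values, and the helper name `build_task_runner` are assumptions for illustration and are not taken from the actual flow. The config keys are standard Dask settings, but whether they are the cause here is unconfirmed.

```python
from typing import Any, Dict

# Hypothetical SLURM job resources; the real values are not given above.
CLUSTER_KWARGS: Dict[str, Any] = {
    "cores": 4,
    "memory": "16GB",
    "walltime": "01:00:00",
}

# Dask config keys that govern connection and worker liveness timeouts.
# The values are guesses for experimentation, not recommendations.
RELAXED_TIMEOUTS: Dict[str, str] = {
    "distributed.comm.timeouts.connect": "120s",
    "distributed.comm.timeouts.tcp": "120s",
    "distributed.scheduler.worker-ttl": "15 minutes",
}


def build_task_runner():
    """Create a DaskTaskRunner that starts a temporary SLURMCluster."""
    # Imported lazily so this module loads without the packages installed.
    import dask
    from prefect_dask import DaskTaskRunner

    # This only affects the process that creates the cluster, which is
    # where the scheduler runs. Workers on the compute nodes read their
    # own Dask configuration.
    dask.config.set(RELAXED_TIMEOUTS)

    return DaskTaskRunner(
        cluster_class="dask_jobqueue.SLURMCluster",
        cluster_kwargs=CLUSTER_KWARGS,
    )
```

The runner would then be passed to the flow as `@flow(task_runner=build_task_runner())`. If the error stops appearing with the longer timeouts, that would support the health-check explanation; if not, the cause is likely elsewhere.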

Yaron Levi

02/16/2023, 11:59 AM
Please look at this question and the reply I made: