Running into an issue where some mapped tasks fail with state TimedOut without an exception or apparent failures in the task run logs. Are there specific conditions that would cause a task to end in TimedOut state? (Feel free to point me to docs or code if needed.) Example log output in thread reply:
Joe Schmid
08/19/2021, 8:32 PM
16:21:50 INFO CloudTaskRunner Task 'train_test_track[15]': Starting task run...
16:21:50 INFO CloudTaskRunner FeatureSpace metadata saved to: /home/efs/featurespace/data/XXXXXXXX/metadata/XXXXXX
FeatureSet data within batch folder: XXXXXXXX
16:21:52 INFO CloudTaskRunner Task 'train_test_track[15]': Finished task run for task with final state: 'TimedOut'
Obviously, the last log entry is only 2 seconds after the first, which seems like not much time for anything to time out. 🙂
Chris White
08/19/2021, 8:33 PM
Hey Joe! What version of Prefect are you running?
Joe Schmid
08/19/2021, 8:37 PM
Hi @Chris White! Unfortunately it's an old one: v0.13.19 😞
Joe Schmid
08/19/2021, 8:38 PM
We're definitely due for an upgrade...
Joe Schmid
08/19/2021, 8:42 PM
In case it's relevant, we recently enabled the use of Dask resources on our workers and are tagging these particular tasks with something like
tags=["dask-resource:CPU=1"]
Chris White
08/19/2021, 8:42 PM
No worries! OK I know what's going on (essentially) -- in older versions of Prefect, any TimeoutError raised during a task run results in a TimedOut state; only in newer versions do Prefect-specific timeouts alone result in TimedOut states.
So this suggests to me that something in this task is raising a TimeoutError really quickly, and Prefect is capturing that and misinterpreting it as a task timeout
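[Editor's note: the behavior Chris describes can be illustrated with a toy sketch. This is not Prefect's actual task runner code; it only shows why a runner that branches on exception type cannot distinguish a user-raised TimeoutError from its own enforced timeout.]

```python
def run_task(fn):
    """Toy task runner: maps the exception type to a final state name."""
    try:
        fn()
        return "Success"
    except TimeoutError:
        # A TimeoutError raised by user code lands here too, so it is
        # indistinguishable from a runner-enforced timeout.
        return "TimedOut"
    except Exception:
        return "Failed"

def user_code_timeout():
    # Simulates user code (e.g., a backend client) raising its own TimeoutError
    raise TimeoutError("backend call timed out")

state = run_task(user_code_timeout)  # -> "TimedOut", though no runner timeout fired
```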
Joe Schmid
08/19/2021, 8:46 PM
@Chris White that makes sense. I'll see if we can catch any TimeoutError exceptions in our task and log the root cause. Thanks for the quick help and pointer to the task runner code!