# ask-community
Joe:
Running into an issue where some mapped tasks fail with state TimedOut without an exception or apparent failures in the task run logs. Are there specific conditions that would cause a task to end in TimedOut state? (Feel free to point me to docs or code if needed.) Example log output in thread reply:
```
16:21:50 INFO CloudTaskRunner | Task 'train_test_track[15]': Starting task run...
16:21:50 INFO CloudTaskRunner | FeatureSpace metadata saved to: /home/efs/featurespace/data/XXXXXXXX/metadata/XXXXXX
                                FeatureSet data within batch folder: XXXXXXXX
16:21:52 INFO CloudTaskRunner | Task 'train_test_track[15]': Finished task run for task with final state: 'TimedOut'
```
Obviously, the last log entry is only 2 seconds after the first, which seems like not much time for anything to time out. 🙂
Chris White:
Hey Joe! What version of Prefect are you running?
Joe:
Hi @Chris White! Unfortunately it's an old one: v0.13.19 😞
We're definitely due for an upgrade...
In case it's relevant, we recently enabled the use of Dask resources on our workers and are tagging these particular tasks with something like
tags=["dask-resource:CPU=1"]
Chris White:
No worries! OK, I know what's going on (essentially) -- in older versions of Prefect, all `TimeoutError`s from tasks are converted to `TimedOut` states, not just those raised by Prefect's own timeout mechanism: https://github.com/PrefectHQ/prefect/blob/0.13.19/src/prefect/engine/task_runner.py#L864-L868 In newer versions, `TimeoutError`s convert to standard `Failed` states, and only Prefect-specific timeouts result in `TimedOut` states. So this suggests to me that something in this task is raising a `TimeoutError` really quickly, and Prefect is capturing that and misinterpreting it.
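For reference, the 0.13.x behavior Chris describes can be sketched like this. This is a simplified stand-in, not Prefect's actual task runner code: the point is that the runner caught `TimeoutError` as a distinct case, so a `TimeoutError` raised by any library inside the task looked identical to a Prefect-enforced timeout.

```python
# Simplified sketch (NOT Prefect's real code) of the old 0.13.x mapping:
# any TimeoutError raised inside a task body -> TimedOut state,
# every other exception -> Failed state.
def run_task_old_behavior(task_fn):
    try:
        return ("Success", task_fn())
    except TimeoutError as exc:
        # 0.13.x: ANY TimeoutError becomes TimedOut, even when no
        # Prefect timeout was configured on the task.
        return ("TimedOut", exc)
    except Exception as exc:
        return ("Failed", exc)

def flaky():
    # e.g. a client library timing out almost immediately
    raise TimeoutError("connection timed out")

state, _ = run_task_old_behavior(flaky)
print(state)  # -> TimedOut
```

This also explains the two-second run in the log above: the task doesn't need to run long to end in `TimedOut`; it only needs something inside it to raise a `TimeoutError` quickly.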
Joe:
@Chris White that makes sense. I'll see if we can catch any TimeoutError exceptions in our task and log the root cause. Thanks for the quick help and pointer to the task runner code!
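One way to do that catch-and-log is a wrapper inside the task body. This is a hypothetical sketch with invented names (`train_test_track_body` stands in for the real task logic); re-raising as a plain `RuntimeError` keeps the original exception chained while sidestepping old Prefect's `TimeoutError`-to-`TimedOut` conversion.

```python
import logging

logger = logging.getLogger("train_test_track")

def train_test_track_body():
    # Stand-in for the real task logic; simulates a downstream
    # library raising TimeoutError almost immediately.
    raise TimeoutError("simulated downstream timeout")

def train_test_track():
    try:
        return train_test_track_body()
    except TimeoutError as exc:
        # Log the root cause, then re-raise as a non-TimeoutError so
        # Prefect 0.13.x reports Failed (with this message) instead of
        # a bare TimedOut state.
        logger.error("caught TimeoutError in task body: %r", exc)
        raise RuntimeError(f"task body timed out: {exc}") from exc
```

The `from exc` chaining preserves the original traceback, so the root cause still shows up in the task run logs.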
Chris White:
yup anytime!