Running into an issue where some mapped tasks fail with state TimedOut without an exception or apparent failures in the task run logs. Are there specific conditions that would cause a task to end in TimedOut state? (Feel free to point me to docs or code if needed.) Example log output in thread reply:
Joe Schmid
08/19/2021, 8:32 PM
16:21:50 INFO CloudTaskRunner Task 'train_test_track[15]': Starting task run...
16:21:50 INFO CloudTaskRunner FeatureSpace metadata saved to: /home/efs/featurespace/data/XXXXXXXX/metadata/XXXXXX
FeatureSet data within batch folder: XXXXXXXX
16:21:52 INFO CloudTaskRunner Task 'train_test_track[15]': Finished task run for task with final state: 'TimedOut'
Obviously, the last log entry is only 2 seconds after the first, which seems like not much time for anything to time out. 🙂
Chris White
08/19/2021, 8:33 PM
Hey Joe! What version of Prefect are you running?
Joe Schmid
08/19/2021, 8:37 PM
Hi @Chris White! Unfortunately it's an old one: v0.13.19 😞
Joe Schmid
08/19/2021, 8:38 PM
We're definitely due for an upgrade...
Joe Schmid
08/19/2021, 8:42 PM
In case it's relevant, we recently enabled the use of Dask resources on our workers and are tagging these particular tasks with something like
tags=["dask-resource:CPU=1"]
Chris White
08/19/2021, 8:42 PM
No worries! OK I know what's going on (essentially) -- in older versions of Prefect, any TimeoutError raised during a task run results in a TimedOut state; only in newer versions do Prefect-specific timeouts alone result in TimedOut states.
So this suggests to me that something in this task is raising a TimeoutError really quickly, and Prefect is capturing that and misinterpreting it as a task timeout
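[Editor's note: the behavior Chris describes can be illustrated with a toy sketch. This is not Prefect's actual task runner code; it only shows why a runner that branches on exception type cannot distinguish a user-raised TimeoutError from its own enforced timeout.]

```python
def run_task(fn):
    """Toy task runner: maps the exception type to a final state name."""
    try:
        fn()
        return "Success"
    except TimeoutError:
        # A TimeoutError raised by user code lands here too, so it is
        # indistinguishable from a runner-enforced timeout.
        return "TimedOut"
    except Exception:
        return "Failed"

def user_code_timeout():
    # Simulates user code (e.g., a backend client) raising its own TimeoutError
    raise TimeoutError("backend call timed out")

state = run_task(user_code_timeout)  # -> "TimedOut", though no runner timeout fired
```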
Joe Schmid
08/19/2021, 8:46 PM
@Chris White that makes sense. I'll see if we can catch any TimeoutError exceptions in our task and log the root cause. Thanks for the quick help and pointer to the task runner code!