    Joe Schmid

    1 year ago
    Running into an issue where some mapped tasks fail with state TimedOut without an exception or apparent failures in the task run logs. Are there specific conditions that would cause a task to end in TimedOut state? (Feel free to point me to docs or code if needed.) Example log output in thread reply:
    16:21:50 INFO CloudTaskRunner Task 'train_test_track[15]': Starting task run...
    16:21:50 INFO CloudTaskRunner FeatureSpace metadata saved to: /home/efs/featurespace/data/XXXXXXXX/metadata/XXXXXX
    FeatureSet data within batch folder: XXXXXXXX
    16:21:52 INFO CloudTaskRunner Task 'train_test_track[15]': Finished task run for task with final state: 'TimedOut'
    Obviously, the last log entry is only 2 seconds after the first, which seems like not much time for anything to time out. 🙂
    Chris White

    1 year ago
    Hey Joe! What version of Prefect are you running?
    Joe Schmid

    1 year ago
    Hi @Chris White! Unfortunately it's an old one: v0.13.19 😞
    We're definitely due for an upgrade...
    In case it's relevant, we recently enabled the use of Dask resources on our workers and are tagging these particular tasks with something like `tags=["dask-resource:CPU=1"]`
    Chris White

    1 year ago
    No worries! OK I know what's going on (essentially) -- in older versions of Prefect all `TimeoutError`s from tasks are converted to `TimedOut` states, not just those that use Prefect timeouts: https://github.com/PrefectHQ/prefect/blob/0.13.19/src/prefect/engine/task_runner.py#L864-L868
    In newer versions, `TimeoutError`s convert to standard `Failed` states, and only Prefect-specific timeouts result in `TimedOut` states.
    So this suggests to me that something in this task is raising a `TimeoutError` really quickly and Prefect is capturing that and misinterpreting it.
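To make the distinction concrete, here is a toy model in plain Python. It is not Prefect's code: `EngineTimeout`, `old_style_state`, `new_style_state`, and `flaky_io` are hypothetical names made up for illustration, and the state names are plain strings rather than Prefect state objects.

```python
class EngineTimeout(Exception):
    """Hypothetical stand-in for a timeout raised by the engine's own timeout handling."""


def old_style_state(fn):
    # Toy model of the older behavior described above:
    # any TimeoutError raised by the task body ends in "TimedOut".
    try:
        fn()
    except TimeoutError:
        return "TimedOut"
    except Exception:
        return "Failed"
    return "Success"


def new_style_state(fn):
    # Toy model of the newer behavior: only the engine's own timeout
    # ends in "TimedOut"; a TimeoutError from user code is a plain failure.
    try:
        fn()
    except EngineTimeout:
        return "TimedOut"
    except Exception:
        return "Failed"
    return "Success"


def flaky_io():
    # Hypothetical task body that hits a timeout in some library call.
    raise TimeoutError("connection timed out")


print(old_style_state(flaky_io))
print(new_style_state(flaky_io))
```

Under the older-style mapping the task ends in `TimedOut` even though no task timeout elapsed, which would match a run that "times out" after 2 seconds.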
    Joe Schmid

    1 year ago
    @Chris White that makes sense. I'll see if we can catch any TimeoutError exceptions in our task and log the root cause. Thanks for the quick help and pointer to the task runner code!
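A minimal sketch of that kind of logging, assuming the task body is an ordinary Python function. The `log_timeouts` decorator and the simulated failure are hypothetical, and the standard `logging` module stands in for whatever logger the task actually uses. The one fact relied on here is that Python's built-in `TimeoutError` is a subclass of `OSError`, so an `OSError` carrying `errno.ETIMEDOUT` is raised as a `TimeoutError`.

```python
import errno
import functools
import logging

logger = logging.getLogger("train_test_track")


def log_timeouts(fn):
    """Log the full traceback of any TimeoutError, then re-raise it unchanged."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except TimeoutError:
            # logger.exception records the message plus the traceback,
            # which shows which call actually raised the TimeoutError.
            logger.exception("TimeoutError raised inside %s", fn.__name__)
            raise

    return wrapper


@log_timeouts
def train_test_track(batch):
    # Hypothetical body: simulate a low-level timeout, e.g. from a
    # network or filesystem call. OSError with ETIMEDOUT is a TimeoutError.
    raise OSError(errno.ETIMEDOUT, "simulated timeout")
```

Re-raising keeps the final state the same as before; the only change is that the traceback lands in the logs. In a real flow the wrapper would presumably sit beneath the task decorator so it wraps the function body.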
    Chris White

    1 year ago
    yup anytime!