# ask-marvin
@Marvin I'm seeing this task fail and then not retry - any idea what might be going on or how I can troubleshoot? The failure is likely caused by a downstream resource being overloaded. I'm confused about why we aren't seeing any retry attempts half an hour later. I suspected that the Prefect run might have crashed, but it's still going according to the Cloud UI. Nothing else that I'm aware of would prevent the retry from happening - I checked task concurrency limits, etc.; we don't have any internal concurrency limits in our code that would prevent the retry here.
```
Task run failed with exception: TaskRunTimeoutError('Scope timed out after 60.0 second(s).') - Retry 1/3 will start 10 second(s) from now   02:29:37 PM
Finished in state Completed()   02:29:22 PM
```
**Marvin** (thought for 265.4 seconds):
Thanks for the details; that does look confusing. A very common cause of "TaskRunTimeoutError ... will retry in 10s" followed by "Finished in state Completed()" with no visible retry is cooperative timeouts. In Prefect 3.x, `timeout_seconds` cancels the task's execution scope, but it cannot forcibly kill blocking work inside your function (e.g., a requests/DB call without its own timeout). Typical symptoms:

- The first attempt times out and logs that a retry is scheduled.
- The underlying blocking call keeps running in the background thread.
- The retry cannot actually start until the first attempt yields control.
- If the blocking call eventually returns successfully, the task run may get marked Completed and you never see a retry execute.

Given your note about a downstream resource being overloaded, this fits pretty well.
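To make that concrete, here is a minimal sketch of the failure mode described above (a hypothetical task, not your code), with a body that blocks in a way the cooperative timeout cannot interrupt:

```python
import time

from prefect import task


@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def call_overloaded_service():
    # Stand-in for a blocking call with no client-side timeout (e.g. an HTTP
    # request or a stuck DB query). After 60s Prefect cancels the task's
    # scope and logs that a retry is scheduled, but it cannot interrupt this
    # call; the thread keeps running until the call finally returns.
    time.sleep(600)
    return "eventually succeeded"
```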
**What to check next**
- In the UI, open the specific task run and review its state history. Do you see `AwaitingRetry` with a scheduled time, but no subsequent `Running` attempt? That indicates the first attempt never really yielded.
- Confirm your task configuration: `retries`, `retry_delay_seconds`, and `timeout_seconds` on the `@task` decorator. Also check for a `retry_condition_fn` or state hooks that could change behavior.
- Ensure your task code is not catching and swallowing the timeout/cancellation. A broad `except Exception:` around the body can convert a failure into a normal return, which will mark the task Completed (see the sketch after this list).
- Turn on debug logs to capture state transitions and scheduling: set `PREFECT_LOGGING_LEVEL=DEBUG` in the environment for the run.
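To illustrate that swallowing pattern (hypothetical task and helper names; the comments describe the behavior warned about in the list above):

```python
from prefect import task


def call_downstream():
    # Hypothetical stand-in for the external call inside your task.
    ...


@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def fragile_task():
    try:
        return call_downstream()
    except Exception:
        # Anti-pattern: any error raised inside the body, potentially
        # including the timeout, is converted into a normal return here, so
        # the run ends Completed and no retry is attempted. Catch only the
        # specific exceptions you can handle and let the rest propagate.
        return None
```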
**How to make timeouts and retries reliable with blocking I/O**

Add explicit timeouts to external calls so they fail quickly and cooperatively:
- requests:
```python
import requests

# (connect_timeout, read_timeout) in seconds
resp = requests.get(url, timeout=(5, 60))
```
- Database drivers: set statement/query timeouts at the driver/session level.
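For example, with PostgreSQL via psycopg2 (hypothetical connection details; other drivers expose equivalent settings):

```python
import psycopg2

# connect_timeout (seconds) bounds connection setup; statement_timeout (ms)
# makes the server abort queries that run too long instead of hanging.
conn = psycopg2.connect(
    dsn,  # your connection string (hypothetical placeholder)
    connect_timeout=10,
    options="-c statement_timeout=30000",
)
```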
- If the external client supports cancellation/interrupts, use those APIs.

For truly non-cooperative work, run it in a subprocess that you can terminate on timeout, then let the task retry; a small wrapper invoked via `subprocess`, or a worker pool you can cancel, is often the most reliable option for misbehaving calls. A sketch of that pattern follows.
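Something like this, assuming the slow call can be moved into a standalone script (the script name is a hypothetical placeholder):

```python
import subprocess

from prefect import task


@task(retries=3, retry_delay_seconds=10)
def run_isolated_call():
    # If the child exceeds the timeout, subprocess.run kills it and raises
    # TimeoutExpired; check=True turns a non-zero exit into an error. Either
    # way the task fails promptly, so Prefect can schedule the retry instead
    # of waiting on a stuck in-process call.
    result = subprocess.run(
        ["python", "call_downstream.py"],  # hypothetical wrapper script
        timeout=60,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```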
**Quick sanity checklist**
- Is your task declared with something like:
```python
from prefect import task

@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def my_task(...):
    ...
```
- Are you invoking it normally from a flow (no custom asyncio task juggling, no manual futures you don't await)? A minimal example follows this list.
- Do you see `AwaitingRetry` in the task run's state timeline in the UI?
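By "normally", something like this (a sketch assuming a no-argument variant of the `my_task` template above):

```python
from prefect import flow


@flow
def my_flow():
    # Calling the task directly inside a flow lets Prefect manage the task
    # run's timeout, retries, and state transitions for you.
    my_task()


if __name__ == "__main__":
    my_flow()
```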
If you can share:
- your Prefect version (`prefect version`)
- the task decorator line (showing the retries/timeout config)
- how you call the task inside the flow
- any external call(s) inside the task (e.g., requests/DB code)

then I can help pinpoint the exact cause and recommend the cleanest fix.

**References**
- Task options (retries, timeouts): Prefect 3 task API
- Task engine source (timeout and retry handling): `prefect/task_engine.py`
- Prefect 3 release notes