# ask-marvin
@Marvin I'm seeing this task fail and then not retry - any idea what might be going on or how I can troubleshoot? The failure is likely caused by a downstream resource being overloaded. I'm confused about why we aren't seeing any retry attempts half an hour later. I suspected that the Prefect run might have crashed, but it's still going according to the Cloud UI. Nothing else that I'm aware of would prevent the retry from happening - I checked task concurrency limits, etc.; we don't have any internal concurrency limits in our code that would prevent the retry here.
```
Task run failed with exception: TaskRunTimeoutError('Scope timed out after 60.0 second(s).') - Retry 1/3 will start 10 second(s) from now   02:29:37 PM
Finished in state Completed()   02:29:22 PM
```
**Marvin** (thought for 265.4 seconds):
Thanks for the details; that does look confusing. A very common cause of "TaskRunTimeoutError ... will retry in 10s" followed by "Finished in state Completed()" with no visible retry is cooperative timeouts. In Prefect 3.x, `timeout_seconds` cancels the task's execution scope, but it cannot forcibly kill blocking work inside your function (e.g., a requests/DB call without its own timeout). Typical symptoms:

- The first attempt times out and logs that a retry is scheduled.
- The underlying blocking call keeps running in the background thread.
- The retry cannot actually start until the first attempt yields control.
- If the blocking call eventually returns successfully, the task run may get marked Completed and you never see a retry execute.

Given your note about a downstream resource being overloaded, this fits pretty well.
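To make that concrete, here is a minimal sketch of the failure mode described above (a hypothetical task, not your code), with a body that blocks in a way the cooperative timeout cannot interrupt:

```python
import time

from prefect import task


@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def call_overloaded_service():
    # Stand-in for a blocking call with no client-side timeout (e.g. an HTTP
    # request or a stuck DB query). After 60s Prefect cancels the task's
    # scope and logs that a retry is scheduled, but it cannot interrupt this
    # call; the thread keeps running until the call finally returns.
    time.sleep(600)
    return "eventually succeeded"
```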
**What to check next**
- In the UI, open the specific task run and review its state history. Do you see `AwaitingRetry` with a scheduled time, but no subsequent `Running` attempt? That indicates the first attempt never really yielded.
- Confirm your task configuration: `retries`, `retry_delay_seconds`, and `timeout_seconds` on the `@task` decorator. Also check for a `retry_condition_fn` or state hooks that could change behavior.
- Ensure your task code is not catching and swallowing the timeout/cancellation. A broad `except Exception:` around the body can convert a failure into a normal return, which will mark the task Completed (see the sketch after this list).
- Turn on debug logs to capture state transitions and scheduling: set `PREFECT_LOGGING_LEVEL=DEBUG` in the environment for the run.
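To illustrate that swallowing pattern (hypothetical task and helper names; the comments describe the behavior warned about in the list above):

```python
from prefect import task


def call_downstream():
    # Hypothetical stand-in for the external call inside your task.
    ...


@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def fragile_task():
    try:
        return call_downstream()
    except Exception:
        # Anti-pattern: any error raised inside the body, potentially
        # including the timeout, is converted into a normal return here, so
        # the run ends Completed and no retry is attempted. Catch only the
        # specific exceptions you can handle and let the rest propagate.
        return None
```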
**How to make timeouts and retries reliable with blocking I/O**

Add explicit timeouts to external calls so they fail quickly and cooperatively:
- requests:
```python
import requests

# (connect_timeout, read_timeout) in seconds
resp = requests.get(url, timeout=(5, 60))
```
- Database drivers: set statement/query timeouts at the driver/session level.
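For example, with PostgreSQL via psycopg2 (hypothetical connection details; other drivers expose equivalent settings):

```python
import psycopg2

# connect_timeout (seconds) bounds connection setup; statement_timeout (ms)
# makes the server abort queries that run too long instead of hanging.
conn = psycopg2.connect(
    dsn,  # your connection string (hypothetical placeholder)
    connect_timeout=10,
    options="-c statement_timeout=30000",
)
```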
- If the external client supports cancellation/interrupts, use those APIs.

For truly non-cooperative work, run it in a subprocess that you can terminate on timeout, then let the task retry; a small wrapper invoked via `subprocess`, or a worker pool you can cancel, is often the most reliable option for misbehaving calls. A sketch of that pattern follows.
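Something like this, assuming the slow call can be moved into a standalone script (the script name is a hypothetical placeholder):

```python
import subprocess

from prefect import task


@task(retries=3, retry_delay_seconds=10)
def run_isolated_call():
    # If the child exceeds the timeout, subprocess.run kills it and raises
    # TimeoutExpired; check=True turns a non-zero exit into an error. Either
    # way the task fails promptly, so Prefect can schedule the retry instead
    # of waiting on a stuck in-process call.
    result = subprocess.run(
        ["python", "call_downstream.py"],  # hypothetical wrapper script
        timeout=60,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```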
**Quick sanity checklist**
- Is your task declared with something like:
```python
from prefect import task

@task(retries=3, retry_delay_seconds=10, timeout_seconds=60)
def my_task(...):
    ...
```
- Are you invoking it normally from a flow (no custom asyncio task juggling, no manual futures you don't await)? A minimal example follows this list.
- Do you see `AwaitingRetry` in the task run's state timeline in the UI?
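By "normally", something like this (a sketch assuming a no-argument variant of the `my_task` template above):

```python
from prefect import flow


@flow
def my_flow():
    # Calling the task directly inside a flow lets Prefect manage the task
    # run's timeout, retries, and state transitions for you.
    my_task()


if __name__ == "__main__":
    my_flow()
```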
If you can share:
- your Prefect version (`prefect version`)
- the task decorator line (showing the retries/timeout config)
- how you call the task inside the flow
- any external call(s) inside the task (e.g., requests/DB code)

then I can help pinpoint the exact cause and recommend the cleanest fix.

**References**
- Task options (retries, timeouts): Prefect 3 task API
- Task engine source (timeout and retry handling): `prefect/task_engine.py`
- Prefect 3 release notes