Theom
08/21/2025, 1:05 PM
Late state and once it is 1h 59m late, it is suddenly running. Any idea how to find the root cause of this delay?
Marvin
08/21/2025, 1:05 PM
Marvin
08/21/2025, 1:20 PM
flow.serve(...)) has a clock that’s about 2 hours behind your Prefect API server.
Why that explains your symptoms
- In Prefect 3.x, a flow run is marked Late when its scheduled start time has passed and it hasn’t started yet. The “late” marking happens quickly (default ~15 seconds), not hours later. See Late runs service.
- The process that executes your run will only start a run once its local clock reaches the scheduled time (with a tiny prefetch window). If that machine’s clock is 2 hours behind, it won’t pick up “start now” runs until its clock catches up. From the API’s perspective, the run sits Late for ~2 hours, then suddenly starts.
- The very consistent ~1h 59m delay is a classic sign of clock skew rather than a Prefect “retry” or rescheduler.
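To make the reasoning above concrete, here is a minimal sketch of how a poller with a lagging clock would delay pickup. It is an illustration only, not Prefect's actual implementation: the function names and the 10-second prefetch value are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed prefetch window, for illustration only (not a confirmed Prefect default)
PREFETCH = timedelta(seconds=10)


def is_due(scheduled: datetime, local_now: datetime, prefetch: timedelta = PREFETCH) -> bool:
    """Return True if a poller whose clock reads local_now would pick up the run."""
    return scheduled <= local_now + prefetch


def pickup_delay(skew: timedelta, prefetch: timedelta = PREFETCH) -> timedelta:
    """Real-time wait before a poller whose clock is `skew` behind sees a 'start now' run as due."""
    # The local clock reads (true time - skew), so the run becomes due once
    # true time >= scheduled + skew - prefetch.
    return max(skew - prefetch, timedelta(0))


true_now = datetime(2025, 8, 21, 13, 5, tzinfo=timezone.utc)  # API server's view of "now"
skew = timedelta(hours=2)                                     # serving machine is 2h behind
local_now = true_now - skew

print(is_due(true_now, local_now))  # False: the poller thinks the run is 2h in the future
print(pickup_delay(skew))           # 1:59:50
```

Under these assumptions, a 2-hour skew produces a delay just under 2 hours, which is the pattern described in the question.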
There isn’t a Prefect feature that intentionally waits ~2 hours. The relevant setting is the Late marking threshold, which defaults to seconds, not hours:
- PREFECT_API_SERVICES_LATE_RUNS_AFTER_SECONDS (defaults ~15s)
- Docs: Settings reference
How to confirm
1) Compare clocks on both hosts:
- On the API server host/container:
  - date -u
- On the machine/container running your serve process:
  - date -u
- If you’re in Docker/Kubernetes, exec into the containers to check time inside them.
- If they differ by ~2h, that’s the cause.
2) Optional: raise logging and watch scheduling windows
- Set PREFECT_LOGGING_LEVEL=DEBUG on the process running your served flow, then trigger a run and watch for logs that show it only considering runs scheduled before its local “now + prefetch”.
3) Optional: log scheduled vs actual at run start to confirm skew
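from datetime import datetime, timezone

from prefect import flow, get_run_logger
from prefect.runtime import flow_run


@flow
def debug_time():
    logger = get_run_logger()
    # scheduled_start_time is an attribute of prefect.runtime.flow_run, not a callable
    logger.info(f"Scheduled start: {flow_run.scheduled_start_time}")
    # Standard-library UTC time avoids depending on pendulum being installed
    logger.info(f"Local UTC now: {datetime.now(timezone.utc).isoformat()}")


# Run this as a quick served flow to compare once it starts
if __name__ == "__main__":
    debug_time.serve(name="debug-time")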
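As a complement to the manual clock comparison in step 1, the skew can also be estimated from the serving machine by reading the HTTP Date header of a response from the API. This is a rough sketch: it assumes the server sends a Date header, the function names are hypothetical, and the result is only accurate to about a second plus network latency.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import urlopen


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Local clock minus server clock, in seconds (negative means local is behind)."""
    server_now = parsedate_to_datetime(date_header)
    return (local_now - server_now).total_seconds()


def measure_skew(url: str) -> float:
    """Fetch `url` and compare its Date header with the local UTC clock."""
    with urlopen(url, timeout=10) as resp:
        date_header = resp.headers["Date"]  # assumes the server sets this header
    return skew_seconds(date_header, datetime.now(timezone.utc))


# Offline example: the local clock reads 11:05 UTC while the server reports 13:05 GMT
local = datetime(2025, 8, 21, 11, 5, tzinfo=timezone.utc)
print(skew_seconds("Thu, 21 Aug 2025 13:05:00 GMT", local))  # -7200.0
```

To try it against a live server, call measure_skew with a URL on your Prefect API host (which endpoint to use depends on your setup). A result near -7200 would indicate the serving machine is about 2 hours behind.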
How to fix
- Sync time on both machines (enable/repair NTP/chrony).
- Ensure container clocks are correct if you’re running the API or served process in containers.
- After clocks are in sync, “Run” from the UI should start immediately (or within the small prefetch window) instead of going Late for ~2 hours.
If clocks are already in sync
If you confirm the clocks match and the issue persists, please share:
- Your Prefect version
- Where the API runs vs where serve runs (hosts/containers/cloud region)
- A screenshot or the timestamps from a run showing:
- Scheduled time
- Time it turned Late
- Time it actually started
- Any custom settings you’ve set for polling/prefetch or late-run services
References
- Late runs service: API: late runs
- Settings reference: Settings ref