Thành Đạt Vũ
09/15/2025, 2:18 AM
Marvin
09/15/2025, 2:27 AM
- PREFECT_CLIENT_MAX_RETRIES=10
- PREFECT_CLIENT_RETRY_EXTRA_CODES=502,503,504 (ensures these 5xx codes are retried)
- PREFECT_API_REQUEST_TIMEOUT=120
- Optional: PREFECT_API_ENABLE_HTTP2=true (can reduce connection overhead)
- Reduce API chatter from logs
  - Temporarily disable to test: PREFECT_LOGGING_TO_API_ENABLED=false
  - Or batch more conservatively:
    - PREFECT_LOGGING_TO_API_BATCH_INTERVAL=5
    - PREFECT_LOGGING_TO_API_BATCH_SIZE=1000000
    - PREFECT_LOGGING_TO_API_MAX_LOG_SIZE=200000
- Smooth concurrency spikes
  - Cap pool concurrency so runs don’t all start at once:
    prefect work-pool set-concurrency-limit "<pool>" N
  - Stagger large backfills or bursts of scheduled runs.
- Check for network intermediaries
  - Corporate proxies/NATs terminating connections or aggressive idle timeouts can surface as 503s; try keeping connections short-lived or enabling HTTP/2, and verify the proxy logs.
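To make the retry settings above more concrete, here is a minimal sketch of what "retry on 502/503/504 up to N times" tends to look like in general. It is not Prefect's actual client code: `call_with_retries`, the `send` callable, and the backoff parameters are hypothetical names chosen for illustration, and the backoff schedule is an assumption.

```python
import random
import time

# Mirrors the PREFECT_CLIENT_RETRY_EXTRA_CODES value suggested above.
RETRY_CODES = {502, 503, 504}


def call_with_retries(send, max_retries=10, retry_codes=RETRY_CODES,
                      base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a status code outside retry_codes.

    send is a hypothetical zero-argument callable that returns an HTTP
    status code. At most max_retries retries follow the first attempt.
    """
    status = send()
    for attempt in range(max_retries):
        if status not in retry_codes:
            break
        # Exponential backoff capped at max_delay, plus jitter so that
        # many clients do not all retry at the same instant.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
        status = send()
    return status
```

The jitter is the part that matters when many runs start at once: without it, a burst of clients that all hit a 503 would retry in lockstep and reproduce the spike.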
How to apply settings
- As environment variables (recommended for workers/infra):
    export PREFECT_CLIENT_MAX_RETRIES=10
    export PREFECT_CLIENT_RETRY_EXTRA_CODES=502,503,504
    export PREFECT_API_REQUEST_TIMEOUT=120
    export PREFECT_API_ENABLE_HTTP2=true
    export PREFECT_LOGGING_TO_API_ENABLED=false
- Or in code for a single run:
    from prefect.settings import (
        PREFECT_CLIENT_MAX_RETRIES,
        PREFECT_CLIENT_RETRY_EXTRA_CODES,
        PREFECT_API_REQUEST_TIMEOUT,
        PREFECT_API_ENABLE_HTTP2,
        PREFECT_LOGGING_TO_API_ENABLED,
        temporary_settings,
    )

    # Overrides apply only inside the with block.
    with temporary_settings({
        PREFECT_CLIENT_MAX_RETRIES: 10,
        PREFECT_CLIENT_RETRY_EXTRA_CODES: {502, 503, 504},
        PREFECT_API_REQUEST_TIMEOUT: 120.0,
        PREFECT_API_ENABLE_HTTP2: True,
        PREFECT_LOGGING_TO_API_ENABLED: False,
    }):
        # run your flow
        ...
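If you launch workers from a Python script rather than a shell, a third option is to pass the same variables through the child process environment. This is a rough sketch under that assumption; `RECOMMENDED` and `build_env` are hypothetical names, and the values are simply the ones from the export list above.

```python
import os

# Values copied from the environment-variable list above.
RECOMMENDED = {
    "PREFECT_CLIENT_MAX_RETRIES": "10",
    "PREFECT_CLIENT_RETRY_EXTRA_CODES": "502,503,504",
    "PREFECT_API_REQUEST_TIMEOUT": "120",
    "PREFECT_API_ENABLE_HTTP2": "true",
    "PREFECT_LOGGING_TO_API_ENABLED": "false",
}


def build_env(base=None, overrides=RECOMMENDED):
    """Return a copy of base (default: os.environ) with overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)  # overrides win over any existing values
    return env


# Example use (not executed here):
#   import subprocess
#   subprocess.run(["prefect", "worker", "start", "--pool", "<pool>"],
#                  env=build_env())
```

Because `build_env` returns a copy, the parent process keeps its own settings and only the child sees the overrides.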
Helpful references
- Prefect settings overview: Settings
- Logging to API and batching: Logging
- Work pools and concurrency: Work Pools
- Prefect Cloud status (in case there’s an ongoing incident): Status Page
If you can share:
- A couple of failing flow run IDs and timestamps
- Whether disabling log upload reduced 503s
- Your Prefect version and region (for Cloud)
…I can dig deeper and help pinpoint whether it’s client-side load, network, or something we should escalate on our side.