Gautam Menon
07/23/2025, 10:02 AM
Marvin
07/23/2025, 10:02 AM
Marvin
07/23/2025, 10:03 AM
```python
from prefect import flow

@flow(flow_run_name="my-flow", timeout_seconds=3600)
def my_flow():
    # timeout_seconds caps the flow run's total runtime: the run is
    # marked as failed (timed out) if it is still executing after an hour
    ...
```
2. Use Task Retries Independent of Server:
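```python
from prefect import task

@task(retries=3, retry_delay_seconds=60)
def my_task():
    # Retried up to 3 times, 60 seconds apart, when the task raises.
    # Whether retries keep working while the server is down may depend
    # on your Prefect version, so verify this in your own setup.
    ...
```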
3. Consider implementing your own monitoring solution to detect and handle zombie flows:
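```python
from prefect import get_client
from prefect.client.schemas.filters import FlowRunFilter, FlowRunFilterState, FlowRunFilterStateType
from prefect.client.schemas.objects import StateType

async def cleanup_zombie_flows():
    # Open the client as an async context manager so it is closed afterwards
    async with get_client() as client:
        # Fetch every flow run currently in a RUNNING state
        flow_runs = await client.read_flow_runs(
            flow_run_filter=FlowRunFilter(
                state=FlowRunFilterState(
                    type=FlowRunFilterStateType(any_=[StateType.RUNNING])
                )
            ),
        )
        # Handle flows that have been running too long
        ...
```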
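The "running too long" check could be sketched roughly like this. It is a minimal illustration in plain Python rather than Prefect's API: `RunInfo`, `find_stale_runs`, and the six-hour threshold in the usage note are hypothetical, and in practice you would build similar records from whatever your client returns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RunInfo:
    """Hypothetical stand-in for the flow run fields this check needs."""
    name: str
    start_time: Optional[datetime]  # None if the run never actually started


def find_stale_runs(runs, max_age, now=None):
    """Return the runs whose start_time is more than max_age before now."""
    now = now or datetime.now(timezone.utc)
    return [
        run for run in runs
        if run.start_time is not None and now - run.start_time > max_age
    ]
```

For example, `find_stale_runs(runs, timedelta(hours=6))` would return the candidates for cleanup. What to do with them (cancel, mark as crashed, or just alert) is a decision for your own setup.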
For more details, you can refer to:
- Detecting Zombie Flows Documentation
- Task Retries Documentation
Remember that while Prefect provides mechanisms to handle server failures gracefully, it's important to:
1. Have proper monitoring in place to detect server issues quickly
2. Implement appropriate retry and timeout mechanisms at both task and flow levels
3. Have a disaster recovery plan for your self-hosted server
4. Consider using backup servers or failover mechanisms for critical workflows
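As a rough sketch of points 1 and 4, a small health probe can pick the first reachable server from a list. This uses only the standard library. The `/health` path, the function names `is_healthy` and `first_healthy_api`, and the URLs in the usage note are assumptions for illustration, so check what your own server exposes.

```python
import urllib.error
import urllib.request


def is_healthy(api_url, timeout=5.0):
    """Return True if GET <api_url>/health answers with HTTP 200.

    The /health path is an assumption; adjust it to your deployment.
    """
    health_url = f"{api_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, non-2xx responses, malformed URLs
        return False


def first_healthy_api(api_urls, probe=is_healthy):
    """Return the first URL the probe accepts, or None if all fail."""
    for url in api_urls:
        if probe(url):
            return url
    return None
```

Calling `first_healthy_api(["http://primary:4200/api", "http://backup:4200/api"])` from a scheduled job gives you both a basic liveness signal (a `None` result means nothing answered) and a candidate URL to fail over to. The `probe` parameter is there so the check can be swapped out or stubbed in tests.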