
Prefect Community

Hi all, we are interested in using Prefect for ETL jobs at my firm. These particular ETL jobs run daily, have strict deadlines, and are considered critical for our business. With that in mind, I have some questions about the reliability of Prefect Server. Essentially, how can we avoid Prefect Server being a single point of failure for the orchestration of flows? I appreciate there are probably many approaches here, but as a scenario, let's suppose the machine running Prefect Server had a hardware failure during a critical period. What could we do to mitigate this?

If this is critical for your business, I would highly encourage you to use Prefect Cloud since it’s highly available and continuously monitored by us.

But if you want to use Server for this, then deploying it to a Kubernetes cluster managed by a cloud provider (e.g. AWS or GCP) could help: Kubernetes can automatically restart failed components, and the cloud provider can make sure the underlying hardware is reliable and scales (e.g. by restarting failed compute nodes).
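
To make the "automatically restart failed components" idea concrete, here is a toy sketch in Python of the restart-on-failure loop that an orchestrator like Kubernetes runs for you. It is not Prefect or Kubernetes code: `supervise`, `max_restarts`, and `backoff_seconds` are hypothetical names made up for illustration, and a real cluster adds health probes, scheduling across nodes, and more.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.0):
    """Run `cmd` and restart it whenever it exits with a non-zero code.

    Hypothetical helper for illustration only. Returns the number of
    restarts that were needed before the command exited cleanly, and
    raises RuntimeError once `max_restarts` restarts have been used up.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            # Clean exit: nothing left to supervise.
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"{cmd!r} still failing after {restarts} restarts "
                f"(last exit code {result.returncode})"
            )
        restarts += 1
        # Wait a little before restarting so a crash loop does not spin.
        time.sleep(backoff_seconds)
```

For example, `supervise([sys.executable, "-c", "raise SystemExit(0)"])` returns without restarting anything, while a command that always fails ends in a `RuntimeError` after the allowed restarts. Note that a loop like this only protects against a process crashing; it dies together with the machine it runs on, which is why running on a managed cluster with several nodes matters for the hardware-failure scenario.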

This thread might be helpful as well: https://prefect-community.slack.com/archives/CL09KU1K7/p1641990862382900?thread_ts=1641976419.380600&cid=CL09KU1K7