12/21/2021, 12:07 AM
Are there any good references for running Prefect Server in a high-availability configuration on k8s? Our team currently uses the helm chart, and we were wondering about increasing the replicas of some of the services. We initially tried this with prefect-agent, which resulted in flows being run twice. Would the other pods be safe to run with multiple replicas? Thanks!

Anna Geller

12/21/2021, 12:41 AM
I would be curious to hear from other Server users. I believe that as long as your Server components and the agents are running as Deployments, Kubernetes will ensure the desired state of your pods is maintained, i.e. it will restart broken pods. The database would be more work. You could look at managed cloud services such as AWS RDS or GCP Cloud SQL, which help with scaling, availability and disaster recovery of the Postgres database.


12/21/2021, 12:47 AM
To add some more context: part of the original interest came from our autoscaler scaling down some nodes and evicting a pod before the new Prefect UI and Hasura pods came up. We also got throttled while pulling the images, so the new pods got stuck in an ImagePullBackOff state (granted, the image pull rate limits are a separate issue to solve). That's why we were hoping to increase the number of replicas (and hopefully set up some pod disruption budgets).
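For reference, a minimal sketch of what such a pod disruption budget could look like, built as a plain Python dict and printed as JSON (which `kubectl apply -f` accepts). The resource name `prefect-ui-pdb` and the `app: prefect-ui` label are hypothetical; they need to match whatever labels the helm chart actually puts on the pods. It also assumes a cluster that serves `policy/v1` (older clusters use `policy/v1beta1`).

```python
import json


def pod_disruption_budget(name, app_label, min_available=1):
    """Build a PodDisruptionBudget manifest as a plain dict.

    name and app_label are placeholders; the selector must match
    the labels on the pods you want to protect.
    """
    return {
        "apiVersion": "policy/v1",  # assumption: cluster serves policy/v1
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Voluntary evictions (e.g. node scale-down) are blocked
            # if they would leave fewer than this many pods available.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    manifest = pod_disruption_budget("prefect-ui-pdb", "prefect-ui")
    print(json.dumps(manifest, indent=2))
```

Note that with a single replica and `minAvailable: 1`, the budget blocks node drains entirely, so it mainly makes sense together with more than one replica.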


12/21/2021, 9:37 AM
Did you set the liveness and readiness probes in your deployment? This ensures that the service switches to the new pods only once they are up and running.
As @Anna Geller said, the key to high availability is the database. I have also found that Hasura can fail under heavy load.
However, I think you cannot run multiple instances of towel.
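A rough sketch of the liveness and readiness probes mentioned above, again as a Python dict that can be merged into a container spec. The `/healthz` path and port `8080` are assumptions for a Hasura-style container; check what the helm chart and your images actually expose before using them.

```python
import copy
import json


def http_probes(path, port, initial_delay=10, period=10):
    """Return readiness and liveness probe settings for a container spec.

    path and port are placeholders for the container's health endpoint.
    """
    probe = {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
    }
    return {
        # Readiness: the pod receives Service traffic only while this passes.
        "readinessProbe": copy.deepcopy(probe),
        # Liveness: the kubelet restarts the container if this keeps failing.
        "livenessProbe": copy.deepcopy(probe),
    }


if __name__ == "__main__":
    # Assumed endpoint, for illustration only.
    print(json.dumps(http_probes("/healthz", 8080), indent=2))
```

During a rolling update, a Deployment only counts a new pod as available once its readiness probe passes, which is what prevents traffic from switching over too early.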