# prefect-community
Hello! Having an issue with Kubernetes cluster autoscaler for long running (> 21 min) flows similar to issue 3058. I noticed the issue has been marked "closed". What was the solution?
👀 1
Hi Johnny, the issue here is that the autoscaler will sometimes kill active jobs, and Prefect (currently) doesn't always cope with having its jobs killed mid-run. The fix for you would be to modify the job template used by your agent, adding the
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
annotation to the job template (in
). This will prevent the autoscaler from evicting active jobs.
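As a rough illustration of that change, here is a minimal Python sketch that patches a job template already loaded as a dictionary. The exact location named in the message did not survive, so the placement under `spec.template.metadata.annotations` is an assumption, based on the Cluster Autoscaler documenting `safe-to-evict` as a pod annotation. The helper name `mark_not_evictable` is hypothetical and not part of Prefect.

```python
import copy

# Key from the message above; the Cluster Autoscaler reads it from the pod.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def mark_not_evictable(job: dict) -> dict:
    """Return a copy of a Job manifest whose pod template carries the
    safe-to-evict setting. Hypothetical helper, not a Prefect API."""
    patched = copy.deepcopy(job)  # leave the caller's template untouched
    # Assumption: the setting belongs on the pod template metadata.
    pod_metadata = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
    )
    # Annotation values must be strings, hence "false" rather than False.
    pod_metadata.setdefault("annotations", {})[SAFE_TO_EVICT] = "false"
    return patched


# Tiny stand-in for the agent's job template.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {"template": {"spec": {"containers": []}}},
}
patched = mark_not_evictable(job)
```

The copy keeps the default template intact, so the patched version can be saved under a new name and compared against the original.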
To update the agent template, you can copy the default template (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/agent/kubernetes/job_spec.yaml), add the safe-to-evict setting (and whatever else you want), save it in the same environment that the agent is running in, then point the agent to it by setting the
environment variable to where the template is located. I recognize that this is a bit complicated; we're currently working to simplify deployment configuration so that customizing deployments becomes a lot simpler (see https://github.com/PrefectHQ/prefect/pull/3333).
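For the last step, a small sketch of preparing the agent's environment might look like the following. The name of the environment variable is not given in the message, so it is left as a required argument here rather than guessed; `agent_environment` is a hypothetical helper, not part of Prefect.

```python
import os
from pathlib import Path


def agent_environment(template_path: str, variable_name: str) -> dict:
    """Return a copy of the current environment in which `variable_name`
    points at the customized job template.

    `variable_name` is whatever variable your agent version reads; it is
    deliberately not hard-coded because the thread does not name it.
    """
    path = Path(template_path).expanduser().resolve()
    if not path.is_file():
        # Fail early instead of letting the agent start with a bad path.
        raise FileNotFoundError(f"job template not found: {path}")
    env = dict(os.environ)  # copy, so this process's environment is unchanged
    env[variable_name] = str(path)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the agent, so the template has to exist in the same environment the agent runs in, as described above.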
thank you! very very helpful info. I've been stuck on this for 2 days 🙂
👍 1
Feel free to reach out sooner next time, we're always happy to help. Hope the above tips work for you :).
will do!