# prefect-kubernetes
t
Anyone know why I might suddenly be getting this error, with flows deployed on Kubernetes work pool workers failing? 🙏
Job 'ultraviolet-lyrebird-vhjs7': Job reached backoff limit.
g
The default sets the backoff limit to 0. This means that if your job crashes, which this one most likely did because of a resource issue (e.g. OOM), Kubernetes will not restart it. I would look into why it crashed. You can manually update the k8s config to add retries, but odds are it will continue to crash.
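To check the OOM theory, one option is to dump the failed pod as JSON (for example with `kubectl get pod <pod-name> -o json`) and look for a container terminated with reason `OOMKilled`. Below is a minimal sketch of that check. The helper name `oom_killed_containers` and the sample pod, including the container name `prefect-job`, are made up for illustration.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last state is
    'terminated' with reason 'OOMKilled'."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A container that was killed may report it under either key.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Hypothetical, heavily trimmed pod object for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "prefect-job",
                "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}

print(oom_killed_containers(sample_pod))
```

If the list is non-empty, raising the memory request/limit on the job is probably a better fix than adding retries.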
t
Thanks for the pointer! I think you're right
w
I've also noticed a bug in Prefect where our flows run successfully, and then at the end of the flow the worker marks the flow run as failed with the same error, even though the flow itself completed. If you increase the backoff limit setting above 0 (the default in the base job template), it will work.
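As a rough sketch of that change, the snippet below patches a copy of a work pool's base job template so the Kubernetes Job gets a non-zero `spec.backoffLimit`. The nesting `job_configuration.job_manifest.spec` is an assumption about how the template is laid out, the helper name `with_backoff_limit` is hypothetical, and the sample template is trimmed for illustration, so check it against your own template before relying on it.

```python
import copy


def with_backoff_limit(template: dict, limit: int) -> dict:
    """Return a copy of `template` with the Job's spec.backoffLimit set.

    Assumes the Job manifest lives under
    job_configuration -> job_manifest (an assumption, verify locally).
    """
    if limit < 0:
        raise ValueError("backoffLimit must be >= 0")
    patched = copy.deepcopy(template)  # leave the caller's dict untouched
    spec = (
        patched.setdefault("job_configuration", {})
        .setdefault("job_manifest", {})
        .setdefault("spec", {})
    )
    spec["backoffLimit"] = limit
    return patched


# Hypothetical, trimmed-down base job template for illustration only.
base_template = {
    "job_configuration": {
        "job_manifest": {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "spec": {"backoffLimit": 0},
        }
    }
}

patched = with_backoff_limit(base_template, 2)
print(patched["job_configuration"]["job_manifest"]["spec"]["backoffLimit"])
```

The patched dict would then need to be saved back to the work pool, through the UI or however you normally manage the template.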
g
👍