Graceful shutdown of jobs in k8s
We’re using Prefect to run flows of varying lengths on a k8s cluster with autoscaling.
Our current approach is to try to let flows complete before a node is terminated (by setting a very long grace period for the job).
However, it seems like capture_sigterm in prefect.engine aborts the task without any option for overriding it, so the flow ends up dying the moment the node is marked for termination.
Has anyone successfully implemented something similar? Any ideas?
Omri Ildis
01/14/2024, 10:46 AM
Note: overriding the signal handler from within the flow isn’t always possible, since the flow sometimes isn’t run on the main thread.
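To illustrate the main-thread limitation, here is a minimal sketch using only the Python standard library (no Prefect APIs). The function names are hypothetical and only for illustration. It shows that calling signal.signal from a worker thread raises ValueError, which is why a flow that runs off the main thread cannot install its own SIGTERM handler.

```python
import signal
import threading


def try_install_sigterm_handler(outcome):
    """Try to register a no-op SIGTERM handler and record what happened."""
    try:
        signal.signal(signal.SIGTERM, lambda signum, frame: None)
        outcome.append("installed")
    except ValueError as exc:
        # CPython only allows signal.signal() in the main thread.
        outcome.append(f"failed: {exc}")


def install_from_worker_thread():
    """Run the attempt in a non-main thread and return the recorded outcome."""
    outcome = []
    worker = threading.Thread(target=try_install_sigterm_handler, args=(outcome,))
    worker.start()
    worker.join()
    return outcome[0]


if __name__ == "__main__":
    print(install_from_worker_thread())
```

Because the call fails before anything is registered, the process-wide SIGTERM handling stays whatever the main thread set up, so any override would presumably have to happen in the main thread before the flow starts.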