When running flows with the Kubernetes Agent, is Prefect able to recover or restart a flow whose dask-scheduler pod has been killed by Kubernetes (for OOM reasons, because a spot node died, etc.)?
Kevin Kho
09/30/2021, 5:22 AM
Hey @Aaron Ash, I believe Dask itself has mechanisms to spin up workers if some of them die. For Prefect to restart a Flow, I think both the Client and the Scheduler need to die.
Prefect also has the Lazarus service. If your cluster does not even start, Lazarus will trigger. But if the cluster is able to start, Lazarus only triggers when there are no submitted or running tasks (i.e. the flow has lost communication entirely).
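To make the Lazarus condition above concrete, here is a rough sketch of that rule as plain Python. This is only my reading of the behavior described in this thread, not Prefect's implementation: `FlowRunSnapshot` and `lazarus_would_trigger` are hypothetical names, and the snapshot is assumed to describe a flow run that is still marked as running.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical model of a flow run that is still marked as running.
# These names are illustrative only and are not part of Prefect's API.
@dataclass
class FlowRunSnapshot:
    cluster_started: bool
    task_states: List[str]  # e.g. "Pending", "Submitted", "Running", "Success"


def lazarus_would_trigger(run: FlowRunSnapshot) -> bool:
    """Approximate the rule from the thread (an assumption, not Prefect code)."""
    # Case 1: the cluster never came up at all.
    if not run.cluster_started:
        return True
    # Case 2: the cluster started, but no task is submitted or running,
    # which suggests the flow has lost communication entirely.
    active = {"Submitted", "Running"}
    return not any(state in active for state in run.task_states)


if __name__ == "__main__":
    print(lazarus_would_trigger(FlowRunSnapshot(False, [])))
    print(lazarus_would_trigger(FlowRunSnapshot(True, ["Running", "Pending"])))
```

Under this reading, a flow whose scheduler pod dies while some task is still reported as running would not be picked up, which matches the point that the flow has to lose communication entirely.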