When running flows with the Kubernetes Agent, is Prefect able to recover/restart a flow whose dask-scheduler pod has been killed by Kubernetes (for OOM reasons, because a spot node died, etc.)?
Hey @Aaron Ash, I believe Dask itself has mechanisms to spin up new workers if some of them die. For Prefect to restart a flow, I think both the Dask client and the scheduler need to die, and that's what Prefect's Lazarus service is for. If your cluster does not even start, Lazarus will trigger. But if the cluster is able to start, Lazarus only triggers when there are no submitted or running tasks (i.e., the flow has lost communication entirely).
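To make that trigger condition concrete, here's a minimal Python sketch of the rule as described above. It's a hypothetical model for illustration only: the `FlowRun` class, the `lazarus_would_reschedule` function and the exact state names are assumptions, not Prefect's API or the actual Lazarus implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the rule described above.
# NOT Prefect's Lazarus source code; state names are assumptions.
ACTIVE_STATES = {"Submitted", "Running"}


@dataclass
class FlowRun:
    state: str  # flow run state, e.g. "Submitted" or "Running"
    task_states: List[str] = field(default_factory=list)  # one per task run


def lazarus_would_reschedule(run: FlowRun) -> bool:
    """Return True if the flow run looks orphaned under the simplified rule."""
    if run.state not in ACTIVE_STATES:
        # Finished or failed flow runs are left alone.
        return False
    # Orphaned = no task run is still submitted or running, e.g. the cluster
    # never came up, or the client and scheduler both died and nothing is
    # recorded as in progress anymore.
    return not any(s in ACTIVE_STATES for s in run.task_states)


# Cluster never started: the flow run has no task runs at all.
print(lazarus_would_reschedule(FlowRun("Submitted")))  # True

# A task run is still recorded as Running, so the run doesn't look orphaned.
print(lazarus_would_reschedule(FlowRun("Running", ["Success", "Running"])))  # False
```

The second case is the situation from the question: if the scheduler pod is killed while tasks are still recorded as in progress, this rule alone would not fire.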