Thomas Opsomer 01/20/2022, 2:48 PM
Usually it happens in 2 situations:
• the pod that runs the tasks gets evicted / OOM killed
• the pod was running on a preemptible node that gets removed and replaced.
Is there something on the k8s agent, the k8s job specification, or something else to configure to allow k8s to reschedule the job and let Prefect know about it, so that the flow would continue?
No heartbeat detected...
will help here. Maybe use a DaskExecutor cuz Dask will restart the workers and retry the work?
Thomas Opsomer 01/23/2022, 5:48 PM
is not helping 😕 I've never used the DaskExecutor yet, I'll try it, thanks!