Thomas Opsomer
01/20/2022, 2:48 PM
"No heartbeat detected..." Usually this happens in 2 situations:
• the pod that runs the tasks gets evicted or OOM-killed
• the pod is running on a preemptible node that gets removed and replaced.
Is there something to configure on the k8s agent, in the k8s job specification, or elsewhere that would allow k8s to reschedule the job and let Prefect know about it, so that the flow can continue?
Kevin Kho
restartPolicy will help here. Maybe use a DaskExecutor, cuz Dask will restart the workers and retry the work?
Thomas Opsomer
01/23/2022, 5:48 PM
restartPolicy is not helping.
I've never used the DaskExecutor yet; I'll try it, thanks!
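For reference, here is a minimal sketch of where the two Kubernetes settings relevant to this thread live in a plain batch/v1 Job manifest, built as a Python dict and printed as JSON (which kubectl accepts). The helper build_flow_job_manifest, the job name, and the image are hypothetical placeholders. Whether and how the Prefect Kubernetes agent accepts a custom job template like this is an assumption not covered in the thread. One Kubernetes detail may explain why restartPolicy alone did not help. restartPolicy only restarts containers inside an existing pod. When the pod itself is evicted or its node is removed, any replacement pod is created by the Job controller, bounded by backoffLimit.

```python
import json

# For Job pods, Kubernetes only allows these two restart policies.
ALLOWED_JOB_RESTART_POLICIES = ("Never", "OnFailure")


def build_flow_job_manifest(
    name: str,
    image: str,
    restart_policy: str = "OnFailure",
    backoff_limit: int = 3,
) -> dict:
    """Return a minimal batch/v1 Job manifest as a plain dict.

    Hypothetical helper: `name` and `image` are placeholders, not values
    taken from the thread or from Prefect's own job template.
    """
    if restart_policy not in ALLOWED_JOB_RESTART_POLICIES:
        raise ValueError(f"Job restartPolicy must be one of {ALLOWED_JOB_RESTART_POLICIES}")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Retry budget for the Job controller: how many times failed pods
            # are replaced before the whole Job is marked as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Pod-level policy: "OnFailure" restarts the failed container
                    # in the same pod; it cannot help once the pod is evicted
                    # or its (preemptible) node disappears.
                    "restartPolicy": restart_policy,
                    "containers": [{"name": "flow", "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_flow_job_manifest("prefect-flow-run", "example/flow-image:latest")
    print(json.dumps(manifest, indent=2))
```

Even if a replacement pod is created, whether Prefect then resumes or restarts the flow run is up to Prefect, not Kubernetes. That is why the DaskExecutor suggestion, which retries work when Dask workers are lost, is a separate approach.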