Thomas Opsomer

01/20/2022, 2:48 PM
Hi, another Prefect + K8s question here 🙂 Like the previous post, we're frequently seeing the message `No heartbeat detected...`. It usually happens in 2 situations:
• the pod that runs the tasks gets evicted / OOM killed
• the pod was running on a preemptible node that gets removed and replaced.
Is there something on the k8s agent, the k8s job specification, or somewhere else that can be configured to allow k8s to reschedule the job and let Prefect know about it, so that the flow would continue?

Kevin Kho

01/20/2022, 3:45 PM
I am not sure `restartPolicy` will help here. Maybe use a DaskExecutor, since Dask will restart the workers and retry the work?

Thomas Opsomer

01/23/2022, 5:48 PM
Yes, `restartPolicy` is not helping 😕 I've never used the DaskExecutor, I'll try it, thanks!
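
For anyone landing on this thread later, here is a rough sketch of what the DaskExecutor suggestion could look like, assuming Prefect 1.x (`prefect.executors.DaskExecutor`) with `dask_kubernetes` installed. The helper names `dask_executor_kwargs` and `use_dask` and the worker counts are hypothetical, and a real setup would most likely also need `cluster_kwargs` (for example a worker pod template). The idea is that tasks run on Dask worker pods, so when a worker is evicted or its node is preempted, the Dask scheduler can reschedule that work on another worker. The pod that runs the flow itself can still be lost, so this may not remove every `No heartbeat detected` case.

```python
from typing import Any, Dict


def dask_executor_kwargs(min_workers: int = 1, max_workers: int = 4) -> Dict[str, Any]:
    """Build keyword arguments for Prefect 1.x's DaskExecutor.

    Hypothetical helper: the cluster class and worker counts are
    illustrative assumptions, not values taken from the thread.
    """
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        # Import path of the cluster class; DaskExecutor creates a
        # temporary cluster of this type for each flow run.
        "cluster_class": "dask_kubernetes.KubeCluster",
        # Forwarded to cluster.adapt(), so lost workers can be replaced.
        "adapt_kwargs": {"minimum": min_workers, "maximum": max_workers},
    }


def use_dask(flow, min_workers: int = 1, max_workers: int = 4):
    """Attach a DaskExecutor to an existing Prefect 1.x flow (hypothetical helper)."""
    # Imported lazily so the helper above works without Prefect installed.
    from prefect.executors import DaskExecutor

    flow.executor = DaskExecutor(**dask_executor_kwargs(min_workers, max_workers))
    return flow
```

Calling `use_dask(flow)` before registering the flow would be enough to try it out; the Kubernetes agent and run config stay as they are.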