Hi, another Prefect + K8s question here.
Like the previous post, we're frequently seeing the message "No heartbeat detected...". It usually happens in 2 situations:
• the pod that runs the tasks gets evicted / OOM killed
• the pod was running on a preemptible node that gets removed and replaced.
Is there something on the k8s agent, the k8s job specification, or elsewhere that we can configure so that k8s reschedules the job and lets Prefect know about it, so that the flow continues?
Kevin Kho
01/20/2022, 3:45 PM
I am not sure restartPolicy will help here. Maybe use a DaskExecutor cuz Dask will restart the workers and retry the work?
Thomas Opsomer
01/23/2022, 5:48 PM
Yes, restartPolicy is not helping.
I've never used the DaskExecutor yet. I'll try it, thanks!
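
For anyone following the DaskExecutor suggestion above, here is a minimal sketch of what it could look like with the Prefect 1.x API. It has not been tested against evicted or preempted pods. The flow name, the `process` task, and the retry values are hypothetical. Whether Dask actually reschedules the lost work in this situation is the suggestion made in the thread, not something confirmed here.

```python
from datetime import timedelta

# Hypothetical retry settings, chosen only for illustration.
RETRY_SETTINGS = {"max_retries": 3, "retry_delay": timedelta(seconds=30)}


def build_flow():
    """Build a Prefect 1.x flow whose tasks run on a Dask cluster."""
    # Imported lazily so the settings above can be inspected
    # even where Prefect is not installed.
    from prefect import Flow, task
    from prefect.executors import DaskExecutor

    @task(**RETRY_SETTINGS)
    def process(x):
        # Placeholder work; a retry re-runs this function from the start.
        return x * 2

    with Flow("dask-retry-sketch") as flow:
        process.map([1, 2, 3])

    # With no arguments, DaskExecutor starts a temporary local Dask cluster.
    # On Kubernetes you would instead pass cluster_class / cluster_kwargs
    # (or the address of an existing scheduler); which cluster class to use
    # is an assumption left open here.
    flow.executor = DaskExecutor()
    return flow
```

The task-level `max_retries` / `retry_delay` settings are separate from anything Dask does when a worker disappears. They are included only because they are the usual way to ask Prefect itself to re-run a failed task.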