Hello Prefect community 🙂,
We're using Prefect with K8s. Sometimes tasks fail because the pods that run them get killed (OOM or anything else, the cause doesn't matter), which leads to a "pod missed heartbeat..." message.
The issue is that when a flow/task fails like this, the retry mechanism and the Slack handler don't work. Is there a way to retry tasks on this kind of failure and/or to get a notification about the status?
Kevin Kho
01/03/2022, 3:34 PM
Hi @Thomas Opsomer, in this case the compute literally dies, so nothing is left running that could retry the task. The notification part is doable though. You can use Automations in Prefect Cloud, where Prefect itself is responsible for sending an alert when it loses communication with your flow.
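To illustrate why the alert has to come from outside the pod, here is a minimal sketch in plain Python. It is not Prefect's actual implementation: `HeartbeatMonitor`, `beat`, and `lost_runs` are hypothetical names, and the 30-second timeout is an assumed value. The idea is only that a separate service records heartbeats and flags any run that stops sending them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical external watchdog; runs outside the pods it observes."""

    timeout_s: float  # assumed threshold, not a Prefect setting
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, run_id: str, now: float) -> None:
        # Called each time a running flow reports that it is still alive.
        self.last_seen[run_id] = now

    def lost_runs(self, now: float) -> List[str]:
        # A run counts as lost once its last heartbeat is older than the timeout.
        return sorted(
            run_id
            for run_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30)
monitor.beat("run-a", now=0)
monitor.beat("run-b", now=0)
monitor.beat("run-a", now=25)  # run-b's pod was killed, so it never beats again
lost = monitor.lost_runs(now=40)
print(lost)
```

Any retry logic or state handler living inside the killed process never gets a chance to execute, which is why only the observer on the other side of the heartbeat can raise the alert.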
Thomas Opsomer
01/03/2022, 3:46 PM
Ah ok, that makes sense!
For the notification part, is there a special "state" that represents the case where communication with the flow is lost?
Kevin Kho
01/03/2022, 4:20 PM
Not in current Prefect, but there will be one in Orion (Prefect 2.0)!