# prefect-community
h
Hi, we had a task failure (apparently a task run time-out) last night. It seems Prefect didn’t retry the task when it failed, so we had to resort to a manual recovery this morning. Could you recommend a way to let Prefect handle this kind of failure itself? (See error details in thread.)
error log screenshot.
j
Hi @Hui Zheng - right now, when a failure kills the Prefect process, the Zombie Killer fails the task and we don’t attempt a retry. We’re going to look at relaxing that policy so that it follows the retry instruction on the task.
h
Thank you, Jeremiah. That would be great. If possible, please keep me posted on when the retry behavior becomes available. Also, do you know what kinds of failures could trigger this? How could I find more info about the failure in the Prefect Cloud dashboard?
j
Generally, zombie-killed failures mean that Prefect Cloud lost communication with the task, and therefore they are hard to diagnose. They usually represent a crash of the process or node the task was running on.
h
@Jeremiah Hi, we experienced another incident of the same failure last night. I just want to follow up to see whether Prefect will relax the policy so that the retry instruction on the task is followed when the task is killed by the Zombie Killer.
j
Hi @Hui Zheng, sorry for not following up directly - this is on the near-term roadmap and I expect it to show up in Cloud in the coming weeks.
h
That’s great. Where could I check so that I get a notification when it is released?
j
We are working on the best way to expose our roadmap publicly 😅 but I’ll try to come back to let you know here!