How are others handling the occasional ECS/Fargate error
Timeout waiting for network interface provisioning to complete.
We have created an AWS support request asking if there is a way for ECS to automatically retry the task. (Or increase the timeout might be helpful too!)
We are not using the prefect lazarus process.
03/15/2022, 1:59 PM
There was another thread that reported that and unfortunately it seems like there is not much that can be done because the AWS docs literally says to restart it manually. On the Prefect side, you could try an automation?
03/16/2022, 1:03 PM
@Kevin Kho Reading the automation docs, an automation can only respond to a change in prefect state. The problem is, prefect shows the task as "running". Which means prefect will not be able to respond to a change in state. Correct me if I'm wrong.
So we chatted internally about this last night and this has to do with the Zombie Killer currently being unreliable so we’re working on a fix for that to make it better so that this failure will be detected