For sure! This type of error is definitely a little elusive, in that it sits somewhat outside of Prefect's immediate control.
For example, say a push work pool gets an API error when it tries to hit the endpoint that creates the ECS task run. That is very much within the domain of the push work pool, so if there's a transient network error we hide that from the user and try again.
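To make that distinction concrete, here is a rough sketch of the idea, not Prefect's actual implementation. `TransientError` and `call_with_retries` are hypothetical names I made up for illustration: errors we recognize as transient get retried quietly, and anything else is raised right away so it can be reported.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors we know are safe to retry."""


def call_with_retries(fn, attempts=3, delay_seconds=0.0):
    """Call fn, retrying only when it raises TransientError.

    Any other exception propagates immediately, because we do not
    know what went wrong or whether retrying would help.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # All attempts failed with a transient error; surface the last one.
    raise last_error
```

A failure that happens between ECS and some other system would fall into the "anything else" bucket here, which is why it gets reported rather than retried.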
In this case the failure is between ECS and another external system. Prefect is just reporting a failure that it has observed.
I totally understand, though, that a user just wants things to work and not fail! Prefect takes the first step by reporting that failure to the user, but there's no automatic action because ultimately we don't really know what the failure is or how to remedy it.
Thanks again for flagging this, though. Your feedback is always really welcome and important! I will for sure give some more thought to how automatic retries might fit in here.