Functionally, the agent is responsible for monitoring work. In Kubernetes, the pod may crash, which leaves the Job's completion condition unsatisfied, so the cluster retries it by creating a new pod. That new pod then attempts to re-execute the entrypoint. I think what you are referring to is two-fold:
1. the case of a crashed pod, which is an infrastructure event
2. the case of a failed flow, which is code-based and not necessarily an infrastructure event
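The cluster-side retry mentioned above is controlled by the Job spec itself, not by the agent. Here is a minimal sketch, assuming a plain Kubernetes `batch/v1` Job built as a Python dict. `backoffLimit` caps how many times the Job controller retries after pod failures. `restartPolicy: "Never"` means each retry is a fresh pod that re-runs the entrypoint from scratch. The helper `build_flow_run_job`, the job name, the image, and the command are hypothetical placeholders.

```python
import json


def build_flow_run_job(name: str, image: str, command: list[str], backoff_limit: int = 2) -> dict:
    """Return a batch/v1 Job manifest (as a dict) that runs a flow entrypoint.

    Hypothetical helper for illustration: name, image and command are placeholders.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries the Job controller makes before marking the Job failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # With "Never", a failed pod is replaced by a brand-new pod that
                    # re-runs the entrypoint, rather than restarting the container in place.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "flow", "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_flow_run_job(
        name="flow-run-example",
        image="example.com/my-flow:latest",
        command=["python", "-m", "my_flow"],
    )
    # JSON output can be passed to `kubectl apply -f`.
    print(json.dumps(manifest, indent=2))
```

This retry only covers the first case. The Job controller reacts to a pod failing, for example a non-zero container exit. If a flow fails in code but its process still exits with status 0, the pod counts as succeeded, and Kubernetes will not retry it.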