Kevin Wang

11/17/2022, 7:14 PM
I find myself running scripts that set "stuck" flow runs to Cancelled or Crashed. Is there a more automatic way to recover from that? It can cause jams, because stuck runs eat up work queue concurrency. Let me know if Discourse or somewhere else is the right place to ask. I also found this GitHub issue. How do people handle this when Kubernetes or ECS tasks crash?
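For context, the selection logic in a cleanup script like this could look roughly like the sketch below. It is a minimal illustration, not Prefect's API: `FlowRunRecord` and `find_stuck_runs` are hypothetical names, and it assumes you have already fetched each run's ID, state name, and start time from the API yourself. It flags runs that have been RUNNING longer than a chosen maximum runtime; actually setting those runs to Cancelled or Crashed is left to whatever client call your Prefect version provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


# Hypothetical, simplified view of a flow run (NOT a Prefect class).
@dataclass
class FlowRunRecord:
    run_id: str
    state: str            # state name, e.g. "RUNNING" or "COMPLETED"
    start_time: datetime  # timezone-aware UTC timestamp


def find_stuck_runs(
    runs: Iterable[FlowRunRecord], max_runtime: timedelta, now: datetime
) -> List[str]:
    """Return the IDs of runs still RUNNING after max_runtime has elapsed."""
    return [
        r.run_id
        for r in runs
        if r.state == "RUNNING" and now - r.start_time > max_runtime
    ]


if __name__ == "__main__":
    now = datetime(2022, 11, 17, 19, 14, tzinfo=timezone.utc)
    runs = [
        FlowRunRecord("a", "RUNNING", now - timedelta(hours=5)),
        FlowRunRecord("b", "RUNNING", now - timedelta(minutes=10)),
        FlowRunRecord("c", "COMPLETED", now - timedelta(hours=5)),
    ]
    print(find_stuck_runs(runs, timedelta(hours=2), now))  # prints ['a']
```

A fixed runtime cutoff is crude: it cannot tell a genuinely long flow from one whose pod died, which is why it only works as a stopgap.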
I think these are similar to the issues raised here, but I'm mostly asking about:
If a pod is terminated, the flow run's status remains Running indefinitely.
Is there an automated way to recover now?
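For the terminated-pod case specifically, one alternative to a runtime cutoff is reconciliation: compare the runs the API still reports as Running against the pods that actually exist, and treat runs whose pod is gone as crashed. The sketch below shows only that comparison. It assumes a hypothetical mapping from flow-run ID to the name of the pod it was launched in, and a set of live pod names (for example, gathered with the official `kubernetes` Python client's `CoreV1Api().list_namespaced_pod`). How you recover that run-to-pod mapping depends on your deployment setup and is not shown here.

```python
from typing import Dict, List, Set


def find_orphaned_runs(running_runs: Dict[str, str], live_pods: Set[str]) -> List[str]:
    """Return IDs of runs reported as Running whose pod no longer exists.

    running_runs: hypothetical mapping of flow-run ID -> pod name.
    live_pods: names of pods that currently exist in the namespace.
    """
    return sorted(
        run_id for run_id, pod_name in running_runs.items() if pod_name not in live_pods
    )


if __name__ == "__main__":
    running = {"run-1": "flow-pod-abc", "run-2": "flow-pod-def"}
    live = {"flow-pod-abc"}
    print(find_orphaned_runs(running, live))  # prints ['run-2']
```

Unlike a timeout, this flags a dead run as soon as its pod disappears, but it depends on knowing which pod belongs to which run.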