I find myself running scripts that set 'stuck' flow runs to Cancelled or Crashed. Is there a more automatic way to recover from this? Stuck runs can cause jams, because they keep occupying queue concurrency slots. Let me know if Discourse or somewhere else is the right place to ask. I also found this GitHub issue. How do people handle this when Kubernetes or ECS tasks crash?
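For context, the core of such a cleanup script is a staleness check: find runs that still look active but started too long ago. The sketch below shows only that selection step and assumes a simplified data model. It does not call Prefect's API. `FlowRunRecord`, `find_stuck_runs`, the set of "active" states, and the age threshold are all hypothetical. A real script would fetch runs from the orchestrator and then set the selected runs to Cancelled or Crashed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


# Hypothetical, simplified view of a flow run (not Prefect's own schema).
@dataclass
class FlowRunRecord:
    id: str
    state: str                      # e.g. "Running", "Pending", "Completed"
    start_time: Optional[datetime]  # None if the run never started


# Assumption: these states are the ones that keep holding a concurrency slot.
ACTIVE_STATES = {"Running", "Pending"}


def find_stuck_runs(
    runs: List[FlowRunRecord], now: datetime, max_age: timedelta
) -> List[str]:
    """Return ids of active runs that started more than max_age before now."""
    stuck = []
    for run in runs:
        # Skip finished runs and runs with no recorded start time.
        if run.state not in ACTIVE_STATES or run.start_time is None:
            continue
        if now - run.start_time > max_age:
            stuck.append(run.id)
    return stuck


if __name__ == "__main__":
    now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
    runs = [
        FlowRunRecord("a", "Running", now - timedelta(hours=5)),
        FlowRunRecord("b", "Running", now - timedelta(minutes=10)),
        FlowRunRecord("c", "Completed", now - timedelta(hours=5)),
        FlowRunRecord("d", "Pending", None),
    ]
    print(find_stuck_runs(runs, now, timedelta(hours=2)))  # ['a']
```

A plain age threshold is a blunt heuristic: a legitimately long-running flow would also be flagged, so the threshold has to exceed the longest expected runtime.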