Sometimes, one of our flows will fail with a CRASHED state. We've learned that this is a Prefect infrastructure error, so retries will not happen. However, we need to figure out how to detect a crash and resend the flow. Any suggestions? For what it's work, we're seeing
Flow run infrastructure exited with non-zero status code -9.
with the crash.
n
Nate
06/08/2023, 4:41 PM
hmm, do you have dynamic flow run params for this flow? if not, it should be as simple as making an automation that:
• trigger: on flow run enter Crashed
• action: Run Deployment > Infer Deployment
🙌 1
Nate
06/08/2023, 4:42 PM
I will raise this internally either way, since it would be nice for Infer Deployment to re-use flow run params in this case
z
Zanie
06/08/2023, 5:28 PM
What type of infrastructure are you using?
s
Sean Conroy
06/08/2023, 5:53 PM
OK thanks for the response. I was not aware of those triggers and actions. We are using GCP and Kubernetes.
t
Tom Klein
09/04/2023, 3:57 PM
this seems to be the same issue mentioned here:
https://prefect-community.slack.com/archives/CL09KU1K7/p1688664433269099
always firing the same deployment upon a crash could create an infinite loop in case (for example) that the crash is due to an inherent problem with the job (like it going
OOM
for example)
there needs to be some more robust solution….
Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.