I am running a flow via a Kubernetes job. Sometimes, when there are not enough nodes available to run the job pods, it takes a minute for a new node to scale up. During this window, Prefect seems to mark the flow run as crashed (since the pod was not scheduled within some timeout), but the new node does eventually come up and the flow is able to run fine. However, Prefect then refuses to run the flow because the run has already been marked as terminated:
aborted by orchestrator: This run has already terminated.
Is there some way I can configure the internal timeout for waiting for the pod to be scheduled? Configuring retries (roughly as sketched below) does not seem to make a difference. Thanks!
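For reference, here is a minimal sketch of how the flow is defined; the flow name, body, and retry values are placeholders rather than my real code.

```python
from prefect import flow

# Minimal sketch of the flow definition (names and values are placeholders).
# Retries are configured on the flow, but once the orchestrator marks the run
# as crashed/terminated it is not retried, which matches what I'm seeing.
@flow(retries=2, retry_delay_seconds=60)
def my_k8s_flow():
    print("Flow body runs fine once the new node comes up.")

if __name__ == "__main__":
    my_k8s_flow()
```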