z
I am running a flow via a Kubernetes job. Sometimes, when there are not enough nodes available to run the job pods, it takes a while for a new node to scale up. During this time, Prefect marks the flow as crashed (since the pod was not scheduled within some timeout), but the new node eventually comes up and the flow runs fine. However, Prefect then refuses to run the flow since the run has already been marked as terminated:
aborted by orchestrator: This run has already terminated.
Is there some way I can configure the internal timeout for waiting for the pod to be scheduled? Configuring retries does not seem to make a difference. Thanks!
t
I believe you're looking for `pod_watch_timeout_seconds` on the `KubernetesJob`, which is how long a Pod gets to go from `Pending` to any other state before the flow gets marked as crashed. It defaults to 60 seconds. https://docs.prefect.io/api-ref/prefect/infrastructure/#prefect.infrastructure.KubernetesJob
z
Thank you so much! I will try this shortly!
It worked! thanks! 🙂