# prefect-server
s
Hi! I have searched the docs for an answer but could not find much, so I thought I would ask here. How does the Prefect engine deal with submitted KubernetesRun-based flows that remain pending for some reason? For example, what happens if I try to submit a flow but there aren't enough resources available on my cluster at that moment? From my experience, I can see that those flows get re-submitted and another pod is created after some time, but what happens then? Will both pods run if given the resources? Is there a limit after which the engine kills the flow run because it is unable to run it properly?
a
@Sylvain Hazard my understanding is that Lazarus is responsible for such use cases. It gracefully retries failures caused by factors outside of Prefect’s control - Kubernetes pods not spinning up due to resource constraints on a node is a great example of that. Once every 10 min, Lazarus searches for distressed flow runs and reschedules them (you could see this in the flow run’s logs). Scheduled flow runs without submitted or running task runs will be rescheduled up to 10 times - the 11th time the flow run is marked as failed.
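To make the retry rule above concrete, here is a minimal sketch of the counting logic only. It is not Prefect's actual implementation: `FlowRun`, `lazarus_pass`, and the `limit` parameter are hypothetical names chosen for illustration, and the 10 minute polling loop is left out.

```python
from dataclasses import dataclass


@dataclass
class FlowRun:
    # Hypothetical stand-in for a flow run record; not a Prefect class.
    id: str
    resurrection_attempts: int = 0
    state: str = "Submitted"


def lazarus_pass(run: FlowRun, limit: int) -> FlowRun:
    """Apply one Lazarus-style check to a distressed flow run.

    The run is rescheduled while it has been resurrected fewer than
    `limit` times; on the next check after that it is marked failed.
    """
    if run.resurrection_attempts >= limit:
        run.state = "Failed"
    else:
        run.resurrection_attempts += 1
        run.state = "Scheduled"
    return run


# With a limit of 10, the first 10 checks reschedule the run
# and the 11th marks it as failed.
run = FlowRun(id="example")
for _ in range(11):
    lazarus_pass(run, limit=10)
print(run.state, run.resurrection_attempts)
```

The limit is passed in rather than hard-coded because the value depends on the deployment.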
s
Thanks ! That is a pretty clean process, I like it !
Weirdly enough, it looks like my Lazarus kills flow runs after only 3 retry attempts. Is that something that's configurable?
I got something like this.
a
I see, let me check that
s
Thanks
a
Yup, it looks like for Server the default is actually 3. This is from config.toml:
[services.lazarus]
    resurrection_attempt_limit = 3
the whole section in config.toml is called services:
[services]

    host = "0.0.0.0"

    [services.apollo]
    host = "${services.host}"
    port = 4200

    [services.graphql]
    host = "${services.host}"
    port = 4201
    debug = false
    path = "/graphql/"
    disable_access_logs = false
    timeout_keep_alive = 5

    [services.lazarus]
    resurrection_attempt_limit = 3

    [services.towel]
    max_scheduled_runs_per_flow = 10
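On the question of whether the limit is configurable: one likely route, stated here as an assumption rather than something confirmed in this thread, is that Prefect Server settings can be overridden through environment variables built from the `PREFECT_SERVER` prefix and the config key path joined with double underscores. The helper below is hypothetical and only shows how such a variable name would be derived from a key in the file above; please check it against your Server version before relying on it.

```python
def env_var_for(key_path: str, prefix: str = "PREFECT_SERVER") -> str:
    """Build an override variable name from a dotted config key path.

    Assumes the convention PREFIX__SECTION__SUBSECTION__KEY, upper-cased.
    This helper is illustrative and not part of Prefect.
    """
    parts = key_path.split(".")
    return "__".join([prefix] + [part.upper() for part in parts])


print(env_var_for("services.lazarus.resurrection_attempt_limit"))
# PREFECT_SERVER__SERVICES__LAZARUS__RESURRECTION_ATTEMPT_LIMIT
```

If that convention holds for your deployment, setting the variable on the container that runs the Lazarus service (for example to `10`) would be the place to try it.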
s
Nice, thanks a lot !