Olivér Atanaszov

04/13/2022, 9:19 AM
Hi, I have a scheduled flow that had a failed run during the night (probably due to transient network issues). Although Lazarus is enabled for this flow, I did not see any logs showing that it kicked in (see https://docs.prefect.io/orchestration/concepts/services.html#lazarus). Am I missing something?

Sylvain Hazard

04/13/2022, 9:23 AM
Hey! I think the effect of Lazarus depends on where the issue comes from. If the network issue affects a `requests` call in your flow code or anything equivalent, it will raise an exception and fail the flow. On the other hand, if the flow doesn't start at all or stops responding, Lazarus will reschedule it after a while. The most frequent example of this for me is running a flow when our k8s cluster does not have enough resources available to provision the flow pod. The pod stays pending, and after 10 minutes Lazarus tries to launch another pod because the first one isn't ready yet. Hope this helps 🙂

Olivér Atanaszov

04/13/2022, 9:24 AM
thanks, that makes complete sense 👍

Sylvain Hazard

04/13/2022, 9:27 AM
To fix the issue you're facing, you might want to enable retries by default on any task that relies on external dependencies (be it an API, a database, etc.). The actual configuration depends on your specific use case though.
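
For illustration, here is a rough plain-Python sketch of what task-level retries do. It is not Prefect code: `with_retries` and `fetch_data` are hypothetical names made up for the example, and the simulated failure stands in for a flaky API call. In Prefect 1.x the same idea is expressed with the `max_retries` and `retry_delay` arguments of the `@task` decorator.

```python
import time
from functools import wraps


def with_retries(max_retries=3, retry_delay=0.0):
    """Hypothetical helper: re-run the wrapped function on failure.

    Roughly mirrors the behaviour you get in Prefect 1.x from
    @task(max_retries=3, retry_delay=timedelta(seconds=10)).
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Give up once the retry budget is used, re-raising the last error.
                    if attempt >= max_retries:
                        raise
                    attempt += 1
                    time.sleep(retry_delay)
        return wrapper
    return decorator


# Simulated external dependency that fails twice before succeeding.
calls = {"count": 0}


@with_retries(max_retries=3, retry_delay=0.0)
def fetch_data():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network issue")
    return "ok"


result = fetch_data()
print(result, "after", calls["count"], "attempts")
```

Note that this only covers errors raised inside the task itself. A flow run that never starts or stops responding is the case Lazarus handles, as described above.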