Hui Zheng

10/09/2020, 12:01 AM
Hello Prefect, we have a scheduled flow that runs on a k8s agent. Sometimes the run starts 10 minutes later than its scheduled time. It seems related to some unresponsiveness of the tasks, and the prefect-server reported:
Rescheduled by a Lazarus process.
For example, the run in the screenshot was scheduled for 11:10 but didn't start until 11:20. Could anyone help us understand why this happens and how to prevent it? We are building a new flow that needs to run every 10 minutes with a very strict SLA, so a 10-minute delay would be fatal to it. Thank you.

Thomas Hoeck

10/09/2020, 10:20 AM
Are you running it on AKS, @Hui Zheng? The reason I'm asking is that there is a bug on AKS with in-cluster comms that yields this error.


10/09/2020, 12:33 PM
Hi @Hui Zheng - this sounds infrastructure related; could you give some more information about your flow's environment, executor, your k8s cluster etc?

Hui Zheng

10/13/2020, 7:19 PM
@Thomas Hoeck @nicholas we run it on Google GKE.