Leonard Marcq
09/13/2020, 10:02 AM
An error occurred (ThrottlingException) when calling the RunTask operation (reached max retries: 4): Rate exceeded.
My issue is that those flow runs are marked as failed and never retried. I originally thought that Lazarus would retry failed flow runs, but it seems I misunderstood. Is there a recommended way of retrying flow runs that failed before they could even start?
Jeremiah
09/13/2020, 12:16 PM
Leonard Marcq
09/13/2020, 1:40 PM
The Lazarus process is meant to gracefully retry failures caused by factors outside of Prefect's control. The most common situations requiring Lazarus intervention are infrastructure issues, such as Kubernetes pods not spinning up or being deleted before they're able to complete a run.
I also wasn't clear on what "distressed" flow runs were. So I guess I will have to cook up something to retrieve the failed flow runs at some point and set_flow_run_state to Scheduled to restart them (as in https://docs.prefect.io/orchestration/concepts/flow_runs.html#graphql-2).
Michael Ludwig
09/14/2020, 6:00 AM
Leonard Marcq
09/14/2020, 6:25 AM
Jeremiah
09/14/2020, 3:43 PM
Leonard Marcq
09/14/2020, 6:25 PM
Jeremiah
09/14/2020, 7:25 PM
Leonard Marcq
09/16/2020, 8:17 PM
Jeremiah
09/16/2020, 8:29 PM
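
For anyone landing on this thread later, the workaround Leonard describes (retrieve the failed flow runs, then set them back to Scheduled) could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not a confirmed recipe. The GraphQL field names (`flow_run`, `state`, `version`), the `set_flow_run_states` input shape, and the `execute` callable are all assumptions for illustration and should be checked against your own schema.

```python
from typing import Any, Callable, Dict, List

# ASSUMPTION: a Hasura-style query exposing flow runs with a `state` column.
FAILED_RUNS_QUERY = """
query {
  flow_run(where: {state: {_eq: "Failed"}}) {
    id
    version
  }
}
"""

# ASSUMPTION: the mutation name and input type follow the docs page linked
# in the thread; verify both against your GraphQL schema before using this.
SET_STATES_MUTATION = """
mutation($input: set_flow_run_states_input!) {
  set_flow_run_states(input: $input) {
    states { id }
  }
}
"""

# Hypothetical transport: takes a GraphQL string plus variables and
# returns the decoded JSON response as a dict.
GraphQLExecutor = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def scheduled_state_input(run: Dict[str, Any]) -> Dict[str, Any]:
    """Build one entry of the mutation input for a single flow run."""
    return {
        "flow_run_id": run["id"],
        "version": run["version"],
        # ASSUMPTION: the API accepts a serialized state with a "type" key.
        "state": {"type": "Scheduled"},
    }


def reschedule_failed_runs(execute: GraphQLExecutor) -> List[str]:
    """Fetch failed flow runs and ask the API to set them to Scheduled.

    Returns the ids of the flow runs that were submitted for rescheduling.
    """
    runs = execute(FAILED_RUNS_QUERY, {})["data"]["flow_run"]
    if not runs:
        return []  # nothing to do, so skip the mutation entirely
    states = [scheduled_state_input(run) for run in runs]
    execute(SET_STATES_MUTATION, {"input": {"states": states}})
    return [run["id"] for run in runs]
```

As written, the query selects every failed flow run, including ones that failed for legitimate reasons. In practice the filter would probably need narrowing to runs that never started, for example by matching on the throttling error. The `execute` callable is left abstract so that any GraphQL client can be plugged in.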