Caleb
11/07/2024, 7:32 PMrun_deployment
retry if the initial flow run or sub flow runs crashed?Marvin
11/07/2024, 7:33 PMrun_deployment
. However, it appears there can be some intricacies involved, especially with subflows and infrastructure like Kubernetes.
1. Retries: Prefect allows for retries to be configured on individual tasks and flows, which can be utilized when a flow run or a subflow crashes. You would typically configure retries in the flow definition itself, which applies to any runs, regardless of them being triggered manually or through a deployment.
2. Challenges with Subflows: According to a GitHub issue here, there's a known behavior where subflows triggered by run_deployment
might be marked as failed if they crash, even if they retry and succeed later. This might lead the parent flow to also be marked as failed incorrectly. So, while the subflow might succeed after a retry, its initial crash might still reflect a failure on the parent flow.
3. Infrastructure-Specific Handling: For instance, if you're using Kubernetes or Docker, your infrastructure setup (e.g., Kubernetes eviction policies) might retry on its own, but Prefect might not be immediately aware of this retry. This could lead to a flow run being perceived as failed even if the infrastructure retries and succeeds.
For more intricate handling or if you encounter specific issues with the retries not behaving as expected, I suggest looking into Prefect's ability to manage flow states and leveraging some of the related API operations.
Be sure to monitor existing issues and discussions for updates or known workarounds, especially here.
So, while you can set retries, there might be some quirks, especially with subflow handling. It's like teaching a cat new tricks—it can work, but may not be entirely straightforward.