< Marvin> Occasionally on very long running flows I get this Prefect Community #marvin-in-the-wild

Javier SG

09/20/2024, 6:31 AM

@Marvin Occasionally, on very long running flows, I get this error. Do you have any idea of what could go wrong, or how else could I debug it? The error goes away when the flow is restarted.

Flow run infrastructure exited with non-zero status code:
 Container exited with non-zero code (Error Code: 1)
This may be caused by attempting to run an image with a misspecified platform or architecture.

Marvin

09/20/2024, 6:31 AM

Based on the documentation and information retrieved, here are some potential causes and steps you can take to debug this issue effectively:

### Potential Causes

1. Misspecified Platform or Architecture: The error may indeed be related to platform or architecture mismatches, particularly if your flow runs are on different hardware setups or cloud instances. This could happen if the container image you're using is not compatible with the execution environment's architecture.
2. Timeouts and Resource Limits: Timeouts or resource constraints could cause the infrastructure to exit unexpectedly. Long-running flows might be hitting some resource limits (CPU, memory, etc.) on the execution environment.
3. Connectivity Issues: Given that the error goes away when restarted, transient network or connectivity issues might also be a cause. These issues can intermittently disrupt the functioning of your containers.

### Debugging Steps

1. Verify Container Image Compatibility:
   - Ensure that the container image `platform` and `architecture` are compatible with the execution environment.
   - Check the configuration and parameters used to build or pull the container image.
2. Increase Resource Limits:
   - If you suspect resource limits are being hit, you could try increasing the CPU, memory, or timeout settings for your flow runs.
3. Consult Cloud Provider Logs:
   - Check the logs from the cloud provider's control panel for more details on why the container might have exited unexpectedly.
4. Prefect Documentation Review:
   - For Prefect 2.x: Review relevant pages such as Cloud Run Worker and Azure Container Instances.
   - For Prefect 3.x: Review troubleshooting guides for specifics about common issues and solutions.

### Additional Links

- Prefect 2.x Cloud Run Worker: https://docs-2.prefect.io/latest/integrations/prefect-gcp/cloud_run_worker/
- Prefect 2.x Azure Container Instance: https://docs-2.prefect.io/latest/integrations/prefect-azure/container_instance/
- Prefect 3.x Troubleshoot Cloud: https://docs-3.prefect.io/3.0/manage/cloud/troubleshoot-cloud
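
When reading the provider logs, it may help to interpret the container's exit code first. The snippet below is a minimal sketch, not part of Prefect: `describe_exit_code` and `SIGNAL_NAMES` are hypothetical names introduced here for illustration. It relies on the common Linux convention that a process killed by a signal reports `128 + signal number` (for example, 137 for SIGKILL, which is what an out-of-memory kill typically produces), while small codes such as the `1` in the error above usually mean the process itself exited with a failure. Individual platforms may report exit codes differently, so treat the result as a hint only.

```python
# Common Linux signal numbers; a container killed by a signal
# conventionally reports an exit code of 128 + signal number.
SIGNAL_NAMES = {2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Return a rough, human-readable hint for a container exit code."""
    if code == 0:
        return "success"
    if code > 128:
        sig = code - 128
        name = SIGNAL_NAMES.get(sig, f"signal {sig}")
        # SIGKILL is what the kernel OOM killer sends, but a forced stop
        # by the platform looks the same, so this is only a hint.
        hint = " (often the out-of-memory killer or a forced stop)" if sig == 9 else ""
        return f"terminated by {name}{hint}"
    return "application error: the process itself exited with a failure status"


for code in (1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Under that convention, an exit code of 1 points toward an unhandled error inside the flow process rather than a kill by the platform, so the flow's own logs just before the exit are a reasonable place to look next.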

Charles Liu

09/20/2024, 6:00 PM

Hey @Javier SG did you figure out the issue? I'm getting similar behaviour for larger flows

Javier SG

09/23/2024, 5:39 AM

Hi @Charles Liu! Not yet... I use an Azure Push pool, and my guess is that the containers are restarted for some reason; I just don't know why (and memory is not a problem in my case). It's definitely not because of a "misspecified platform or architecture".
