Ankit

04/14/2023, 8:54 AM
@Marvin can you tell me why my Prefect flow is crashing with this error on Kubernetes on AWS?
Flow run infrastructure exited with non-zero status code -1.

Marvin

04/14/2023, 8:55 AM
I found some information that might help you understand the error you're encountering. The error
Flow run infrastructure exited with non-zero status code -1
might be related to the cancellation of a flow run or issues with the infrastructure. Here are some steps you can take to troubleshoot the issue:
1. Verify that the agent process is running, e.g., Kubernetes deployment, ECS service, dockerd daemon.
2. Check the agent logs to see if anything suspicious stands out there.
3. Verify that your execution layer is able to pull your flow run's image. If the image needs to be pulled from a container registry, make sure your container can reach the internet and has appropriate permissions to pull the image.
4. Verify that your execution layer has enough permissions.
You can find more information in the following resources:
- Agent source code
- Discourse discussion on flow run stuck in a pending state
I hope this helps you resolve the issue. If you need more assistance, please let me know.

Nate

04/14/2023, 2:32 PM
@Marvin I checked all of these, still not working. It might be important context that I am mapping about 100 tasks.

Marvin

04/14/2023, 2:33 PM
I see, mapping a large number of tasks could potentially cause issues. Here are some additional suggestions to help you troubleshoot the problem:
1. Check the resource limits for your Kubernetes cluster, such as CPU and memory. Make sure your cluster has enough resources to handle the number of tasks you're mapping. You might need to increase the resources allocated to your cluster or adjust the resource requests and limits for your tasks.
2. Monitor the Kubernetes pod logs for any errors or warnings. This can provide more insight into the issue and help you identify the root cause.
3. Consider breaking down your flow into smaller, more manageable chunks. You can use Dask to parallelize your tasks and distribute the workload across multiple workers. This can help improve the performance and stability of your flow. You can find more information on using Dask with Prefect in the official documentation.
4. If you're using Prefect Cloud, you can also check the flow run logs in the Prefect Cloud UI for any additional information or error messages.
If you're still having trouble, please provide more details about your flow, such as the tasks you're mapping and any specific error messages you're encountering. This will help me better understand the issue and provide more targeted assistance.
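As a rough illustration of suggestion 3 (breaking the work into smaller chunks), here is a minimal sketch in plain Python. It uses only the standard library and is not Prefect's or Dask's API: `chunked`, `run_in_batches`, `process`, the batch size of 10, and the worker count of 4 are all hypothetical choices made for the example. The idea is to run roughly 100 items a batch at a time, so that only a bounded number are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    fn: Callable[[T], R],
    items: List[T],
    batch_size: int = 10,   # hypothetical value, tune to your resources
    max_workers: int = 4,   # hypothetical value, tune to your resources
) -> List[R]:
    """Apply `fn` to every item, one batch at a time, preserving order."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(items, batch_size):
            # pool.map keeps input order; extend() consumes the whole
            # batch before the next one is submitted.
            results.extend(pool.map(fn, batch))
    return results


def process(x: int) -> int:
    """Hypothetical stand-in for the real per-item work."""
    return x * 2


if __name__ == "__main__":
    outputs = run_in_batches(process, list(range(100)), batch_size=10)
    print(len(outputs))
```

In a real flow the per-batch call would presumably be replaced by whatever task submission or mapping mechanism the flow already uses; the sketch only shows the batching pattern itself.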

Nate

04/14/2023, 2:34 PM
here's the real dask link