# marvin-in-the-wild
c
@Marvin Hey, I'm running a few subflows that call several tasks with an Azure work pool in Prefect v3. My YAML deployment file defines the image platform as `linux/amd64`. For the most part, each of my subflows works with 10-30 subtasks. When the number of subtasks increases to ~500, my container crashes with the error message:

`Failed due to a(n) Container exited with non-zero code (Error Code: 137) likely caused by misspecified platform or architecture`

I've tried increasing my work pool resources, but it still fails:
```yaml
work_pool:
    name: endex-aci-pool
    job_variables:
      image: '{{ build_image.image }}'
      container:
        resources:
          requests:
            memory: "12Gi"
            cpu: "4"
```
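One thing worth checking before tuning further is whether the `container.resources.requests` block is applied at all. That nesting mirrors Kubernetes, and we are assuming (this is not confirmed in the thread) that the Azure Container Instances work pool's default base job template instead declares flat `cpu` and `memory` variables. The minimal Python sketch below compares a `job_variables` mapping against the variables a base job template declares under `variables.properties`; the function name `unrecognized_job_variables` and the trimmed example template are hypothetical, for illustration only.

```python
from typing import Any, Mapping


def unrecognized_job_variables(
    job_variables: Mapping[str, Any], base_job_template: Mapping[str, Any]
) -> list[str]:
    """Return the top-level job_variables keys that the work pool template does not declare.

    Declared variables are read from base_job_template["variables"]["properties"].
    An undeclared key has no matching placeholder in the template, so it most
    likely has no effect on the container the worker creates.
    """
    declared = set(base_job_template.get("variables", {}).get("properties", {}))
    return sorted(key for key in job_variables if key not in declared)


if __name__ == "__main__":
    # Hypothetical, trimmed-down template for illustration only; inspect your real
    # pool's base job template instead of relying on this one.
    example_template = {"variables": {"properties": {"image": {}, "cpu": {}, "memory": {}}}}
    user_job_variables = {
        "image": "my-registry/flows:latest",
        "container": {"resources": {"requests": {"memory": "12Gi", "cpu": "4"}}},
    }
    print(unrecognized_job_variables(user_job_variables, example_template))
```

To check the real pool, you could feed in the base job template shown by `prefect work-pool inspect endex-aci-pool`. If `container` comes back as unrecognized, the equivalent setting would likely be top-level `cpu: 4` and `memory: 12` under `job_variables` (we believe ACI memory is given in GB as a number rather than as a `12Gi` string); confirm the variable names and units against your pool's template first.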
m
The error code 137 typically means the container was killed with SIGKILL (137 = 128 + 9), and for a container this most often points to an out-of-memory (OOM) condition. Because the same subflows run fine with 10-30 subtasks, the "misspecified platform or architecture" hint in the message is unlikely to be the real cause. Even though you've requested 12Gi of memory, running ~500 tasks can still exhaust it, especially if the tasks are memory-intensive or their inputs and results are kept in memory until the subflow finishes. Here's what you can do:

1. Profile Your Memory Usage: Measure peak memory per task to identify which tasks consume the most.
2. Optimize Task and Flow Design: Check whether tasks pass large objects between each other or return large results that could be written to storage and referenced instead.
3. Break Down the Workload: Process the ~500 subtasks in smaller batches, or split them across several subflows, so that fewer tasks and their results are held in memory at the same time.
4. Log and Monitor: Track memory usage over the course of a run (for example, through the container group's metrics in Azure Monitor) to see whether it climbs steadily as tasks accumulate.

For more information, you might find these resources helpful:

- Troubleshooting guide: Prefect Troubleshoot Cloud
- Azure ACI worker setup and configuration: Prefect Azure ACI Worker

Would you like to know how to profile memory usage in your tasks, or more about breaking down workflows and optimizing tasks in Prefect?
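The profiling and batching advice can be sketched in plain Python. This is a minimal, standard-library-only illustration, not Prefect's API: `peak_memory_bytes` uses `tracemalloc` to report the peak Python-level allocation while a callable runs, and `process_in_batches` is a hypothetical stand-in for the ~500 subtasks, in which each "task" builds a 20 KB payload but only a small summary is kept once its batch finishes. Comparing a single batch of 500 with batches of 50 shows how keeping every task's data alive at once drives the peak up.

```python
import tracemalloc
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def peak_memory_bytes(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, int]:
    """Call fn and return (result, peak bytes traced by tracemalloc during the call).

    tracemalloc only sees allocations made through Python's allocators, so memory
    held by native libraries may be under-reported.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process_in_batches(n_items: int, batch_size: int, payload_bytes: int = 20_000) -> list[int]:
    """Hypothetical stand-in for many subtasks: each 'task' produces a large payload,
    but only a small summary (its length) is kept after its batch finishes."""
    summaries: list[int] = []
    for chunk in batched(range(n_items), batch_size):
        # These payloads stay alive only until the next batch replaces them.
        payloads = [bytearray(payload_bytes) for _ in chunk]
        summaries.extend(len(p) for p in payloads)
    return summaries


if __name__ == "__main__":
    for size in (500, 50):
        _, peak = peak_memory_bytes(process_in_batches, 500, size)
        print(f"batch_size={size}: peak ~ {peak / 1e6:.1f} MB")
```

In a Prefect flow, the analogous pattern would be to submit one batch at a time (for example with `your_task.map(batch)`, where `your_task` is a placeholder) and wait on that batch's futures before submitting the next. Because `tracemalloc` does not see memory allocated by native libraries, the container's own memory metric remains the ground truth for the OOM limit.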