<@ULVA73B9P> i have setup ecs+ec2 cluster for my w...
# ask-marvin
r
@Marvin I have set up an ECS + EC2 cluster for my workflow pipeline. If I consider a t3.medium instance, one task takes about 0.25 vCPU and 400 MB of RAM, which gives me 7 jobs per machine + 1 worker. But when I submit more than that number of jobs, the additional jobs crash. I want them to queue up instead. How do I achieve this?
m
thought for 165.2 seconds
Based on your scenario, there are a few ways to handle this situation:

1. Configure concurrency limits. You can set concurrency limits at multiple levels:

a. Work pool level: when creating or updating your ECS work pool, you can set a concurrency limit that matches your infrastructure capacity. For a t3.medium instance that can handle 7 jobs + 1 worker, you might want to set the concurrency limit to 7:
```bash
prefect work-pool create "my-ecs-pool" --type ecs
prefect work-pool set-concurrency-limit "my-ecs-pool" 7
```
b. Queue Level: You can also create multiple queues within your work pool and set concurrency limits per queue:
```bash
prefect work-queue create "my-queue" --pool "my-ecs-pool" --limit 7
```
2. ECS task definition settings: make sure your ECS task definitions accurately reflect your resource requirements:
- CPU: 0.25 vCPU = 256 CPU units
- Memory: ~400 MiB (ECS memory values are specified in MiB)

A sketch of how to bake these values into the pool's job template follows the next code block.

3. Worker settings: when starting your worker, you can specify how many runs it should process concurrently:
```bash
prefect worker start --pool "my-ecs-pool" --limit 7
```
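For point 2, one way to pin those CPU/memory values is through the pool's base job template. This is just a sketch, assuming the default ECS template exposes `cpu` and `memory` job variables; the file name is illustrative:

```bash
# Dump the default ECS base job template so you can edit its defaults
prefect work-pool get-default-base-job-template --type ecs --file ecs-template.json

# After setting the template's cpu/memory variable defaults to 256 and 400,
# create (or recreate) the pool from the edited template
prefect work-pool create "my-ecs-pool" --type ecs --base-job-template ecs-template.json
```

Individual deployments can also override these values through their job variables if some flows need more resources than others.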
The key points to remember:
- Jobs that exceed the concurrency limit will automatically queue up and wait for resources to become available.
- Runs that fail due to resource constraints will be retried according to your retry policy.
- You can monitor queued runs in the Prefect UI or using the CLI.
- Consider using auto-scaling policies in your ECS cluster to handle increased load.

To verify the current status of your work pool and queues:
```bash
prefect work-pool ls
prefect work-queue ls --pool "my-ecs-pool"
```
Additional recommendations:
1. Monitor your worker logs to ensure it isn't getting overwhelmed.
2. Consider implementing flow-level concurrency limits for specific flows that might be resource-intensive.
3. Use tags or separate work queues to route different types of jobs based on their resource requirements (see the sketch below).

Would you like me to provide more specific details about any of these approaches, or help you implement a particular solution?
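For the last recommendation, a minimal sketch of splitting the pool into queues with different limits, plus a tag-based concurrency limit for especially heavy tasks (queue and tag names here are just examples):

```bash
# Two queues in the same pool with different concurrency limits
prefect work-queue create "heavy-jobs" --pool "my-ecs-pool" --limit 2
prefect work-queue create "light-jobs" --pool "my-ecs-pool" --limit 5

# Optional: cap concurrent task runs that carry a given tag
prefect concurrency-limit create "memory-heavy" 3
```

Deployments can then be pointed at the appropriate queue via their work queue name.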
r
so by default the worker picks up everything from the queue? @Marvin
m
thought for 40.5 seconds
Yes. By default, a worker will try to process as many jobs as it can from its work pool queue(s), which can lead to resource exhaustion if not properly configured. Here's the important detail:

1. Default behavior:
- Workers will continuously poll for new work.
- They will attempt to process all available jobs unless limited.
- There is no built-in resource awareness; the worker doesn't automatically know about your EC2 instance's limitations.

2. How to control this: you can limit the worker in several ways.

a. Using the `--limit` flag when starting the worker:
```bash
prefect worker start -p "your-pool-name" --limit 7
```
This ensures the worker won't process more than 7 jobs concurrently.

b. Using work pool concurrency limits:
```bash
prefect work-pool create "your-pool-name" --type ecs
prefect work-pool set-concurrency-limit "your-pool-name" 7
```
This limits the entire work pool to 7 concurrent jobs.

c. Using queue-specific limits:
```bash
prefect work-queue create "your-queue" --pool "your-pool-name" --limit 7
```
3. Best practice: for your t3.medium scenario, you should:
- Set a worker limit of 7 jobs (matching your calculated capacity).
- Set a work pool concurrency limit as a backup safety measure.
- Monitor the worker's resource usage to ensure the limits are appropriate.

A combined sketch of this setup is included at the end of this message.

Without these limits, yes, the worker will keep accepting new jobs as they come in, which explains why you're seeing crashes when you submit more jobs than the instance can handle. Would you like me to explain how to implement any of these limiting approaches in more detail?
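Putting it together, a minimal sketch of the full setup (pool and queue names are examples; the pool-level limit assumes the `set-concurrency-limit` subcommand available in recent Prefect releases):

```bash
# Create the ECS work pool and cap it at 7 concurrent runs
prefect work-pool create "my-ecs-pool" --type ecs
prefect work-pool set-concurrency-limit "my-ecs-pool" 7

# Optional: a dedicated queue with its own limit
prefect work-queue create "my-queue" --pool "my-ecs-pool" --limit 7

# Start the worker on the EC2 instance, matched to its capacity
prefect worker start --pool "my-ecs-pool" --limit 7
```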