Amir Kamal
07/10/2025, 6:51 AM
Marvin
07/10/2025, 6:52 AM
The `request_memory` setting is actually specific to certain infrastructure types in work pools, particularly Kubernetes and Docker. Let me explain how it works:
In Prefect 3.x, when you create or configure a work pool that uses container-based infrastructure (like Kubernetes or Docker), you can specify resource requests including memory. This setting tells the infrastructure how much memory should be allocated to each flow run.
For example, with a Kubernetes work pool, one common way to set it is as a job variable when deploying a flow. A minimal sketch, assuming a Prefect 3.x setup where the pool's base job template defines `request_memory` and `limit_memory` variables (the exact variable names depend on your template):
```python
from prefect import flow

@flow
def etl():
    ...

# These variable names must exist in the pool's base job template
etl.deploy(
    name="k8s-etl",
    work_pool_name="my-k8s-pool",        # assumed pool name
    image="my-registry/etl:latest",      # image the Kubernetes job runs
    job_variables={"request_memory": "512Mi", "limit_memory": "1Gi"},
)
```
A few key points about `request_memory`:
1. It's infrastructure-specific - not all work pool types support it
2. The format depends on the infrastructure type (e.g., Kubernetes uses quantities like "512Mi" or "2Gi")
3. It's a request, not a guarantee - the actual allocation depends on your infrastructure's capacity and scheduler
4. It's often paired with a memory limit to set both the minimum requested and maximum allowed memory (see the pod spec fragment below)
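For reference, on Kubernetes these two values ultimately land in the container's `resources` stanza of the generated pod spec, roughly like this (illustrative fragment):
```yaml
resources:
  requests:
    memory: "512Mi"   # scheduler only places the pod on a node with this much free
  limits:
    memory: "1Gi"     # the container is OOM-killed if it exceeds this
```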
The exact configuration method will depend on how you're creating and managing your work pools, whether through the UI, API, or Python client.
For more details about infrastructure configuration, you can check out the infrastructure documentation.
Amir Kamal
07/10/2025, 6:54 AM
```yaml
definitions:
  work_pools:
    k8_default: &x
      name: x
      work_queue_name: x
      job_variables:
        request_memory: "0.5Gi"
        image_pull_policy: "Always"
        nodeSelector:
          node-type: static
```
Can I specify the concurrency and priority here?
Marvin
07/10/2025, 6:55 AM
Yes, you can specify both in your `prefect.yaml` configuration. Here's how you can modify your existing configuration:
```yaml
definitions:
  work_pools:
    k8_default: &x
      name: x
      work_queue_name: x
      concurrency_limit: 10  # add this line to set concurrency
      priority: 1            # add this line to set priority
      job_variables:
        request_memory: "0.5Gi"
        image_pull_policy: "Always"
        nodeSelector:
          node-type: static
```
A few important notes about these settings:
1. Concurrency limit:
   - Set using `concurrency_limit`
   - Defines how many flow runs can execute simultaneously in this work queue
   - If not specified, defaults to unlimited
   - Must be a positive integer
2. Priority:
   - Set using `priority`
   - Lower numbers indicate higher priority (1 is higher priority than 10)
   - Affects how flow runs are scheduled when multiple work queues are available
   - Must be a positive integer
   - Default is 1
These settings can help you control resource utilization and execution order of your flows. For example, if you have multiple work queues, you might want to set different priorities and concurrency limits to ensure critical workflows get resources first and prevent system overload.
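If you want to confirm what each queue's priority and concurrency limit actually are after deployment, you can read them back with the Python client. A minimal sketch, assuming a Prefect 3.x environment:
```python
import asyncio
from prefect import get_client

async def main():
    async with get_client() as client:
        # List all work queues with their priority and concurrency limit
        for queue in await client.read_work_queues():
            print(queue.name, queue.priority, queue.concurrency_limit)

asyncio.run(main())
```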
You can also modify these settings later through the Prefect UI or API if you need to adjust them based on your operational needs.
Amir Kamal
07/10/2025, 6:56 AM
Marvin
07/10/2025, 6:58 AM
Here's a fuller example `prefect.yaml` that demonstrates multiple work pools with different work queues, each with their own concurrency limits and priorities:
```yaml
definitions:
  work_pools:
    # Production Kubernetes Work Pool
    prod_kubernetes: &prod_k8s
      name: prod-k8s
      work_queue_name: default  # default queue for this pool
      concurrency_limit: 20     # pool-level concurrency
      job_variables:
        image_pull_policy: "Always"
        request_memory: "1Gi"
        request_cpu: "1"
        nodeSelector:
          env: production

    # Production work queues
    prod_k8s_high_priority: &prod_k8s_high
      <<: *prod_k8s  # inherit from prod_k8s
      work_queue_name: high-pri
      concurrency_limit: 10  # stricter limit for the high-priority queue
      priority: 1            # highest priority (1 is highest)
      job_variables:
        request_memory: "2Gi"  # override memory for high-priority tasks
        request_cpu: "2"

    prod_k8s_low_priority: &prod_k8s_low
      <<: *prod_k8s
      work_queue_name: low-pri
      concurrency_limit: 5
      priority: 10  # lower priority
      job_variables:
        request_memory: "512Mi"  # fewer resources for low-priority tasks
        request_cpu: "0.5"

    # Development Kubernetes Work Pool
    dev_kubernetes: &dev_k8s
      name: dev-k8s
      work_queue_name: default
      concurrency_limit: 10
      job_variables:
        image_pull_policy: "IfNotPresent"
        request_memory: "512Mi"
        request_cpu: "0.5"
        nodeSelector:
          env: development

    # Development work queues
    dev_k8s_testing: &dev_k8s_test
      <<: *dev_k8s
      work_queue_name: testing
      concurrency_limit: 3
      priority: 5
      job_variables:
        request_memory: "1Gi"
        request_cpu: "1"

    # ML Training Work Pool
    ml_training: &ml_pool
      name: ml-training
      work_queue_name: default
      concurrency_limit: 5
      job_variables:
        image_pull_policy: "Always"
        request_memory: "4Gi"
        request_cpu: "2"
        nodeSelector:
          gpu: "true"

    ml_training_large: &ml_large
      <<: *ml_pool
      work_queue_name: large-jobs
      concurrency_limit: 2  # limit resource-intensive jobs
      priority: 3
      job_variables:
        request_memory: "8Gi"
        request_cpu: "4"

# Example deployment configurations using these work pools
deployments:
  - name: critical-etl
    work_pool:
      name: prod-k8s
      work_queue_name: high-pri  # uses the high-priority queue
  - name: routine-processing
    work_pool:
      name: prod-k8s
      work_queue_name: low-pri   # uses the low-priority queue
  - name: ml-training-job
    work_pool:
      name: ml-training
      work_queue_name: large-jobs  # uses the ML large-jobs queue
```
This configuration demonstrates several key concepts:
1. Multiple work pools:
   - Production Kubernetes (`prod-k8s`)
   - Development Kubernetes (`dev-k8s`)
   - ML training (`ml-training`)
2. Different queue types:
   - High-priority queues (priority: 1)
   - Standard queues (default priority)
   - Low-priority queues (priority: 10)
3. Varied concurrency limits:
   - Pool-level limits (e.g., 20 for prod)
   - Queue-specific limits (e.g., 10 for high-priority, 5 for low-priority)
4. Resource allocation:
   - Different memory/CPU requests based on queue importance
   - Specific node selectors for different environments
   - Special configurations for ML workloads
5. YAML anchors and references:
   - Uses `&` to create anchors
   - Uses `<<: *` for inheritance to reduce repetition
   - Allows overriding specific values while inheriting others (one merge caveat applies; see the snippet below)
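The caveat: YAML merge keys (`<<: *`) are shallow, so overriding a nested mapping like `job_variables` replaces the whole mapping from the anchor rather than merging into it. A minimal illustration:
```yaml
base: &base
  job_variables:
    request_memory: "1Gi"
    request_cpu: "1"

override:
  <<: *base
  job_variables:            # replaces the entire mapping from *base
    request_memory: "2Gi"   # request_cpu is NOT carried over here
```
If you want to inherit some job variables while overriding others, repeat the unchanged keys in the override.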
This structure allows for:
- Precise control over resource allocation
- Different priorities for different types of work
- Separation of concerns between environments
- Flexible resource scaling based on workload importance
Amir Kamal
07/10/2025, 8:21 AM
Marvin
07/10/2025, 8:22 AM
```yaml
definitions:
  work_pools:
    # Production Kubernetes Work Pool
    prod_kubernetes: &prod_k8s
      name: prod-k8s
      work_queue_name: default  # default queue
      concurrency_limit: 20     # pool-level concurrency
      job_variables:
        image_pull_policy: "Always"
        request_memory: "1Gi"
        nodeSelector:
          env: production

    # Work queues with different priorities
    prod_k8s_critical: &prod_k8s_critical
      <<: *prod_k8s
      work_queue_name: critical
      priority: 1  # highest-priority queue
      concurrency_limit: 5

    prod_k8s_high: &prod_k8s_high
      <<: *prod_k8s
      work_queue_name: high
      priority: 5  # high-priority queue
      concurrency_limit: 10

    prod_k8s_normal: &prod_k8s_normal
      <<: *prod_k8s
      work_queue_name: normal
      priority: 10  # normal-priority queue
      concurrency_limit: 15

    prod_k8s_low: &prod_k8s_low
      <<: *prod_k8s
      work_queue_name: low
      priority: 20  # lowest-priority queue
      concurrency_limit: 20
```
Key points about priority:
1. Priority is a queue-level setting only
2. Lower numbers indicate higher priority (1 is the highest priority)
3. When multiple queues have flow runs ready for execution, the queue with the highest priority (lowest number) has its flow runs executed first
4. Within a queue, flow runs are typically executed in FIFO (first in, first out) order
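If you'd rather create these queues programmatically than through `prefect.yaml`, here's a hedged sketch using the Prefect Python client, assuming Prefect 3.x and that a pool named `prod-k8s` already exists:
```python
import asyncio
from prefect import get_client

async def main():
    async with get_client() as client:
        # Create a high-priority queue in an existing work pool;
        # lower priority numbers are served first
        queue = await client.create_work_queue(
            name="critical",
            work_pool_name="prod-k8s",
            priority=1,
        )
        print(queue.id, queue.name, queue.priority)

asyncio.run(main())
```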
This structure lets you create a hierarchy of work queues within a work pool, where each queue can handle different types of workloads with different urgency levels.
Amir Kamal
07/10/2025, 9:20 AM
Marvin
07/10/2025, 9:21 AM
Here's an example where separate work pools are needed because each one targets fundamentally different infrastructure:
```yaml
definitions:
  work_pools:
    # Production Kubernetes Pool - High-Performance Cluster
    prod_kubernetes:
      name: prod-k8s
      job_variables:
        kubernetes_namespace: "prod"
        service_account_name: "prod-service-account"
        node_selector:
          instance-type: "high-memory"

    # Development Docker Pool - Local Testing
    dev_docker:
      name: dev-docker
      job_variables:
        network_mode: "host"
        volumes:
          - "/local/dev/data:/data"

    # GPU Pool - Specialized ML Infrastructure
    ml_gpu_pool:
      name: ml-pool
      job_variables:
        runtime: "nvidia"
        gpu_count: 2
        node_selector:
          accelerator: "nvidia-tesla"
```
In this example:
- The production Kubernetes pool runs on a high-performance cluster with specific security settings
- The development Docker pool runs locally with different networking and volume mounts
- The ML pool requires GPU access and specialized hardware
You couldn't achieve these infrastructure-level differences with just work queues because:
1. Work queues inherit their infrastructure configuration from the parent work pool
2. Work queues can only control priority and concurrency, not infrastructure details
3. Work queues can't switch between different types of execution environments
Think of it this way:
- Work Pools = "Where and how does this run?" (infrastructure level)
- Work Queues = "When and in what order does this run?" (scheduling level)
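To make the distinction concrete, here's a hedged sketch of creating two pools with different infrastructure types via the Python client (assuming Prefect 3.x; the pool names are illustrative, and each worker type additionally needs its collection, e.g. `prefect-kubernetes` or `prefect-docker`, installed where the worker runs):
```python
import asyncio
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate

async def main():
    async with get_client() as client:
        # Each pool carries its own infrastructure type; queues within a
        # pool can only reorder work, not change where or how it runs
        await client.create_work_pool(
            WorkPoolCreate(name="prod-k8s", type="kubernetes")
        )
        await client.create_work_pool(
            WorkPoolCreate(name="dev-docker", type="docker")
        )

asyncio.run(main())
```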
So while you might use multiple work queues to manage priorities within a Kubernetes cluster, you'd need separate work pools if you want to:
- Run some flows on AWS ECS and others on Kubernetes
- Separate production and development environments
- Use different base images or infrastructure configurations
- Implement different security contexts or access patterns