Amir Kamal
07/10/2025, 6:51 AM
Marvin
07/10/2025, 6:52 AM
The `request_memory` setting is actually specific to certain infrastructure types in work pools, particularly Kubernetes and Docker. Let me explain how it works:
In Prefect 3.x, when you create or configure a work pool that uses container-based infrastructure (like Kubernetes or Docker), you can specify resource requests including memory. This setting tells the infrastructure how much memory should be allocated to each flow run.
For example, with a Kubernetes work pool, one common way to set it is as a job variable when deploying a flow. A minimal sketch, assuming a Prefect 3.x setup where the pool's base job template defines `request_memory` and `limit_memory` variables (the exact variable names depend on your template):
```python
from prefect import flow

@flow
def etl():
    ...

# These variable names must exist in the pool's base job template
etl.deploy(
    name="k8s-etl",
    work_pool_name="my-k8s-pool",        # assumed pool name
    image="my-registry/etl:latest",      # image the Kubernetes job runs
    job_variables={"request_memory": "512Mi", "limit_memory": "1Gi"},
)
```
A few key points about `request_memory`:
1. It's infrastructure-specific - not all work pool types support it
2. The format depends on the infrastructure type (e.g., Kubernetes uses quantities like "512Mi" or "2Gi")
3. It's a request, not a guarantee - the actual allocation depends on your infrastructure's capacity and scheduler
4. It's often paired with a memory limit to set both the minimum requested and maximum allowed memory (see the pod spec fragment below)
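For reference, on Kubernetes these two values ultimately land in the container's `resources` stanza of the generated pod spec, roughly like this (illustrative fragment):
```yaml
resources:
  requests:
    memory: "512Mi"   # scheduler only places the pod on a node with this much free
  limits:
    memory: "1Gi"     # the container is OOM-killed if it exceeds this
```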
The exact configuration method will depend on how you're creating and managing your work pools, whether through the UI, API, or Python client.
For more details about infrastructure configuration, you can check out the infrastructure documentation.
Amir Kamal
07/10/2025, 6:54 AM
```yaml
definitions:
  work_pools:
    k8_default: &x
      name: x
      work_queue_name: x
      job_variables:
        request_memory: "0.5Gi"
        image_pull_policy: "Always"
        nodeSelector:
          node-type: static
```
Can I specify the concurrency and priority here?
Marvin
07/10/2025, 6:55 AM
Yes, you can specify both in your `prefect.yaml` configuration. Here's how you can modify your existing configuration:
```yaml
definitions:
  work_pools:
    k8_default: &x
      name: x
      work_queue_name: x
      concurrency_limit: 10  # add this line to set concurrency
      priority: 1            # add this line to set priority
      job_variables:
        request_memory: "0.5Gi"
        image_pull_policy: "Always"
        nodeSelector:
          node-type: static
```
A few important notes about these settings:
1. Concurrency limit:
   - Set using `concurrency_limit`
   - Defines how many flow runs can execute simultaneously in this work queue
   - If not specified, defaults to unlimited
   - Must be a positive integer
2. Priority:
   - Set using `priority`
   - Lower numbers indicate higher priority (1 is higher priority than 10)
   - Affects how flow runs are scheduled when multiple work queues are available
   - Must be a positive integer
   - Default is 1
These settings can help you control resource utilization and execution order of your flows. For example, if you have multiple work queues, you might want to set different priorities and concurrency limits to ensure critical workflows get resources first and prevent system overload.
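If you want to confirm what each queue's priority and concurrency limit actually are after deployment, you can read them back with the Python client. A minimal sketch, assuming a Prefect 3.x environment:
```python
import asyncio
from prefect import get_client

async def main():
    async with get_client() as client:
        # List all work queues with their priority and concurrency limit
        for queue in await client.read_work_queues():
            print(queue.name, queue.priority, queue.concurrency_limit)

asyncio.run(main())
```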
You can also modify these settings later through the Prefect UI or API if you need to adjust them based on your operational needs.
Amir Kamal
07/10/2025, 6:56 AM
Marvin
07/10/2025, 6:58 AM
Here's a fuller example `prefect.yaml` that demonstrates multiple work pools with different work queues, each with their own concurrency limits and priorities:
```yaml
definitions:
  work_pools:
    # Production Kubernetes Work Pool
    prod_kubernetes: &prod_k8s
      name: prod-k8s
      work_queue_name: default  # default queue for this pool
      concurrency_limit: 20     # pool-level concurrency
      job_variables:
        image_pull_policy: "Always"
        request_memory: "1Gi"
        request_cpu: "1"
        nodeSelector:
          env: production

    # Production work queues
    prod_k8s_high_priority: &prod_k8s_high
      <<: *prod_k8s  # inherit from prod_k8s
      work_queue_name: high-pri
      concurrency_limit: 10  # stricter limit for the high-priority queue
      priority: 1            # highest priority (1 is highest)
      job_variables:
        request_memory: "2Gi"  # override memory for high-priority tasks
        request_cpu: "2"

    prod_k8s_low_priority: &prod_k8s_low
      <<: *prod_k8s
      work_queue_name: low-pri
      concurrency_limit: 5
      priority: 10  # lower priority
      job_variables:
        request_memory: "512Mi"  # fewer resources for low-priority tasks
        request_cpu: "0.5"

    # Development Kubernetes Work Pool
    dev_kubernetes: &dev_k8s
      name: dev-k8s
      work_queue_name: default
      concurrency_limit: 10
      job_variables:
        image_pull_policy: "IfNotPresent"
        request_memory: "512Mi"
        request_cpu: "0.5"
        nodeSelector:
          env: development

    # Development work queues
    dev_k8s_testing: &dev_k8s_test
      <<: *dev_k8s
      work_queue_name: testing
      concurrency_limit: 3
      priority: 5
      job_variables:
        request_memory: "1Gi"
        request_cpu: "1"

    # ML Training Work Pool
    ml_training: &ml_pool
      name: ml-training
      work_queue_name: default
      concurrency_limit: 5
      job_variables:
        image_pull_policy: "Always"
        request_memory: "4Gi"
        request_cpu: "2"
        nodeSelector:
          gpu: "true"

    ml_training_large: &ml_large
      <<: *ml_pool
      work_queue_name: large-jobs
      concurrency_limit: 2  # limit resource-intensive jobs
      priority: 3
      job_variables:
        request_memory: "8Gi"
        request_cpu: "4"

# Example deployment configurations using these work pools
deployments:
  - name: critical-etl
    work_pool:
      name: prod-k8s
      work_queue_name: high-pri  # uses the high-priority queue
  - name: routine-processing
    work_pool:
      name: prod-k8s
      work_queue_name: low-pri   # uses the low-priority queue
  - name: ml-training-job
    work_pool:
      name: ml-training
      work_queue_name: large-jobs  # uses the ML large-jobs queue
```
This configuration demonstrates several key concepts:
1. Multiple work pools:
   - Production Kubernetes (`prod-k8s`)
   - Development Kubernetes (`dev-k8s`)
   - ML training (`ml-training`)
2. Different queue types:
   - High-priority queues (priority: 1)
   - Standard queues (default priority)
   - Low-priority queues (priority: 10)
3. Varied concurrency limits:
   - Pool-level limits (e.g., 20 for prod)
   - Queue-specific limits (e.g., 10 for high-priority, 5 for low-priority)
4. Resource allocation:
   - Different memory/CPU requests based on queue importance
   - Specific node selectors for different environments
   - Special configurations for ML workloads
5. YAML anchors and references:
   - Uses `&` to create anchors
   - Uses `<<: *` for inheritance to reduce repetition
   - Allows overriding specific values while inheriting others (one merge caveat applies; see the snippet below)
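The caveat: YAML merge keys (`<<: *`) are shallow, so overriding a nested mapping like `job_variables` replaces the whole mapping from the anchor rather than merging into it. A minimal illustration:
```yaml
base: &base
  job_variables:
    request_memory: "1Gi"
    request_cpu: "1"

override:
  <<: *base
  job_variables:            # replaces the entire mapping from *base
    request_memory: "2Gi"   # request_cpu is NOT carried over here
```
If you want to inherit some job variables while overriding others, repeat the unchanged keys in the override.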
This structure allows for:
- Precise control over resource allocation
- Different priorities for different types of work
- Separation of concerns between environments
- Flexible resource scaling based on workload importance
Amir Kamal
07/10/2025, 8:21 AM
Marvin
07/10/2025, 8:22 AM
```yaml
definitions:
  work_pools:
    # Production Kubernetes Work Pool
    prod_kubernetes: &prod_k8s
      name: prod-k8s
      work_queue_name: default  # default queue
      concurrency_limit: 20     # pool-level concurrency
      job_variables:
        image_pull_policy: "Always"
        request_memory: "1Gi"
        nodeSelector:
          env: production

    # Work queues with different priorities
    prod_k8s_critical: &prod_k8s_critical
      <<: *prod_k8s
      work_queue_name: critical
      priority: 1  # highest-priority queue
      concurrency_limit: 5

    prod_k8s_high: &prod_k8s_high
      <<: *prod_k8s
      work_queue_name: high
      priority: 5  # high-priority queue
      concurrency_limit: 10

    prod_k8s_normal: &prod_k8s_normal
      <<: *prod_k8s
      work_queue_name: normal
      priority: 10  # normal-priority queue
      concurrency_limit: 15

    prod_k8s_low: &prod_k8s_low
      <<: *prod_k8s
      work_queue_name: low
      priority: 20  # lowest-priority queue
      concurrency_limit: 20
```
Key points about priority:
1. Priority is a queue-level setting only
2. Lower numbers indicate higher priority (1 is the highest priority)
3. When multiple queues have flow runs ready for execution, the queue with the highest priority (lowest number) has its flow runs executed first
4. Within a queue, flow runs are typically executed in FIFO (first in, first out) order
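If you'd rather create these queues programmatically than through `prefect.yaml`, here's a hedged sketch using the Prefect Python client, assuming Prefect 3.x and that a pool named `prod-k8s` already exists:
```python
import asyncio
from prefect import get_client

async def main():
    async with get_client() as client:
        # Create a high-priority queue in an existing work pool;
        # lower priority numbers are served first
        queue = await client.create_work_queue(
            name="critical",
            work_pool_name="prod-k8s",
            priority=1,
        )
        print(queue.id, queue.name, queue.priority)

asyncio.run(main())
```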
This structure lets you create a hierarchy of work queues within a work pool, where each queue can handle different types of workloads with different urgency levels.
Amir Kamal
07/10/2025, 9:20 AM
Marvin
07/10/2025, 9:21 AM
Here's an example where separate work pools are needed because each one targets fundamentally different infrastructure:
```yaml
definitions:
  work_pools:
    # Production Kubernetes Pool - High-Performance Cluster
    prod_kubernetes:
      name: prod-k8s
      job_variables:
        kubernetes_namespace: "prod"
        service_account_name: "prod-service-account"
        node_selector:
          instance-type: "high-memory"

    # Development Docker Pool - Local Testing
    dev_docker:
      name: dev-docker
      job_variables:
        network_mode: "host"
        volumes:
          - "/local/dev/data:/data"

    # GPU Pool - Specialized ML Infrastructure
    ml_gpu_pool:
      name: ml-pool
      job_variables:
        runtime: "nvidia"
        gpu_count: 2
        node_selector:
          accelerator: "nvidia-tesla"
```
In this example:
- The production Kubernetes pool runs on a high-performance cluster with specific security settings
- The development Docker pool runs locally with different networking and volume mounts
- The ML pool requires GPU access and specialized hardware
You couldn't achieve these infrastructure-level differences with just work queues because:
1. Work queues inherit their infrastructure configuration from the parent work pool
2. Work queues can only control priority and concurrency, not infrastructure details
3. Work queues can't switch between different types of execution environments
Think of it this way:
- Work Pools = "Where and how does this run?" (infrastructure level)
- Work Queues = "When and in what order does this run?" (scheduling level)
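To make the distinction concrete, here's a hedged sketch of creating two pools with different infrastructure types via the Python client (assuming Prefect 3.x; the pool names are illustrative, and each worker type additionally needs its collection, e.g. `prefect-kubernetes` or `prefect-docker`, installed where the worker runs):
```python
import asyncio
from prefect import get_client
from prefect.client.schemas.actions import WorkPoolCreate

async def main():
    async with get_client() as client:
        # Each pool carries its own infrastructure type; queues within a
        # pool can only reorder work, not change where or how it runs
        await client.create_work_pool(
            WorkPoolCreate(name="prod-k8s", type="kubernetes")
        )
        await client.create_work_pool(
            WorkPoolCreate(name="dev-docker", type="docker")
        )

asyncio.run(main())
```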
So while you might use multiple work queues to manage priorities within a Kubernetes cluster, you'd need separate work pools if you want to:
- Run some flows on AWS ECS and others on Kubernetes
- Separate production and development environments
- Use different base images or infrastructure configurations
- Implement different security contexts or access patterns