< Marvin> In a flow where I use DaskTaskRunner can I switch Prefect Community #ask-marvin

Join Slack

<@ULVA73B9P> In a flow where I use DaskTaskRunner ...

# ask-marvin

Viktor

07/15/2025, 6:56 PM

@Marvin In a flow where I use DaskTaskRunner can I switch some tasks to use normal runner instead?

Marvin

07/15/2025, 6:56 PM

✅ thought for 30.2 seconds

Marvin

07/15/2025, 6:56 PM

No, in Prefect you cannot switch individual tasks to use different task runners within the same flow. The task runner is specified at the flow level and applies to all tasks within that flow. When you configure a flow with

DaskTaskRunner

, it will handle the execution of all tasks in that flow. Here's a basic example to illustrate:

Copy code

python
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def task1():
    return 1

@task
def task2():
    return 2

@flow(task_runner=DaskTaskRunner())
def my_flow():
    # Both tasks will use DaskTaskRunner
    result1 = task1()
    result2 = task2()

If you need different execution patterns for different tasks, you have a few alternatives: 1. Create separate flows with different task runners and call them using subflows 2. Use the task runner that meets the needs of your most demanding tasks 3. Implement custom logic within your tasks to handle specific execution requirements The task runner you choose will be responsible for executing all tasks within its flow context. You can find more information about task runners in the Prefect documentation.

Viktor

07/15/2025, 6:59 PM

@Marvin Is this a correct syntax? @flow( name=“generate_creatordb”, task_runner=DaskTaskRunner( cluster_kwargs={ “processes”: True, “threads_per_worker”: 4, “memory_limit”: “100GB”, # High limit per worker “adaptive”: { “minimum”: 1, # Can scale down to 1 worker “maximum”: 3, # Max 3 workers } } ), )

Marvin

07/15/2025, 6:59 PM

✅ thought for 31.7 seconds

Marvin

07/15/2025, 7:00 PM

Based on the research, your syntax needs some adjustments. Here's the correct way to configure DaskTaskRunner in Prefect 3.x:

Copy code

python
from prefect import flow
from prefect_dask import DaskTaskRunner

@flow(
    name="generate_creatordb",
    task_runner=DaskTaskRunner(
        cluster_kwargs={
            "n_workers": 3,  # Instead of using adaptive directly
            "threads_per_worker": 4,
            "memory_limit": "100GB",
        },
        adapt_kwargs={  # Adaptive scaling should be in adapt_kwargs, not cluster_kwargs
            "minimum": 1,
            "maximum": 3,
        }
    ),
)

The key differences from your original syntax are: 1. Adaptive scaling configuration should be in

adapt_kwargs

instead of nested inside

cluster_kwargs

2. The

processes

parameter isn't typically needed as it's a default setting 3. Worker count can be set directly with

n_workers

cluster_kwargs

This configuration will: - Allow up to 4 threads per worker - Set a 100GB memory limit per worker - Enable adaptive scaling between 1 and 3 workers - Maintain the same basic functionality you were aiming for If you need to use a specific Dask cluster class or have additional client configuration needs, you can also specify those using

cluster_class

and

client_kwargs

parameters. For more complex Dask configurations, you might want to consider creating a DaskTaskRunner configuration as a block, which allows for more detailed configuration options and reusability.

Viktor

07/15/2025, 7:01 PM

@Marvin what will happen if my server has 128gb ram and all 3 workers are required?

Marvin

07/15/2025, 7:01 PM

✅ thought for 10.0 seconds

Marvin

07/15/2025, 7:01 PM

Ah, this is a good question about resource management! Let's do the math: With your configuration: - Each worker requests 100GB of RAM - Maximum of 3 workers - Total RAM requested: 3 × 100GB = 300GB - Available RAM: 128GB This configuration would cause problems because: 1. You're oversubscribing your memory (trying to use 300GB when only 128GB is available) 2. This could lead to: - Workers failing to start - System swapping to disk (severe performance degradation) - Potential system instability - Out of Memory (OOM) errors Here's a more appropriate configuration for a 128GB server:

Copy code

python
from prefect import flow
from prefect_dask import DaskTaskRunner

@flow(
    name="generate_creatordb",
    task_runner=DaskTaskRunner(
        cluster_kwargs={
            "threads_per_worker": 4,
            "memory_limit": "35GB",  # Adjusted down
        },
        adapt_kwargs={
            "minimum": 1,
            "maximum": 3,
        }
    ),
)

Why these numbers? - Setting

memory_limit="35GB"

per worker means: - 3 workers × 35GB = 105GB total maximum usage - Leaves ~23GB for the operating system, Dask scheduler, and other processes - Provides a safety margin to prevent OOM situations Some tips: 1. Always leave some memory for: - Operating system - Dask scheduler - Other system processes - Memory spikes 2. A good rule of thumb is to allocate about 75-80% of total available RAM to your workers 3. Consider setting memory limits based on your actual data processing needs rather than maximum available If you need to process larger datasets, consider: 1. Processing data in chunks 2. Using disk-based processing methods 3. Reducing the number of concurrent workers 4. Using a larger server or distributed cluster

2 Views

Open in Slack

Previous Next