Hello everyone. We found that DaskTaskRunner is not necessarily needed anymore. If we remove it from a specific flow, how would that flow behave? That is, would it check the complexity and choose the optimal number of threads by itself? Here is a common use we are referring to:
...
@flow(
    name="Scrapers sub-flow",
    task_runner=DaskTaskRunner(
        cluster_kwargs={"n_workers": 1, "processes": False, "threads_per_worker": 20}
    ),
)
def scraper(splitted_files: list[str], run_timestamp: str, success_threshold: float):
    logger = get_run_logger() 
...
hi @Lina Carmona - the default task runner is the ThreadPoolTaskRunner, which will run each submitted task run in a single thread (per ThreadPoolExecutor from the Python standard library). So if you were using DaskTaskRunner and things were working, all you should have to do to move to the default thread pool task runner is remove the task_runner keyword argument from @flow:
...
@flow(
    name="Scrapers sub-flow"
)
def scraper(splitted_files: list[str], run_timestamp: str, success_threshold: float):
    logger = get_run_logger() 
...
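
For intuition about "each submitted task run in a single thread", here is a minimal sketch that uses only the standard library's ThreadPoolExecutor, which the reply above names as the basis of the default runner. It is not Prefect code: scrape and run_all are hypothetical stand-ins, and max_workers=20 is just an assumption that mirrors threads_per_worker from the original DaskTaskRunner config.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def scrape(file_name: str) -> tuple[str, str]:
    """Hypothetical stand-in for one scraper task; the sleep simulates I/O."""
    time.sleep(0.05)
    # Report which pool thread handled this submission.
    return file_name, threading.current_thread().name


def run_all(files: list[str], max_workers: int = 20) -> list[tuple[str, str]]:
    """Submit one task per file and collect results in submission order."""
    with ThreadPoolExecutor(
        max_workers=max_workers, thread_name_prefix="scraper"
    ) as pool:
        futures = [pool.submit(scrape, name) for name in files]
        # result() blocks until the corresponding task has finished.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for name, thread in run_all(["a.csv", "b.csv", "c.csv"]):
        print(name, "ran on", thread)
```

Note that in this sketch the pool size is whatever max_workers says; nothing inspects the workload to pick a thread count. Whether and how the Prefect runner caps its threads is worth confirming in its own documentation rather than inferring from this example.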