
    Justin Green

    8 months ago
    Hello, I have a situation where I am generating tasks at runtime based on some configuration (so the number of tasks may differ depending on config). I would also like these tasks to run in parallel using the local Dask executor. The difficulty I'm running into is that the tasks seem to need to be defined individually in the flow for the local Dask executor to run them in parallel, but since I don't know how many tasks there will be until runtime, I can't define the flow with individual tasks. I also can't simply loop through the task functions in the flow, because then the executor won't execute them in parallel. Is there a recommended way to accomplish this?
    Anna Geller

    8 months ago
    Yes, the solution for this is Mapping. You can use mapping to generate child tasks at runtime based on some dynamic state of the world. One task in your flow determines this dynamic state, e.g. it may return a list of files to be processed. You can then use mapping to process those files in parallel. The actual task (i.e. function) that processes each item can additionally branch on the value of the input it receives, e.g. if the file is a CSV it may do X, and if it is Parquet it may do Y. And as long as you assign a DaskExecutor or LocalDaskExecutor to your flow, it will all run in parallel.
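    To make this concrete, the pattern Anna describes might look roughly like the sketch below, using the Prefect 1.x API (the file names and the bodies of the tasks are hypothetical placeholders; the key parts are `task.map` for the runtime fan-out and `LocalDaskExecutor` for parallelism):

    ```python
    from prefect import task, Flow
    from prefect.executors import LocalDaskExecutor

    @task
    def list_files():
        # Hypothetical: determine the dynamic "state of the world" at runtime,
        # e.g. by reading configuration or scanning a directory.
        return ["a.csv", "b.parquet", "c.csv"]

    @task
    def process_file(path):
        # Branch on the input value: do one thing for CSV, another for Parquet.
        if path.endswith(".csv"):
            return f"processed CSV {path}"
        return f"processed parquet {path}"

    with Flow("dynamic-mapping", executor=LocalDaskExecutor()) as flow:
        files = list_files()
        # map() spawns one child task per element of `files`, and the
        # executor runs those child tasks in parallel.
        results = process_file.map(files)
    ```

    The number of `process_file` child tasks is decided only when `list_files` runs, so the flow definition itself never needs to know how many there will be.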

    Justin Green

    8 months ago
    Thank you for the reply. I will read up on mapping and try it out!