Naimesh Chaudhari

04/12/2022, 3:56 PM
What is the best way to split task resources in a Flow? For example, I want one task to connect to a large Spark cluster for preprocessing, while my ML task should run on a GPU instance.

Kevin Kho

04/12/2022, 3:58 PM
Subflows are one option, because the executor is set at the (sub)flow level. A resource manager is another option, and Dask annotations are a third. From the DaskExecutor docs:
Note that if you have tasks with tags of the form "dask-resource:KEY=NUM" they will be parsed and passed as Worker Resources of the form {"KEY": float(NUM)} to the Dask Scheduler.
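The tag rule quoted above can be sketched in plain Python. This is a minimal illustration that assumes only the `dask-resource:KEY=NUM` format from the docs. It is not Prefect's or Dask's actual code, and `parse_dask_resource_tags`, `eligible_workers` and the worker names are hypothetical. It turns tags into a resource dict, then mimics in simplified form how Dask only places a resource-constrained task on a worker that declared enough of that resource.

```python
from typing import Dict, Iterable, List

# Tag prefix quoted from the DaskExecutor docs.
PREFIX = "dask-resource:"


def parse_dask_resource_tags(tags: Iterable[str]) -> Dict[str, float]:
    """Turn tags like 'dask-resource:GPU=1' into {'GPU': 1.0}.

    Tags without the prefix (e.g. 'ml') are ignored.
    """
    resources: Dict[str, float] = {}
    for tag in tags:
        if not tag.startswith(PREFIX):
            continue
        key, sep, num = tag[len(PREFIX):].partition("=")
        if not sep:
            # Assumption: this sketch rejects malformed tags; the real
            # executor may handle them differently.
            raise ValueError(f"malformed resource tag: {tag!r}")
        resources[key] = float(num)
    return resources


def eligible_workers(
    required: Dict[str, float], workers: Dict[str, Dict[str, float]]
) -> List[str]:
    """Return workers whose declared resources cover every requirement.

    Simplification: compares against total declared capacity and ignores
    resources already held by running tasks.
    """
    return [
        name
        for name, declared in workers.items()
        if all(declared.get(key, 0.0) >= amount for key, amount in required.items())
    ]


# Hypothetical workers, e.g. started with `dask-worker ... --resources "GPU=1"`.
workers = {
    "cpu-worker": {"CPU": 16.0},
    "gpu-worker": {"CPU": 8.0, "GPU": 1.0},
}

preprocess = parse_dask_resource_tags(["etl", "dask-resource:CPU=8"])
train = parse_dask_resource_tags(["ml", "dask-resource:GPU=1"])

print(preprocess, eligible_workers(preprocess, workers))  # {'CPU': 8.0} ['cpu-worker', 'gpu-worker']
print(train, eligible_workers(train, workers))            # {'GPU': 1.0} ['gpu-worker']
```

In Prefect 1.x the tags would be attached with something like `@task(tags=["dask-resource:GPU=1"])`. The GPU machine's Dask worker also has to advertise a matching resource (e.g. `--resources "GPU=1"`), otherwise the task has no eligible worker and stays pending. Resource names are arbitrary labels that Dask matches by string, so `GPU` here is a convention, not a special keyword.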

Naimesh Chaudhari

04/12/2022, 4:00 PM
TY, I'll take a look 😄

Anna Geller

04/12/2022, 4:46 PM
Also, if you want to run each task in a different container/Kubernetes pod to handle resource allocation that way, check out this Discourse topic