Jake Schmidt

01/20/2020, 1:40 PM
Hello! Wondering if there's a timeframe for task affinity -- would love to be able to run my model training task on a k8s node with a GPU.

Joe Schmid

01/20/2020, 2:43 PM
Hi @Jake Schmidt, we already take advantage of this and it works great! Docs are here: https://docs.prefect.io/api/unreleased/engine/executors.html#daskexecutor and here are some snippets showing how we use it:
@task(tags=["dask-resource:GPU=1"])
def task_that_uses_gpu():
    ...  # GPU-bound work goes here
and then the relevant YAML section for our k8s GPU workers:
containers:
        - args:
            - dask-worker
            - dask-scheduler:8786
            - --resources
            - "GPU=1"
๐Ÿ‘๐Ÿผ 1
We also use this same approach for what we call "High Memory Workers." We have certain parts of our data science pipeline that need a large amount of RAM on Dask workers, e.g. some code that manipulates a large amount of data in a pandas dataframe, etc. (We're migrating to Dask dataframes to avoid this, but some of our legacy code isn't converted yet.)
๐Ÿ‘๐Ÿผ 1

Jackson Maxfield Brown

01/20/2020, 4:33 PM
@Joe Schmid this is intriguing -- sorry for my lack of knowledge, but where does that YAML go? Any more info on that file?

Joe Schmid

01/20/2020, 4:40 PM
Hi @Jackson Maxfield Brown, I should have explained more -- in this scenario, we are creating our own long-running Dask cluster using dask-kubernetes and running in AWS. The YAML snippet that I showed is from a Kubernetes "Deployment" specification for Dask workers running on machines with a GPU, and the snippet shows starting the Dask workers with a parameter called "resources" and passing that parameter the value "GPU=1".
Prefect can then use task tagging (the other snippet I showed) to route tasks only to Dask workers that have appropriate resources. It's really powerful and has been very successful for us.
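To see why the routing works: the tag boils down to a plain Dask resources constraint, which can also be exercised with the distributed client directly. The scheduler address and function below are assumptions for illustration:

from dask.distributed import Client

client = Client("tcp://dask-scheduler:8786")

def train_on_gpu(x):
    return x  # placeholder for the actual GPU work

# Only workers that advertised --resources "GPU=1" are eligible to run this.
future = client.submit(train_on_gpu, 42, resources={"GPU": 1})
print(future.result())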

Jackson Maxfield Brown

01/20/2020, 4:43 PM
Ahhhhh I see. Makes sense, and this is useful info. We have an internal SLURM cluster that splits our CPU and GPU nodes, so to get this to work I think we would have to set up a Dask scheduler on the head node, which is probably not ideal -- but a Dask Kubernetes configuration like that would be great.

Joe Schmid

01/20/2020, 4:52 PM
@Jackson Maxfield Brown Yeah, the Dask resources / task affinity in Prefect is really cool. I haven't done it on SLURM (though I have a friend who might be able to get me access to a supercomputer to try this out... 🙂) but from these Dask docs it looks like it should be doable: https://jobqueue.dask.org/en/latest/examples.html#slurm-deployment-providing-additional-arguments-to-the-dask-workers (see the bottom example)
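A rough sketch of what that bottom example suggests for SLURM, assuming dask-jobqueue; the queue name, worker sizing, and the extra-worker-args parameter (called extra in older dask-jobqueue releases, worker_extra_args in newer ones) are assumptions to adapt:

from dask_jobqueue import SLURMCluster
from prefect.engine.executors import DaskExecutor

# Ask SLURM for workers on the GPU partition and advertise a GPU resource on each.
cluster = SLURMCluster(
    queue="gpu",                     # SLURM partition holding the GPU nodes
    cores=8,
    memory="32GB",
    extra=["--resources GPU=1"],     # worker_extra_args=[...] on newer releases
)
cluster.scale(2)

# Prefect then routes "dask-resource:GPU=1"-tagged tasks to those workers.
executor = DaskExecutor(address=cluster.scheduler_address)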

Jackson Maxfield Brown

01/20/2020, 4:59 PM
Hmmm I think you're right that this is possible. We would just be passing the "queue" to spawn the worker with. Huh. Will have to give this a try tomorrow.
🚀 1
Really nice find!