# prefect-community
d
hello, Prefect community. Does Prefect have an idiomatic way to specify a) “please run this Task within this Docker image” and b) “please execute this Task in a container using X amount of CPU/memory + GPU”? I am curious about task-level specification, not flow-level.
j
Hey @dh there isn’t specifically a per-task way of controlling each task’s environment (but it is something on our radar). I have seen flows that do something like this using the Docker tasks (to run processes in other containers), the k8s job tasks (if using k8s), or the StartFlowRun task (with a child flow that has a specific run config for the container specs).
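A rough sketch of that StartFlowRun “flow-of-flows” workaround, assuming a Prefect 1.x-style API; the flow names, project name, and image are hypothetical:

```python
# Sketch of the "flow-of-flows" workaround: the GPU-specific work lives in its
# own flow with its own run config, and the parent flow triggers it as a task.
# Flow/project names and the image are hypothetical.
from prefect import Flow
from prefect.run_configs import DockerRun
from prefect.tasks.prefect import StartFlowRun

# Child flow (registered separately) carries its own container spec.
with Flow("gpu-child-flow") as child_flow:
    ...  # GPU-heavy tasks go here
child_flow.run_config = DockerRun(image="my-registry/gpu-image:latest")

# Parent flow kicks off the child flow as if it were a single task.
start_gpu_work = StartFlowRun(
    flow_name="gpu-child-flow", project_name="ml", wait=True
)
with Flow("parent-flow") as parent_flow:
    start_gpu_work()
```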
j
@dh Are you using k8s? Maybe creating a k8s job (as a Prefect task) would work for you: https://docs.prefect.io/api/latest/tasks/kubernetes.html#runnamespacedjob Also, do you know the rough resource requirements for the flow ahead of time and then need to map tasks onto those resources or do you need truly dynamic task resources unknowable ahead of time?
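For readers on k8s, a hedged sketch of the RunNamespacedJob suggestion, where per-task resources live in the Job manifest; the image and resource amounts are hypothetical:

```python
# Sketch of running one task's work as a k8s Job with explicit resources,
# per the RunNamespacedJob suggestion. All spec values are hypothetical.
from prefect import Flow
from prefect.tasks.kubernetes import RunNamespacedJob

gpu_job_spec = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "train-model"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "train",
                        "image": "my-registry/gpu-image:latest",
                        "resources": {
                            "requests": {"cpu": "4", "memory": "16Gi"},
                            "limits": {"nvidia.com/gpu": "1"},
                        },
                    }
                ],
                "restartPolicy": "Never",
            }
        }
    },
}

run_training_job = RunNamespacedJob(body=gpu_job_spec, namespace="default")

with Flow("k8s-job-flow") as flow:
    run_training_job()
```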
d
@josh @Joe Schmid thanks a lot for sharing the advice. No, we’re not using k8s but SLURM as a resource manager and job scheduler. Context: I (the company I am working for) am trying to integrate Prefect and SLURM (HPC cluster) by writing a custom SLURMExecutor class. The use case is machine learning, and the design inspiration for per-task containerization largely comes from Kubeflow (so essentially letting k8s handle the task-level containerization, as you suggested, but we don’t have k8s…). Use case: task A -> needs only CPU; needs X, Y, Z Python dependencies. task B -> needs X number of GPUs; needs X, Y, Z Python dependencies. I am aware one path forward is to define an environment that’s a superset of all of those, but for various reasons we don’t think it’s quite scalable.
@josh @Joe Schmid Hi Josh and Joe, do you happen to know any examples/pointers to share where someone wrote a custom Executor class other than the ones in the code base? Since there’s only one sophisticated example (DaskExecutor), I am hoping to see other examples for design inspiration.
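For context on what such a class would implement, a minimal sketch modeled on the Prefect 1.x Executor interface (submit/wait); the SLURM-specific submission logic is only a placeholder, not a working executor:

```python
# Minimal sketch of a custom executor, modeled on the Prefect 1.x Executor
# interface (older releases expose the base class under prefect.engine.executors).
from prefect.executors.base import Executor


class SLURMExecutor(Executor):
    def submit(self, fn, *args, **kwargs):
        # Placeholder: run inline. A real executor would wrap fn in an
        # sbatch/srun submission and return a handle to the pending job.
        return fn(*args, **kwargs)

    def wait(self, futures):
        # Results are already computed in this sketch, so just return them.
        return futures
```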
j
@dh Are you already using Dask-Jobqueue & SLURMCluster for testing? https://jobqueue.dask.org/en/latest/examples.html
My main thought is that it'd be good to exhaust all possibilities of using existing Dask projects before heading down the path of writing a custom executor. (But you may have already exhausted those possibilities.)
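A sketch of that Dask-Jobqueue route, with Prefect’s DaskExecutor pointed at a SLURMCluster; the queue name, walltime, resource sizes, and GPU directive are hypothetical and cluster-specific:

```python
# Sketch of Prefect on Dask-Jobqueue: a SLURMCluster provisions the workers
# and Prefect's DaskExecutor runs the flow on them.
from dask_jobqueue import SLURMCluster
from prefect import Flow, task
from prefect.executors import DaskExecutor

cluster = SLURMCluster(
    queue="gpu",
    cores=8,
    memory="32GB",
    walltime="01:00:00",
    job_extra=["--gres=gpu:1"],  # pass the GPU request through to SLURM
)
cluster.scale(jobs=2)  # ask SLURM for two worker jobs


@task
def train():
    ...  # GPU work runs on the SLURM-provisioned Dask workers


with Flow("slurm-dask-flow") as flow:
    train()

flow.run(executor=DaskExecutor(address=cluster.scheduler_address))
```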
d
yeah, so we explored a bit with Dask but it seems it’s currently a long shot under Dask to do a “multi-node” CUDA cluster behind SLURM (we have a special ML system called NVIDIA DGX). We also had a chat with an NVIDIA team and it seems they could not get distributed execution over a multi-node GPU cluster working under Dask.
thank you so much for sharing the advice.
j
@dh Hmmm, something seems odd about that. NVIDIA has been investing hugely in Dask (via RAPIDS, etc.) including with GPUs. I don't mean to second guess what you heard from them, but I feel like (if you haven't already) it might be worth talking to the RAPIDS folks (Josh Patterson, Keith Kraus, etc.) at NVIDIA. Things like this seem to indicate Dask is supported on the DGX: https://medium.com/rapids-ai/10-minutes-to-rapids-cudf-and-dask-cudf-3d16fcb84139
d
@Joe Schmid yeah, it’s odd indeed; I read that article and another from Matt Rocklin himself. We shall try to dig deeper, but we have been unable to run PyTorch distributed code behind Dask in a multi-node setup. Very few people seem to have tried that, and we could not find a good solution yet.
j
@dh Gotcha. I guess I’m in the camp that getting Dask running in multi-node mode on the DGX (ideally in conjunction with NVIDIA) and then running Prefect on Dask is a better option than trying to write a Prefect custom executor. NVIDIA should have a lot of incentive to get that working, so I suspect if you reach the right folks (esp. the RAPIDSAI team) they’d be supportive. You’ve probably seen these: https://medium.com/rapids-ai/high-performance-python-communication-with-ucx-py-221ac9623a6a https://blog.dask.org/2019/06/09/ucx-dgx
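A sketch of that arrangement, assuming the Dask cluster (a dask-scheduler plus one dask-cuda-worker process per DGX node) is started outside Prefect and Prefect only connects to it; the scheduler address is hypothetical:

```python
# Sketch of pointing Prefect at an externally managed multi-node Dask cluster
# (e.g. dask-scheduler on one node, dask-cuda-worker on each DGX node).
from prefect import Flow, task
from prefect.executors import DaskExecutor


@task
def distributed_training_step():
    ...  # runs on whichever GPU worker Dask schedules it to


with Flow("multi-node-gpu-flow") as flow:
    distributed_training_step()

# Prefect only needs the scheduler address; the node/GPU topology (and UCX,
# if enabled) is managed by Dask rather than by Prefect itself.
flow.run(executor=DaskExecutor(address="tcp://dgx-scheduler:8786"))
```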
@dh You might try chatting with the Dask folks on their Gitter channels. They’re a friendly and helpful crew. They should be able to at least steer you in the right direction. https://gitter.im/dask/dask
d
@Joe Schmid thank you for the advice. I will try to reach out to the Dask folks. The blog post is now a year old, so there might have been more interesting updates since. As an aside, we already started using Prefect for our ETL jobs and are so impressed that we are investigating ways to use Prefect for ML workflow orchestration. There are very few options for doing this in a non-Kubernetes environment. If we can get a “distributed ML application pipeline running in a high-performance cluster” working using Prefect / Dask, that would be super lovely.
j
@dh The good news is that there are a lot of folks using Dask in HPC environments with SLURM, plus the articles at those links indicate that Dask was running on a DGX at least in some form, so it definitely feels like you should be close to having a viable option. (Maybe you just need to address the multi-node aspects, but that UCX work seemed promising.) Best of luck and keep us posted!
💯 1
d
thank you! yeah, if and when I get a major update, I will share!