James Brady

08/14/2022, 1:40 PM
I'm using DaskTaskRunner, karpenter, and the dask_kubernetes.KubeCluster class, trying to get a flow to run on a GPU-enabled node. What's happening:
• The pod_template I'm using for the flow specifies nvidia.com/gpu: 1, per the docs
• karpenter starts a new node, which has a GPU (🙌)
• However, the new node can't accommodate the dask client because it doesn't have the right resource annotation ("0/3 nodes are available: … , 3 Insufficient nvidia.com/gpu")
I realise this might be a question better suited to the dask community, but I would appreciate any stories of people successfully running Prefect 2 workloads on GPU-accelerated nodes and/or help figuring out the specific issue above.
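For readers following along, a minimal sketch of what a pod template with that GPU limit might look like as a plain Python dict. The image name, label, and worker arguments are placeholders rather than the ones used in this thread, and handing the dict to KubeCluster as pod_template is an assumption about the setup described above.

```python
def gpu_pod_template(image: str = "ghcr.io/dask/dask:latest", gpus: int = 1) -> dict:
    """Build a minimal pod manifest whose container asks for `gpus` NVIDIA GPUs.

    The image, label, and args are illustrative placeholders (assumptions).
    """
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-worker"}},  # hypothetical label
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-worker",
                    "image": image,
                    "args": ["dask-worker", "--nthreads", "1"],
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


template = gpu_pod_template()
print(template["spec"]["containers"][0]["resources"])
```

A pod with this limit can only be scheduled onto a node that advertises nvidia.com/gpu in its allocatable resources, which is why the "Insufficient nvidia.com/gpu" message appears when no node does.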

Anna Geller

08/14/2022, 2:45 PM
I agree that posting this in the Dask Discourse will help you more. Prefect only submits tasks to a Dask cluster; the rest is left to Dask. To troubleshoot, perhaps you can start by running things on this Kubernetes cluster with a GPU node but without Dask. That way you can troubleshoot more incrementally and check, for example, whether the right CUDA drivers are installed.

James Brady

08/15/2022, 3:56 PM
Following up here: the problem was that I hadn't installed the NVIDIA device plugin daemonset in my cluster, so although the right nodes were being spun up, they didn't look to the provisioner like they had GPUs.
💯 1
:thank-you: 1
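One way to check for the symptom described in the follow-up, sketched under assumptions: the function below takes the JSON text produced by `kubectl get nodes -o json` and reports how many nvidia.com/gpu each node advertises as allocatable. The function name and the sample node names are hypothetical; a node whose device plugin is missing would be expected to show 0 here even if the hardware has a GPU.

```python
import json


def gpu_allocatable(nodes_json: str) -> dict:
    """Map node name -> allocatable nvidia.com/gpu count (0 if not advertised)."""
    nodes = json.loads(nodes_json)["items"]
    result = {}
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports quantities as strings, e.g. "1".
        result[node["metadata"]["name"]] = int(allocatable.get("nvidia.com/gpu", 0))
    return result


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node"},
         "status": {"allocatable": {"cpu": "4", "nvidia.com/gpu": "1"}}},
        {"metadata": {"name": "cpu-node"},
         "status": {"allocatable": {"cpu": "4"}}},
    ]
})
print(gpu_allocatable(sample))
```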