# ask-community
j
Hey everyone. I'm working on setting up a Docker agent on an on-prem server so that we can run some flows that require GPU access. Unfortunately, I'm having trouble working out how I can make the GPUs available to the image at startup. The Docker version we're using supports passing them through with a `--gpus` flag, but setting that flag doesn't seem to be supported by the agent. Has anyone run into this and implemented anything similar?
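For reference, the flag in question looks like this on a host where GPU passthrough is set up (the CUDA image tag is just an example; any CUDA-enabled image works, and this assumes Docker >= 19.03 with the NVIDIA container toolkit installed):

```shell
# Expose all host GPUs to the container and run nvidia-smi as a smoke test.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```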
k
Hey @Jeff Baatz, yeah, this is the second time I’ve seen this. We don’t immediately know how to do this, because it’s not just a matter of installing a CUDA image and having the GPUs detected. We’d love to learn more from the community on this one.
j
It would be simple enough to make it work if I could pass additional args to docker.
k
Will raise this to the team
j
At least for docker >= 19.03, this has been supported without using a special version of docker.
Thanks!
For what it's worth - you do need to use an nvidia toolkit for it to work smoothly. https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container The answer by Rohit goes into some detail with documentation. So it's not quite as simple as just using docker as is, but it's also not a complicated setup.
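For context, the host-side toolkit setup from that answer boils down to roughly the following (assuming an apt-based distro; package names come from the NVIDIA container toolkit docs and may differ between releases):

```shell
# Install the NVIDIA container toolkit, then restart Docker so the
# nvidia runtime is registered with the daemon.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```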
k
Got it! Will explore opening the flag and leaving the toolkit installation to the user
j
Great! I know it could be a challenge with how inflexible dockerpy can be sometimes.
I'll continue to look for alternatives and update if I run into anything that others could use!
k
Yeah, this flag doesn’t seem to be exposed by dockerpy, per this issue
j
I misinterpreted the nvidia runtime docs. It turns out that if you set the nvidia runtime as the default in `/etc/docker/daemon.json`, you don't need the extra flags. The docs for the nvidia runtime somewhat indicate that you still need to pass the arguments to `docker run` even if you do that, but it turns out not to be the case. You can simply set the env variable `NVIDIA_VISIBLE_DEVICES` in the environment that's running the Docker agent. It might not be an acceptable solution for everyone, since it modifies the default Docker behavior, but it works for our use case.
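A minimal sketch of the daemon.json change being described, per the NVIDIA container toolkit docs (the runtime path assumes a standard toolkit install):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

After editing, restart the Docker daemon for the change to take effect.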
k
Wow that sounds really easy! That’s good to know. Do you have reading material on that?
j
@Kevin Kho I do! https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#daemon-configuration-file This section specifically is what we ended up following. The user guide also includes installation info further up. The Docker agent launch then does the following in supervisord:
```ini
[program:prefect-agent-gpu2]
command=prefect agent docker start -l {OUR_LABELS} -f -n {OUR_NAME} -t ${PREFECT_TOKEN} -e CUDA_VISIBLE_DEVICES="0,1" -e NVIDIA_VISIBLE_DEVICES="6,7"
user=supervisor_user
environment=HOME="/home/supervisor_user",USER="supervisor_user",NVIDIA_VISIBLE_DEVICES="6,7"
```
For us, we're interested in "reserving" two GPUs on an on-prem server for flows. The tricky part here is that this one docker agent will keep spinning up multiple processes without really realizing that the resources aren't sharable. The plan for now is to just give the runner a distinct label that has a flow concurrency cap of 1.
That supervisor program comes from a template file that we use to keep our supervisor configuration version controlled. Nothing too weird. The main part is that NVIDIA_VISIBLE_DEVICES needs to be set in the Docker agent environment (I also set it in the program environment, but only out of paranoia). CUDA_VISIBLE_DEVICES will then start out indexed at 0. There are a few layers of indirection going on there that are a little confusing at first.
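The indirection can be illustrated with a toy model (pure illustration of the renumbering, not real driver behavior): `NVIDIA_VISIBLE_DEVICES` selects physical GPUs by host index, and the container sees those GPUs renumbered from zero, which is the index space `CUDA_VISIBLE_DEVICES` operates in.

```python
# Toy illustration of the two index spaces described above.
nvidia_visible = "6,7"  # physical GPU indices on the host

# Inside the container, the exposed GPUs are renumbered from zero:
exposed = nvidia_visible.split(",")
container_indices = list(range(len(exposed)))

# So CUDA_VISIBLE_DEVICES refers to the renumbered view:
cuda_visible = ",".join(str(i) for i in container_indices)
print(cuda_visible)  # prints "0,1"
```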
k
@Marvin archive “Using a GPU with DockerRun”