# ask-community
j
Hey everyone. I'm working on setting up a Docker agent on an on-prem server so that we can run some flows that require GPU access. Unfortunately, I'm having trouble working out how I can make the GPUs available to the image at startup. The Docker version we're using supports passing them through with a `--gpus` flag, but setting that flag doesn't seem to be supported by the agent. Has anyone run into this and implemented anything similar?
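For reference, the flag in question looks like this on a host where GPU passthrough is set up (the CUDA image tag is just an example; any CUDA-enabled image works, and this assumes Docker >= 19.03 with the NVIDIA container toolkit installed):

```shell
# Expose all host GPUs to the container and run nvidia-smi as a smoke test.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```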
k
Hey @Jeff Baatz, yeah, this is the second time I’ve seen this. We don’t immediately know how to do this, because it’s not just a matter of installing a CUDA image and having the GPUs detected. We’d love to learn more from the community on this one.
j
It would be simple enough to make it work if I could pass additional args to docker.
k
Will raise this to the team
j
At least for docker >= 19.03, this has been supported without using a special version of docker.
Thanks!
For what it's worth - you do need to use an nvidia toolkit for it to work smoothly. https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container The answer by Rohit goes into some detail with documentation. So it's not quite as simple as just using docker as is, but it's also not a complicated setup.
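For context, the host-side toolkit setup from that answer boils down to roughly the following (assuming an apt-based distro; package names come from the NVIDIA container toolkit docs and may differ between releases):

```shell
# Install the NVIDIA container toolkit, then restart Docker so the
# nvidia runtime is registered with the daemon.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```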
k
Got it! Will explore opening the flag and leaving the toolkit installation to the user
j
Great! I know it could be a challenge with how inflexible dockerpy can be sometimes.
I'll continue to look for alternatives and update if I run into anything that others could use!
k
Yeah, this flag doesn’t seem to be exposed by dockerpy, per this issue
j
I misinterpreted the nvidia runtime docs. It turns out that if you set the nvidia runtime as the default in `/etc/docker/daemon.json`, you don't need the extra flags. The docs for the nvidia runtime somewhat indicate that you still need to pass the arguments to `docker run` even if you do that, but it turns out not to be the case. You can simply set the env variable `NVIDIA_VISIBLE_DEVICES` in the environment that's running the Docker agent. It might not be an acceptable solution for everyone, since it modifies the default Docker behavior, but it works for our use case.
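A minimal sketch of the daemon.json change being described, per the NVIDIA container toolkit docs (the runtime path assumes a standard toolkit install):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

After editing, restart the Docker daemon for the change to take effect.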
k
Wow that sounds really easy! That’s good to know. Do you have reading material on that?
j
@Kevin Kho I do! https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#daemon-configuration-file This section specifically is what we ended up following. The user guide also includes installation info further up. The Docker agent launch then does the following in supervisord:
```ini
[program:prefect-agent-gpu2]
command=prefect agent docker start -l {OUR_LABELS} -f -n {OUR_NAME} -t ${PREFECT_TOKEN} -e CUDA_VISIBLE_DEVICES="0,1" -e NVIDIA_VISIBLE_DEVICES="6,7"
user=supervisor_user
environment=HOME="/home/supervisor_user",USER="supervisor_user",NVIDIA_VISIBLE_DEVICES="6,7"
```
For us, we're interested in "reserving" two GPUs on an on-prem server for flows. The tricky part here is that this one docker agent will keep spinning up multiple processes without really realizing that the resources aren't sharable. The plan for now is to just give the runner a distinct label that has a flow concurrency cap of 1.
That supervisor program comes from a template file that we use to keep our supervisor configuration version controlled. Nothing too weird. The main part is that NVIDIA_VISIBLE_DEVICES needs to be set in the Docker agent environment (I also set it in the program environment, but only out of paranoia). CUDA_VISIBLE_DEVICES will then start out indexed at 0. There are a few layers of indirection going on there that are a little confusing at first.
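The indirection can be illustrated with a toy model (pure illustration of the renumbering, not real driver behavior): `NVIDIA_VISIBLE_DEVICES` selects physical GPUs by host index, and the container sees those GPUs renumbered from zero, which is the index space `CUDA_VISIBLE_DEVICES` operates in.

```python
# Toy illustration of the two index spaces described above.
nvidia_visible = "6,7"  # physical GPU indices on the host

# Inside the container, the exposed GPUs are renumbered from zero:
exposed = nvidia_visible.split(",")
container_indices = list(range(len(exposed)))

# So CUDA_VISIBLE_DEVICES refers to the renumbered view:
cuda_visible = ",".join(str(i) for i in container_indices)
print(cuda_visible)  # prints "0,1"
```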
k
@Marvin archive “Using a GPU with DockerRun”