# prefect-community
k
Hi, I am looking into building pipelines that involve GPU work on such an irregular basis that autoscaling is a must. In my case, dask is not really an option: it interferes with the multiprocessing of PyTorch's DataLoader and was otherwise rather unstable for my GPU workload. To keep it simple, I would like to start a flow that I know needs a GPU on its own instance and do the processing there. The nodes would need to spin up or down depending on current demand. Any pointers on how best to achieve this? Currently I see a few possibilities, but I'm not sure which one is best:
• Spin up a new agent on a GPU node before the flow is scheduled, then route the flow to it using tags (sketched below)
• Somehow use KubernetesRun to request a GPU and let Kubernetes handle the up- and downscaling
• Only use prefect to trigger an ECS job
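(For reference, the first option above, a dedicated agent plus flow labels, which is what Prefect 1.x calls these tags, would look roughly like this; the "gpu" label and the agent command are only illustrative:)

from prefect import Flow
from prefect.run_configs import UniversalRun

with Flow("gpu-flow") as flow:
    ...  # GPU tasks go here

# Only an agent started with a matching label will pick this flow up,
# e.g. run "prefect agent local start --label gpu" on the GPU node.
flow.run_config = UniversalRun(labels=["gpu"])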
s
I've used dask with the DataLoader for multiprocessing - what trouble did you have there?
k
The issue was with daemonic processes not being able to spawn children.
s
Ah yeah, I know that issue - for me, setting this environment variable fixed it:
DASK_DISTRIBUTED__WORKER__DAEMON=False
❤️ 2
That was a gnarly problem, very frustrating!
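(If it's easier to set that from Python than in the deployment environment, the same switch can be flipped through dask's config API in the process that launches the workers; this is just the dask-config spelling of the same setting:)

import dask

# Same effect as DASK_DISTRIBUTED__WORKER__DAEMON=False, as long as it
# runs before the dask cluster / workers are created in this process.
dask.config.set({"distributed.worker.daemon": False})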
h
one other thing is that after you do that, you should also pass a multiprocessing context into the DataLoader (because the dask multiprocessing context is different from what PyTorch typically uses)
Something like this
import multiprocessing as mp
import torch

train_loader = torch.utils.data.DataLoader(
    whole_dataset, sampler=train_sampler, batch_size=batch_size, num_workers=num_workers, multiprocessing_context=mp.get_context('fork')
)
❤️ 1
k
Thanks! I will try this for sure! Any hints for scaling up without Dask? 🙂
s
Sorry, we're the dask people! 🙂 We also helped build a library that made parallel training with dask + pytorch a lot easier, if it's any help - dask-pytorch-ddp (it's on PyPI)
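(Very roughly, the idea is that you hand an ordinary PyTorch training function to every dask worker and the library handles the DDP coordination; the sketch below is from memory, so treat the import path, dispatch.run, and the backend choice as assumptions and check the project's README:)

from dask.distributed import Client
from dask_pytorch_ddp import dispatch  # assumed import path
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    # dask-pytorch-ddp coordinates the workers so each one can join the
    # same process group; the backend and model here are placeholders.
    dist.init_process_group(backend="nccl")
    model = DDP(torch.nn.Linear(10, 1).cuda())
    # ... usual training loop goes here ...

client = Client()                      # connect to / create a dask cluster
futures = dispatch.run(client, train)  # run train() on every worker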
h
we being @Stephanie Kirmer and me, but other folks here may have other thoughts
@Kilian are you talking about training or inference? Or something different altogether?
k
thanks for the help anyway, maybe going this route will also help achieve the end goal 🙂
h
if you only need a single GPU instance (i.e. you don't need multiple GPU instances that communicate with each other), then prefect + ECS or k8s will probably do just fine (without dask)
if you're trying to do parallel training with DDP, then your PyTorch machines would need some way to coordinate (you need to figure out which one is the master and pass that information around)
k
In the end both, but either one would already help. Yes, only a single GPU instance is necessary, no parallel training. So for that I would use KubernetesRun, specify the need for a GPU there, and let k8s handle the up- and downscaling? Thank you for your help!
upvote 1
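(For reference, a minimal sketch of what that KubernetesRun route could look like in Prefect 1.x; the trimmed-down job template below is an assumption, so compare it against the Kubernetes agent's default template. The nvidia.com/gpu limit keeps the pod Pending until the cluster autoscaler brings up a GPU node, and the autoscaler can remove the node again once it sits idle:)

from prefect import Flow
from prefect.run_configs import KubernetesRun

# Assumed, trimmed-down job template: the only addition over the default
# is the nvidia.com/gpu resource limit on the flow's container.
gpu_job_template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "flow",
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }
                ]
            }
        }
    },
}

with Flow("gpu-flow") as flow:
    ...  # GPU tasks go here

flow.run_config = KubernetesRun(
    job_template=gpu_job_template,
    labels=["gpu"],  # picked up only by an agent started with this label
)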