Prefect Community

Hi, everybody!
I'm trying to find the right pipeline setup for my computer vision scenarios, which consist of several tasks (detection, segmentation, tracking, some_business_tasks, etc.).
Each task has a lot of parameters and its own environment (Docker image), and depends on the result of the previous task.
Task results are stored in a NoSQL DB.
I have 5 virtual machines: 2 have a GPU and 3 are CPU-only.
I want to run some tasks (detection, segmentation) only on the VMs with a GPU.
How can I organize a Prefect flow for this pipeline?

Hey <@U02B0Q08U7M>,

A couple of things. The first is how to assign Flows to the appropriate machines. <https://docs.prefect.io/orchestration/flow_config/run_configs.html#labels|Labels> are our mechanism for that. You would have a GPU label that gets assigned to the agents on the GPU VMs and to the Flows that need a GPU. This makes sure those Flows only get picked up by the GPU VMs.
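
To make the matching rule concrete, here is a small plain-Python sketch. It is not Prefect code, and the agent names and label values are hypothetical. It only illustrates the rule from the linked docs: an agent can receive a flow run when the agent's labels are a superset of the labels on the flow's run config.

```python
# Plain-Python illustration of label matching; this is not Prefect's API.
# Agent names and labels are hypothetical, chosen to mirror the 2 GPU + 3 CPU VMs.
AGENTS = {
    "gpu-vm-1": {"gpu"},
    "gpu-vm-2": {"gpu"},
    "cpu-vm-1": {"cpu"},
    "cpu-vm-2": {"cpu"},
    "cpu-vm-3": {"cpu"},
}


def eligible_agents(flow_labels, agents=AGENTS):
    """Return the agents whose labels are a superset of the flow's labels."""
    required = set(flow_labels)
    return sorted(name for name, labels in agents.items() if required <= labels)


# A detection flow labeled "gpu" can only land on the GPU VMs.
print(eligible_agents({"gpu"}))
# A tracking flow labeled "cpu" can only land on the CPU VMs.
print(eligible_agents({"cpu"}))
```

In an actual deployment the same labels would go on each VM's agent when it is started and on the run config of each Flow.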

Next, users with this setup often find that they need to manage the concurrency of Flows on the GPU machines, because it’s very common to run out of memory, so you only want to run one Flow at a time. Flow run concurrency limits are a Prefect Cloud feature on the standard <https://docs.prefect.io/orchestration/flow-runs/concurrency-limits.html|tier>.
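
As a rough illustration of what a per-label concurrency limit does, here is a plain-Python sketch. It is not how Prefect Cloud implements the feature; the `LabelLimiter` class and its methods are hypothetical names used only to show the behavior: a run is admitted only while the number of running flows for its label is below the limit.

```python
# Plain-Python sketch of a per-label concurrency limit; not Prefect's implementation.
class LabelLimiter:
    def __init__(self, limits):
        # limits maps a label to the maximum number of concurrent runs.
        self.limits = dict(limits)
        self.running = {label: 0 for label in self.limits}

    def try_start(self, label):
        """Admit a run if the label has no limit or is still below it."""
        limit = self.limits.get(label)
        if limit is None:
            return True  # no limit configured for this label
        if self.running[label] >= limit:
            return False  # the run has to wait
        self.running[label] += 1
        return True

    def finish(self, label):
        """Release a slot when a run with this label completes."""
        if self.running.get(label, 0) > 0:
            self.running[label] -= 1


limiter = LabelLimiter({"gpu": 1})
first = limiter.try_start("gpu")   # admitted: the GPU slot is free
second = limiter.try_start("gpu")  # held back: one GPU flow is already running
limiter.finish("gpu")              # the first flow completes
third = limiter.try_start("gpu")   # admitted again
print(first, second, third)
```

With two GPU VMs, one option might be a separate label per machine, so that each VM gets its own limit of one.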

Some users also break things up into sub-flows, where things like model training are done on the GPU, and the others are done on CPU.

We don’t support storing results in a NoSQL DB out of the box, but you should be able to do this manually by writing to and reading from the DB inside your tasks.
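
A minimal sketch of that manual pattern, in plain Python: each step writes its output to the store and passes only the document key downstream, and the next step reads its input back by key. `InMemoryStore` is a hypothetical stand-in for whatever NoSQL client you use, and `detect` / `track` are placeholder functions that would be tasks in a real Flow.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a NoSQL client (real code would call your DB)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        key = uuid.uuid4().hex
        self._docs[key] = dict(doc)
        return key

    def get(self, key):
        return dict(self._docs[key])


store = InMemoryStore()


def detect(image_path):
    # Placeholder detection output; a real task would run the model here.
    result = {"image": image_path, "boxes": [[10, 20, 50, 80]]}
    return store.put(result)  # pass only the key downstream


def track(detection_key):
    # Read the upstream result back from the store by key.
    detection = store.get(detection_key)
    result = {"source": detection_key, "tracks": len(detection["boxes"])}
    return store.put(result)


detection_key = detect("frame_0001.png")
tracking_key = track(detection_key)
print(store.get(tracking_key)["tracks"])
```

Passing keys instead of full results keeps the data exchanged between tasks small, which matters when tasks run in different Docker images or on different VMs. An in-memory dict only works inside one process, so across machines the store has to be the shared database.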

There is also some setup work involved in making the GPU visible to Prefect, but I think this is enough to get started with.