
Mokshith Voodarla

12/05/2022, 7:13 PM
Hi! I was thinking of using Prefect but I'm not sure if it works well for my task. I'm running large pools of GPU worker nodes which are orchestrated via a separate set of logic, and latency matters a lot. My thought was to set up Kafka as a messaging queue that all these workers listen to, since it also offers the ability to do kSQL joins, which are useful for, say, matching up frames and frame metadata that might've been processed by different models upstream. I was also concerned about maintaining a warm pool of workers so the models don't need to cold start every time. I wouldn't even consider a workflow orchestrator, except that I want the benefits of task observability, logging, etc., which it feels like I wouldn't get if I used Kafka by itself.

Peyton Runyan

12/06/2022, 11:25 AM
When you say latency matters a lot, do you mean milliseconds or seconds? What's your idea of acceptable latency?

> My thought was to set up Kafka

Unless you're already running Kafka and have a team that knows what they're doing with it, I would use almost anything else. The overhead of setting up Kafka and then keeping it playing nicely beyond toy POCs is pretty massive.

Mokshith Voodarla

12/06/2022, 4:03 PM
We would ideally not want to add seconds of overhead here and there, so I'll say millisecond latency. That doesn't mean jobs will finish in milliseconds, because the deep learning models take time to run.

Peyton Runyan

12/06/2022, 4:10 PM
I think this would be pretty tough for Prefect to do, given a few things:
• Prefect makes a number of network calls as part of its orchestration. Those will add a bit of time.
• Prefect agents poll every n seconds for work. I think the default is 5, so if you kick off a job it may take a couple of seconds to start.
• Infrastructure still needs to be spun up (in this case, a pod to run the job).
Do you have a bit more info on the specific use case? It might help me figure out a solution that makes sense.
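For reference, that polling interval is configurable. A minimal sketch, assuming Prefect 2.x, where the setting is named PREFECT_AGENT_QUERY_INTERVAL (verify against your installed version):

```python
# Inspect the agent polling interval (Prefect 2.x setting).
# Like other Prefect settings, it can be overridden with an
# environment variable of the same name, e.g.
#   PREFECT_AGENT_QUERY_INTERVAL=1 prefect agent start -q default
from prefect.settings import PREFECT_AGENT_QUERY_INTERVAL

print(PREFECT_AGENT_QUERY_INTERVAL.value())  # seconds between agent polls
```

Lowering the interval shaves pickup latency at the cost of more frequent API calls; it won't remove the infrastructure spin-up time mentioned above.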

Mokshith Voodarla

12/06/2022, 5:22 PM
Gotcha. We are building something that can do a ton of deep learning on video: defining models in workflows, parallelizing ops when needed, and exposing data to a user even while a job is processing (like a stream). Imagine running a model like OpenAI's Whisper, but parallelizing it with a separate op that first detects all the silences in the audio and then pipes each of the audio segments into Whisper.
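A rough sketch of that fan-out in Prefect 2, where `detect_silences` and `transcribe_segment` are hypothetical stand-ins for the real model code (not from this thread):

```python
from prefect import flow, task


@task
def detect_silences(audio_path: str) -> list[tuple[float, float]]:
    # Placeholder: real code would run silence detection on the audio
    # and return (start, end) offsets of the speech segments.
    return [(0.0, 30.0), (30.0, 55.0)]


@task
def transcribe_segment(audio_path: str, span: tuple[float, float]) -> str:
    # Placeholder: real code would run Whisper (or similar) on one segment.
    start, end = span
    return f"transcript of {audio_path} [{start}s-{end}s]"


@flow
def transcribe_video(audio_path: str) -> list[str]:
    spans = detect_silences(audio_path)
    # submit() schedules each segment concurrently on the flow's task runner,
    # so segments are transcribed in parallel rather than one after another.
    futures = [transcribe_segment.submit(audio_path, span) for span in spans]
    return [f.result() for f in futures]
```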

Peyton Runyan

12/06/2022, 5:43 PM
Nice! I think you could potentially make this work. You could have a parent deployment triggered immediately via an API call, and have that deployment trigger a series of processing jobs using `client.create_flow_run_from_deployment`. You're still definitely going to have some initial latency on the setup, though.
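A minimal sketch of that pattern with the Prefect 2 client, assuming a deployment named "transcribe-video/prod" already exists (the name and parameters are hypothetical):

```python
import asyncio

from prefect import get_client


async def trigger(segments: list[str]):
    async with get_client() as client:
        # Look up the deployment once, then kick off one flow run per
        # segment without waiting for any of them to complete.
        deployment = await client.read_deployment_by_name("transcribe-video/prod")
        return [
            await client.create_flow_run_from_deployment(
                deployment.id, parameters={"audio_path": seg}
            )
            for seg in segments
        ]


asyncio.run(trigger(["clip-001.wav", "clip-002.wav"]))
```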