Krishnan Chandra

09/01/2022, 6:34 PM
Hey folks! I’m wondering what would be a good pattern to set up long-running Prefect agents in Prefect 2.0. It looks like DockerContainer and KubernetesJob create new containers/jobs per flow run, but what I’d ideally like to do is run a fleet of long-running workers that process from the queue. What’s the best infrastructure to choose in Prefect 2.0 to achieve this?

Matthias

09/02/2022, 2:57 PM
I would say it depends… how often would you actually need to process? Is it every second, every minute, …? Another factor that influences the architecture is how long the actual processing job runs.

Krishnan Chandra

09/02/2022, 2:59 PM
Good questions! It’s typically a few dozen jobs per minute, and the processing time is very short (1-2 seconds).
So for that reason, I’d like to avoid creating a new Kubernetes job per flow run, as the overhead is quite large for a very short job

Matthias

09/02/2022, 3:50 PM
In that case, I think a streaming engine like Apache Beam, Spark Streaming, Flink, … is a better option.
Or even writing it in pure Python.

Krishnan Chandra

09/02/2022, 3:55 PM
The jobs have multiple external API calls and somewhat complex internal state, which is why I wanted to use Prefect or another workflow manager. I think a streaming engine is definitely not the right fit here, but pure Python + a queue is a decent idea. Maybe something like SQS could work.
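
For illustration, the "pure Python + queue" idea could look roughly like the sketch below: a small fleet of long-running workers that pull jobs from a shared queue, so no container or Kubernetes job is created per job. This is only a sketch under assumptions, not something from the thread: the standard-library `queue.Queue` stands in for an external queue such as SQS, and `process_job`, `worker`, and `run_fleet` are hypothetical names invented for the example.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that tells a worker to shut down


def process_job(job):
    """Hypothetical handler; real code would make the external API calls here."""
    time.sleep(0.01)  # stand-in for the short (1-2 second) processing step
    return {"id": job["id"], "status": "done"}


def worker(jobs, results):
    """Long-running loop: block on the queue and handle one job at a time."""
    while True:
        job = jobs.get()
        if job is _STOP:
            jobs.task_done()
            break
        try:
            results.append(process_job(job))
        except Exception as exc:
            # Record the failure and keep the worker alive for the next job.
            results.append({"id": job["id"], "status": "failed", "error": str(exc)})
        finally:
            jobs.task_done()


def run_fleet(job_list, num_workers=4):
    """Start the workers, feed them jobs, then stop them cleanly."""
    jobs = queue.Queue()  # in-process stand-in for SQS or a similar queue
    results = []
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    finished = run_fleet([{"id": i} for i in range(10)], num_workers=3)
    print(len(finished), "jobs processed")
```

With a real external queue, the `jobs.get()` call would presumably be replaced by a polling receive call, and each message would be deleted only after `process_job` succeeds, so that failed jobs are retried.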