Hey folks! I’m wondering what would be a good pattern to set up long-running Prefect agents in Prefect 2.0. It looks like the DockerContainer and KubernetesJob create new containers/jobs per flow run, but what I’d ideally like to do is run a fleet of long-running workers that process from the queue. What’s the best infrastructure to choose in Prefect 2.0 to achieve this?
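For reference, a minimal sketch of one way to get long-running workers in Prefect 2.x, assuming the Process infrastructure block and Deployment.build_from_flow are available in the installed version; the flow, deployment, and work queue names here are hypothetical. Process runs each flow run as a subprocess of the agent itself, so a fleet of long-running agents can drain a work queue without creating a container or Kubernetes job per run.

```python
from prefect import flow
from prefect.deployments import Deployment
from prefect.infrastructure import Process


@flow
def handle_message(payload: dict) -> None:
    # short-lived processing work (external API calls, etc.) would go here
    print(f"processed {payload}")


if __name__ == "__main__":
    deployment = Deployment.build_from_flow(
        flow=handle_message,
        name="long-running-worker",   # hypothetical deployment name
        work_queue_name="fast-jobs",  # hypothetical work queue
        infrastructure=Process(),     # run flow runs inside the agent's own environment
    )
    deployment.apply()
```

The workers themselves would then just be agents started as long-running processes, e.g. `prefect agent start -q fast-jobs`, one per worker host or container.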
Matthias
09/02/2022, 2:57 PM
I would say it depends… how often would you actually need to process? Is it every second, every minute, …? Another factor that influences the architecture is how long the actual processing job runs.
Krishnan Chandra
09/02/2022, 2:59 PM
Good questions! It’s typically a few dozen jobs per minute and the processing time is very short (1-2 seconds)
Krishnan Chandra
09/02/2022, 2:59 PM
So for that reason, I’d like to avoid creating a new Kubernetes job per flow run, as the overhead is quite large for a very short job
Matthias
09/02/2022, 3:50 PM
In that case, I think a streaming engine like Apache Beam, Spark Streaming, Flink, … is a better option
Matthias
09/02/2022, 3:51 PM
Or even writing it in pure Python
Krishnan Chandra
09/02/2022, 3:55 PM
The jobs involve multiple external API calls and somewhat complex internal state, which is why I wanted to use Prefect / a workflow manager. I don’t think a streaming engine is the right fit here, but pure Python + a queue is a decent idea. Maybe something like SQS could work
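For reference, a minimal sketch of that pure Python + SQS idea, assuming boto3 and a hypothetical queue URL and flow. In Prefect 2, calling a @flow-decorated function directly still records the run with the orchestration API, so a plain long-running consumer loop keeps observability without any per-run infrastructure.

```python
import json

import boto3
from prefect import flow

# hypothetical queue URL
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/fast-jobs"


@flow
def handle_message(payload: dict) -> None:
    # external API calls / stateful processing would go here
    print(f"processed {payload}")


def consume_forever() -> None:
    sqs = boto3.client("sqs")
    while True:
        # long-poll for up to 10 messages at a time
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )
        for message in response.get("Messages", []):
            handle_message(json.loads(message["Body"]))
            # delete only after successful processing
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )


if __name__ == "__main__":
    consume_forever()
```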