Hey folks! I’m wondering what would be a good pattern to set up long-running Prefect agents in Prefect 2.0. It looks like the DockerContainer and KubernetesJob create new containers/jobs per flow run, but what I’d ideally like to do is run a fleet of long-running workers that process from the queue. What’s the best infrastructure to choose in Prefect 2.0 to achieve this?
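For reference, a minimal sketch of one way to get long-running workers in Prefect 2.x, assuming the Process infrastructure block and Deployment.build_from_flow are available in the installed version; the flow, deployment, and work queue names here are hypothetical. Process runs each flow run as a subprocess of the agent itself, so a fleet of long-running agents can drain a work queue without creating a container or Kubernetes job per run.

```python
from prefect import flow
from prefect.deployments import Deployment
from prefect.infrastructure import Process


@flow
def handle_message(payload: dict) -> None:
    # short-lived processing work (external API calls, etc.) would go here
    print(f"processed {payload}")


if __name__ == "__main__":
    deployment = Deployment.build_from_flow(
        flow=handle_message,
        name="long-running-worker",   # hypothetical deployment name
        work_queue_name="fast-jobs",  # hypothetical work queue
        infrastructure=Process(),     # run flow runs inside the agent's own environment
    )
    deployment.apply()
```

The workers themselves would then just be agents started as long-running processes, e.g. `prefect agent start -q fast-jobs`, one per worker host or container.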
Matthias
09/02/2022, 2:57 PM
I would say it depends… how often would you actually need to process? Is it every second, every minute, …? Another factor that influences the architecture is how long the actual processing job runs.
Krishnan Chandra
09/02/2022, 2:59 PM
Good questions! It’s typically a few dozen jobs per minute and the processing time is very short (1-2 seconds)
Krishnan Chandra
09/02/2022, 2:59 PM
So for that reason, I’d like to avoid creating a new Kubernetes job per flow run, as the overhead is quite large for a very short job
Matthias
09/02/2022, 3:50 PM
In that case, I think a streaming engine like Apache Beam, Spark Streaming, Flink, … is a better option
Matthias
09/02/2022, 3:51 PM
Or even writing it in pure Python
Krishnan Chandra
09/02/2022, 3:55 PM
The jobs involve multiple external API calls and somewhat complex internal state, which is why I wanted to use Prefect / a workflow manager. I don’t think a streaming engine is the right fit here, but pure Python + a queue is a decent idea. Maybe something like SQS could work
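For reference, a minimal sketch of that pure Python + SQS idea, assuming boto3 and a hypothetical queue URL and flow. In Prefect 2, calling a @flow-decorated function directly still records the run with the orchestration API, so a plain long-running consumer loop keeps observability without any per-run infrastructure.

```python
import json

import boto3
from prefect import flow

# hypothetical queue URL
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/fast-jobs"


@flow
def handle_message(payload: dict) -> None:
    # external API calls / stateful processing would go here
    print(f"processed {payload}")


def consume_forever() -> None:
    sqs = boto3.client("sqs")
    while True:
        # long-poll for up to 10 messages at a time
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )
        for message in response.get("Messages", []):
            handle_message(json.loads(message["Body"]))
            # delete only after successful processing
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=message["ReceiptHandle"],
            )


if __name__ == "__main__":
    consume_forever()
```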