Jakub Hettler

11/04/2020, 10:52 PM
Hi everyone, first, thanks for all the work on Prefect, good job! 👏 We are testing Prefect as a potential replacement for Airflow, but I am not sure I understand the Agent concept correctly. In Airflow we have workers, and there are many of them across our infrastructure (servers); we can scale just by adding workers to run more and more tasks. Each worker asks whether there are tasks it can pick up. As I understand it, workers are called Agents in Prefect and have a similar function, am I right? If so, how can I scale the Agents (distribute them across servers), or how is scaling done, and is it possible to scale the community version? Thanks for any explanation! cc @Radek Tomsej

Chris White

11/04/2020, 11:05 PM
Hi @Jakub Hettler, and welcome! While there are some similarities between Prefect and Airflow, the execution architecture and scaling properties are drastically different (in Prefect, we make a very stark distinction between “orchestration” and “execution”). Let me try to give you a quick overview of the different execution concepts in Prefect:
- Runs / Jobs: generally in Prefect, we talk about work at the level of the Flow; this is an important distinction to keep in mind for the next few concepts. Flow Runs can be created on a schedule and / or triggered through the API on demand.
- Executors: Executors can be attached to individual Flows and are responsible for distributing / executing the tasks within a single Flow. So for example, you can use the default executor, which will run through each task in your Flow in sequence. Note that the Prefect scheduler is not involved at this level, meaning your tasks will execute as fast as your local process allows. You can gain parallelism (within a single Flow) by switching to one of Prefect’s Dask executors. These can be used to distribute work across multiple machines or multiple threads / processes. Note that each Flow can have an entirely different executor, which means Flows can spin up their own Dask clusters, share a cluster, or do whatever makes sense for your work.
- Agents: agents are responsible for submitting Flow runs within your infrastructure. Executors then handle the job of farming out the individual tasks of a given flow, and each flow may have a different executor type. You can alter the deployment platform of your flows by changing which agent types submit them (e.g., k8s agents submit work on k8s, local agents submit work in subprocesses, etc.).
Combining all of this, you can easily scale out to hundreds of thousands of tasks with only a single agent and an appropriately configured executor on your flows. Multiple agents are usually desirable when you want to execute flows on different platforms, but they are not necessary to handle scale. I hope that helps!
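To make the agent / executor split above a bit more concrete, here is a rough toy model. It uses only the Python standard library and is not Prefect's API: `ToyAgent`, `ToyFlow`, `SequentialExecutor`, `ParallelExecutor`, and `make_task` are all hypothetical names made up for illustration, and the thread pool merely stands in for a Dask executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-ins for illustration only; none of these are Prefect classes.
Task = Callable[[], int]


def make_task(n: int) -> Task:
    """Build a trivial task that squares its input."""
    return lambda: n * n


class SequentialExecutor:
    """Runs a flow's tasks one after another in the local process."""

    def run(self, tasks: List[Task]) -> List[int]:
        return [task() for task in tasks]


class ParallelExecutor:
    """Runs a flow's tasks concurrently (thread pool as a stand-in for Dask)."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run(self, tasks: List[Task]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            # Collect results in submission order.
            return [future.result() for future in futures]


class ToyFlow:
    """A named group of tasks with its own executor attached."""

    def __init__(self, name: str, tasks: List[Task], executor) -> None:
        self.name = name
        self.tasks = tasks
        self.executor = executor


class ToyAgent:
    """Submits whole flow runs; it never schedules individual tasks."""

    def submit(self, flow: ToyFlow) -> List[int]:
        # Task-level distribution is delegated to the flow's executor.
        return flow.executor.run(flow.tasks)


# One agent, two flows, each flow with a different executor.
agent = ToyAgent()
small_flow = ToyFlow("small", [make_task(n) for n in range(5)], SequentialExecutor())
large_flow = ToyFlow("large", [make_task(n) for n in range(100)], ParallelExecutor(8))

small_results = agent.submit(small_flow)
large_results = agent.submit(large_flow)
```

In this sketch the single agent is never the bottleneck for task throughput: scaling the number of tasks is a matter of which executor each flow carries, which mirrors the point that one agent plus a suitably configured executor is enough to handle scale.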