Is there an architectural diagram anywhere that explains how all of the different Prefect components fit together (e.g. agents, executors, etc.)? I’m trying to wrap my head around some of the terminology and concepts, coming from a heavy Airflow background.
Something like this, just as a visual / bird’s-eye view
05/22/2022, 4:48 PM
There is this, but I don’t know if that’s immediately readable
Todd de Quincey
05/22/2022, 5:03 PM
So am I right in my understanding that we need the following key components and their purpose:
1. Prefect Cloud / Server - essentially the Airflow scheduler if I wanted to compare this to Airflow land. It monitors, schedules and executes our flows, but it does NOT manage or execute the underlying tasks.
2. Flows - essentially an Airflow DAG (although not actually a directed acyclic graph, just using the term to compare apples to apples)
3. Tasks - same as in Airflow. Units of work that get executed.
4. Agent(s) - a lightweight process which polls the Cloud/Server API for new flow runs which need to be executed. Agent will monitor and manage the underlying tasks in that flow (this is where I am fuzzy, so quite likely very wrong).
5. Executor(s) - underlying compute which runs individual tasks, managed by the agent.
Presuming I have the above correct(ish), I think my main confusion is around why we would push the agents to, say, Kubernetes or Fargate. Running our tasks on an elastic service like this makes sense, especially if the number of tasks is dynamic.
Is the use case for running agents on an elastic service like this for dynamic flow run generation (so each flow run runs on its own Fargate task)?
I think my head is too much in Airflow land, where, if we have, say, a fixed number of 200 DAGs (Flows), the elastic compute is applied to the underlying tasks in those DAGs, not the DagRuns themselves. Whereas I think Prefect is more flexible and we can also scale up the FlowRuns (DagRuns)?
By the way, presuming all of this is roughly the same / equivalent in 2.0 (looking at the 1.0 docs atm, as they explain concepts a lot more)
05/22/2022, 5:09 PM
Mostly right. The agent starts up the Flow with the specified configuration (Kubernetes, Docker, ECS). Each Flow gets a different container. The Flow is responsible for running tasks.
Using Kubernetes gives you containerized execution: it spins up the Flow and then removes the container when done. Dynamism tends to be handled by the Executor (Local/Dask). DaskExecutor can scale out the number of tasks being executed across a cluster.
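To make the executor idea a bit more concrete, here is a rough sketch in plain Python. It is not Prefect code: `double`, `run_flow_sequential` and `run_flow_parallel` are made-up names, and a standard-library thread pool stands in for a Dask cluster. The point is only that the same tasks in the same flow can run one at a time or fan out across workers, depending on which executor the flow is given.

```python
from concurrent.futures import ThreadPoolExecutor


def double(n):
    # Hypothetical task: a single unit of work inside a flow.
    return n * 2


def run_flow_sequential(inputs):
    # Stand-in for a local executor: tasks run one after another
    # inside the flow's own process.
    return [double(n) for n in inputs]


def run_flow_parallel(inputs, max_workers=4):
    # Stand-in for a Dask-style executor: the same tasks are fanned out
    # across a pool of workers. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(double, inputs))


if __name__ == "__main__":
    print(run_flow_sequential([1, 2, 3]))
    print(run_flow_parallel([1, 2, 3]))
```

Both functions return the same results; only where and how the tasks run changes, which is roughly the role the executor plays.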
The main difference is that Prefect doesn’t host compute. You provide your own via the agent. Prefect Cloud doesn’t run anything. So the agent is more of a way to fetch the work and run it on compute infrastructure that the user provides (a Kubernetes cluster, for example).
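A toy model of that split, assuming nothing about Prefect’s real API: `ToyApi`, `launch_on_user_infra` and `agent_poll_once` are all hypothetical names. The API side only stores scheduled runs and their states; the agent polls for work and launches each flow run on infrastructure the user owns (here simply an in-process call, where a real agent would start a container, pod or ECS task).

```python
from collections import deque


class ToyApi:
    """Stand-in for Cloud/Server: tracks flow runs but executes nothing."""

    def __init__(self):
        self._scheduled = deque()
        self.states = {}

    def schedule(self, run_id, flow_fn):
        self._scheduled.append((run_id, flow_fn))
        self.states[run_id] = "Scheduled"

    def fetch_scheduled(self):
        # Hand all pending runs to whichever agent asks, then clear the queue.
        runs = list(self._scheduled)
        self._scheduled.clear()
        return runs

    def set_state(self, run_id, state):
        self.states[run_id] = state


def launch_on_user_infra(api, run_id, flow_fn):
    # Assumption: a real agent would start a container per flow run here.
    # This sketch just calls the flow function and reports the state back.
    api.set_state(run_id, "Running")
    try:
        flow_fn()
        api.set_state(run_id, "Success")
    except Exception:
        api.set_state(run_id, "Failed")


def agent_poll_once(api):
    # One polling cycle: fetch scheduled runs and launch each of them.
    runs = api.fetch_scheduled()
    for run_id, flow_fn in runs:
        launch_on_user_infra(api, run_id, flow_fn)
    return len(runs)


if __name__ == "__main__":
    api = ToyApi()
    api.schedule("run-1", lambda: None)
    agent_poll_once(api)
    print(api.states)
```

In this picture the agent itself stays lightweight; the compute that scales is whatever `launch_on_user_infra` starts for each flow run.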
2.0 is slightly different, but it would probably be confusing to get into while you’re still understanding 1.0
Todd de Quincey
05/22/2022, 5:11 PM
Right, if they are quite different, I’ll jump to the 2.0 docs, as I’m looking to implement 2.0 in a greenfield project
I see the difference already. FlowRunners etc
05/22/2022, 6:15 PM
let us know if you have any specific questions, @Todd de Quincey
Todd de Quincey
05/22/2022, 7:03 PM
Thanks, Anna. I think the concepts in 2.0 (or at least the docs) make a bit more sense for me. So far so good :)
Is there a similar diagram for 2.0 by any chance?
Also, is there any documentation on how to self-deploy 2.0? We are initially looking to use Prefect Cloud, but are also interested to see what key components would be required to self-host.
Also, any idea what the pricing for 2.0 will look like? As I understand it, the number-of-tasks concept is being removed and replaced with a users / env concept (much better imo). But I’m curious as to what the pricing might look like.