
Andrew Maguire

09/13/2021, 2:33 PM
Hello - I'm the Analytics & ML lead at Netdata Cloud, and I'm looking at Prefect Cloud as a potential alternative to some flavor of Airflow (self-managed, Composer, or ideally Astronomer). I'm having a little trouble understanding what exactly agents are or what I would need - it kinda feels like agents are almost like Airflow workers, but not quite. If I wanted to use Prefect Cloud to manage the orchestration of BigQuery jobs and GCP Cloud Function based pipelines, it seems like maybe I would still need to bring my own agents with me - is that correct? If so, do you have any typical examples of how people might do this? From what I have read, it looks like maybe I'd ideally have my own Dask or Kubernetes cluster to have lots of agents available to Prefect Cloud? Just wondering how to think about scaling this - how would I know how many agents I would need? Or maybe I'm getting things wrong in my understanding? p.s. apologies if this is not a good place for this question. Oh, and hello again šŸ™‚
šŸ‘‹ 10

Kyle McChesney

09/13/2021, 2:57 PM
Hey Andrew, I don’t have a whole lot of Airflow experience, but my understanding is that the agent in Prefect is somewhat more closely related to the scheduler in Airflow. That said, the agent is a bit more of a hybrid/dual-purpose entity. It feels like in Prefect you have slightly less flexibility in terms of what your worker/operator/task compute environment looks like. In Airflow, for example, you can use KubernetesPodOperator and have a fully custom container instance per task. In Prefect, each flow execution is a single ā€˜instance’, with task scaling/parallel processing accomplished via Dask.

The upside to the decreased flexibility is a much more ergonomic approach to handling task input/output. Tasks can return stuff and other tasks can use the results, without a system like XCom, etc.

On the topic of BigQuery and Cloud Functions: it should be possible using a task per query/cloud function. I have been doing something similar with AWS Batch jobs. Let me look a bit more.
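Roughly something like this (an untested sketch against the Prefect 1.x API and the google-cloud-bigquery client; the project/dataset/table names are just placeholders):

```python
from prefect import task, Flow
from google.cloud import bigquery

@task
def run_query(sql: str):
    # One task per BigQuery query; the task simply returns its rows.
    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]

@task
def count_rows(rows):
    # Downstream tasks receive upstream return values as plain Python objects,
    # no XCom-style side channel needed.
    return len(rows)

with Flow("bigquery-example") as flow:
    rows = run_query("SELECT * FROM `my_project.my_dataset.my_table` LIMIT 100")
    n = count_rows(rows)
```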

Kevin Kho

09/13/2021, 2:58 PM
Hi @Andrew Maguire! Welcome to Prefect! Agents are long-running processes that you need in order to deploy your own flows. You would always need an agent running, but it’s a lightweight process, so it doesn’t need to be powerful compute. You can have the agent offload the work to a Dask cluster, or you can have an agent pod in the Kubernetes cluster. The agent just kicks off the job and then listens for more jobs, so you really only need 1 agent.
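For example, something along these lines (a rough sketch assuming Prefect 1.x; the image, Dask scheduler address, and project name are placeholders):

```python
from prefect import task, Flow
from prefect.run_configs import KubernetesRun
from prefect.executors import DaskExecutor

@task
def say_hello():
    print("hello from the flow-run job")

with Flow("agent-demo") as flow:
    say_hello()

# The agent only kicks off the flow-run job; any heavy lifting happens inside
# that job and, optionally, on an external Dask cluster via the executor.
flow.run_config = KubernetesRun(image="my-registry/my-flow:latest", labels=["k8s"])
flow.executor = DaskExecutor(address="tcp://dask-scheduler:8786")

# Register with Prefect Cloud; the agent itself is started separately, e.g.
#   prefect agent kubernetes start --label k8s
flow.register(project_name="my-project")
```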

Dylan

09/13/2021, 3:45 PM
Hi @Kyle McChesney! It’s possible to achieve the one-environment-per-task pattern using a flow-of-flows, also known as the orchestrator pattern. You can check out the documentation here: https://docs.prefect.io/core/idioms/flow-to-flow.html
The way you’d ā€œspellā€ the flow you outlined in Prefect would be to give each child flow its own KubernetesRun run config, and the parent becomes a very lightweight process that’s only responsible for scheduling the child flows to run according to the graph.
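Roughly (an untested sketch assuming Prefect 1.x; the child flow and project names are placeholders, and each child flow would be registered separately with its own KubernetesRun run config):

```python
from prefect import Flow
from prefect.tasks.prefect import StartFlowRun

# Each StartFlowRun task kicks off an already-registered child flow,
# so the parent is only responsible for ordering them.
extract = StartFlowRun(flow_name="extract-flow", project_name="demo", wait=True)
transform = StartFlowRun(flow_name="transform-flow", project_name="demo", wait=True)

with Flow("parent-orchestrator") as parent:
    transform(upstream_tasks=[extract()])
```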
Hope you get a chance to check out this pattern šŸ˜„
Welcome @Andrew Maguire! We usually use the #prefect-community channel for questions, but we’re happy to help you here šŸ˜„ I think Kevin has answered your question, but let us know if you have any follow ups šŸ‘

David Abraham

09/13/2021, 4:03 PM
Welcome to the Prefect Community @Andrew Maguire!

Andrew Maguire

09/14/2021, 7:12 AM
thanks all! will keep working through the tutorials