Hello - I'm the Analytics & ML lead at Netdata Cloud and I'm looking at Prefect Cloud as a potential alternative to some flavor of Airflow (self-managed, Composer, or ideally Astronomer).
I'm having a little trouble understanding what exactly agents are and what I would need. It kind of feels like agents are almost like Airflow workers, but not quite. Say I wanted to use Prefect Cloud to orchestrate BigQuery jobs and pipelines based on GCP Cloud Functions: it seems like I would still need to bring my own agents. Is that correct? If so, do you have any typical examples of how people do this?
From what I have read, it looks like I'd ideally have my own Dask or Kubernetes cluster so that lots of agents are available to Prefect Cloud? I'm just wondering how to think about scaling this: how would I know how many agents I need?
Or maybe I'm misunderstanding something?
P.S. Apologies if this is not a good place for this question. Oh, and hello again 🙂
09/13/2021, 2:57 PM
I don’t have a whole lot of Airflow experience, but my understanding is that the agent in Prefect is somewhat more closely related to the scheduler in Airflow. That said, the agent is more of a hybrid, dual-purpose entity. It feels like Prefect gives you slightly less flexibility in what your worker/operator/task compute environment looks like. In Airflow, for example, you can use KubernetesPodOperator and have a fully custom container instance per task. In Prefect, each flow execution is a single ‘instance’, with task scaling and parallel processing accomplished via Dask. The upside of the decreased flexibility is a much more ergonomic approach to handling task input/output: tasks can return values and other tasks can use the results, without a system like XCom.
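To make the data-passing point a bit more concrete, here is a rough stdlib-only sketch. It is not Prefect's API: one process stands in for a single flow run, a thread pool stands in for Dask, and `extract`, `transform`, and `load` are hypothetical task names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def extract():
    # Hypothetical upstream task: returns its data directly,
    # with no XCom-style side channel.
    return [1, 2, 3]


def transform(x):
    # Hypothetical per-item task, run in parallel below.
    return x * 10


def load(rows):
    # Hypothetical downstream task: consumes the upstream results.
    return sum(rows)


def run_flow():
    # One "flow run" is one process; parallelism comes from the pool,
    # which is only a stand-in for a Dask executor (an assumption).
    rows = extract()
    with ThreadPoolExecutor(max_workers=3) as pool:
        transformed = list(pool.map(transform, rows))
    return load(transformed)


total = run_flow()
```

The return value of each function is simply handed to the next one, which is the ergonomic part; in a real orchestrator the same shape would be expressed with its own task and flow constructs.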
On the topic of BigQuery and Cloud Functions: it should be possible using one task per query or cloud function. I have been doing something similar with AWS Batch jobs. Let me look a bit more.
09/13/2021, 2:58 PM
Hi @Andrew Maguire! Welcome to Prefect! Agents are long-running processes that you need in order to deploy your own flows. You always need an agent running, but it’s a lightweight process, so it doesn’t need powerful compute.
You can have an agent offload the work to a Dask cluster, or you can run an agent pod in the Kubernetes cluster. The agent just kicks off the job and then listens for more jobs, so you really only need one agent.
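A toy model of why one agent is usually enough, using only the standard library. This is not how Prefect implements agents; the queue stands in for scheduled flow runs, the thread pool stands in for the Dask or Kubernetes infrastructure, and the flow names are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_flow(name):
    # Stand-in for the heavy work that would happen on the cluster.
    return f"{name}: done"


def agent(scheduled_runs, infrastructure):
    # The agent only picks up scheduled runs and submits them elsewhere.
    # It never executes a flow itself, which is why it can stay small.
    submitted = []
    while True:
        try:
            name = scheduled_runs.get_nowait()
        except queue.Empty:
            break  # a real agent would keep polling instead of exiting
        submitted.append(infrastructure.submit(run_flow, name))
    return submitted


scheduled = queue.Queue()
for flow_name in ["bigquery-load", "cloud-function-trigger", "report"]:
    scheduled.put(flow_name)

with ThreadPoolExecutor(max_workers=4) as infra:
    futures = agent(scheduled, infra)
    results = [f.result() for f in futures]
```

In this picture, scaling means giving the infrastructure more capacity (more pool workers here, more cluster resources in practice), not adding more agents.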
The way you’d “spell” the flow you outlined in Prefect would be to give each child flow its own KubernetesRunConfig, and the parent becomes a very lightweight process that’s only responsible for scheduling the child flows to run according to the graph.
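A minimal sketch of the parent/child idea, assuming Python 3.9+ for `graphlib`. It does not use Prefect's API: the child flow names and `trigger_child_flow` are hypothetical, and the function only records a request where a real parent would ask the orchestrator to start a child run on its own infrastructure.

```python
from graphlib import TopologicalSorter

# Hypothetical child flows mapped to the child flows they depend on.
dependencies = {
    "load_bigquery": set(),
    "call_cloud_function": {"load_bigquery"},
    "build_report": {"load_bigquery", "call_cloud_function"},
}


def trigger_child_flow(name):
    # Stand-in for starting a child flow run with its own run config;
    # here it just records that the request was made.
    return f"triggered {name}"


def run_parent(graph):
    # The parent does no heavy work: it walks the graph in dependency
    # order and triggers each child in turn.
    order = TopologicalSorter(graph).static_order()
    return [trigger_child_flow(name) for name in order]


log = run_parent(dependencies)
```

Because the parent only decides what runs next, it needs very little compute; the resource requirements live with each child.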
Hope you get a chance to check out this pattern 😄
Welcome @Andrew Maguire! We usually use the #prefect-community channel for questions, but we’re happy to help you here 😄 I think Kevin has answered your question, but let us know if you have any follow-ups 👍
09/13/2021, 4:03 PM
Welcome to the Prefect Community @Andrew Maguire!
09/14/2021, 7:12 AM
thanks all! will keep working through the tutorials