
Jins Kadwood

10/28/2020, 3:30 AM
Hi team! I'm exploring the use of Prefect too. I found the overall product proposition very good. However, I was confused by the setup and installation process for AWS ECS (Fargate) while using Prefect Cloud. As I understand it:
1. Set up a Prefect Cloud account.
2. The docs say to set up a Prefect Agent? I tried to follow the guide, but was confused about the setup: there is no container image provided for the agent (using the AWS Console). How should I go about setting the agent up on Fargate?
3. How many agents does Prefect require? 1 per job? Or is it more like GitLab/GitHub runners, where you can have a few shared/pooled agents to execute the jobs?
4. Any guidance on the compute size of the agents? I'll be looking to run frequent 10M record ETLs, so I wasn't sure what size the agents need to be to support something like that.
Apologies for the simple / dumb questions

Chris White

10/28/2020, 3:40 AM
Hi @Jins Kadwood! Luckily, agents are very simple processes:
2. Anywhere that has Prefect installed, has a Cloud auth token, and has permissions to create Fargate tasks can be used to run your agent. I believe most folks use a dedicated EC2 box for this.
3. You can run as few or as many agents as you like (as long as you have at least 1!). Agents represent different deployment environments for your flow runs, so in general running more than 1 is used for deploying flow runs to different environments, although there's nothing inherently wrong with horizontally scaling a single agent type. All agents do is submit flow runs for execution; the flow executor handles all individual task runs. A single agent can easily handle thousands of flow runs so long as your execution environment can handle that scale.
4. Same as above: as long as the agent can query for work and submit jobs, it doesn't need any additional resources beyond that. The jobs (i.e., Fargate Tasks) the agent creates are where the heavy lifting will happen, so you'll need to make sure they have the appropriate CPU / memory / etc. for the work your flows are doing. This is another place where multiple agents can come in handy (e.g., sending resource-intensive flows to agent A and less intensive flows to agent B).
🙌 1
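The agent behaviour described above (query for work, submit jobs, do nothing heavy itself) can be illustrated with a toy model. This is only a sketch and not Prefect's implementation: `FlowRun`, `ToyAgent`, `poll_once`, and the label-subset routing rule are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class FlowRun:
    """Hypothetical stand-in for a scheduled flow run."""
    flow_run_id: str
    labels: FrozenSet[str]


@dataclass
class ToyAgent:
    """Toy agent: it only routes work, it never executes flow code itself."""
    labels: FrozenSet[str]
    submit: Callable[[FlowRun], None]  # e.g. "create a Fargate task" in real life

    def poll_once(self, scheduled: List[FlowRun]) -> List[str]:
        """Submit every run whose labels this agent covers (assumed rule)."""
        submitted = []
        for run in scheduled:
            if run.labels <= self.labels:  # subset check
                self.submit(run)
                submitted.append(run.flow_run_id)
        return submitted


# Two agents, as in the "agent A / agent B" example.
launched = []
agent_a = ToyAgent(frozenset({"heavy"}), submit=lambda r: launched.append(("A", r.flow_run_id)))
agent_b = ToyAgent(frozenset({"light"}), submit=lambda r: launched.append(("B", r.flow_run_id)))

queue = [
    FlowRun("etl-10m-records", frozenset({"heavy"})),
    FlowRun("small-report", frozenset({"light"})),
]
picked_by_a = agent_a.poll_once(queue)
picked_by_b = agent_b.poll_once(queue)
```

The point of the sketch is that the agent's own cost is a loop and a submit call, which is why a small box is enough; the sizing question belongs to whatever `submit` launches.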

Jins Kadwood

10/28/2020, 3:43 AM
Ahh ok, that makes sense now. For whatever reason I was reading it as setting up the agent directly on ECS Fargate. Makes sense to just run a small EC2 box that manages all this for us. If I'm using Prefect Cloud, does the job execute on the cloud itself or on the Fargate task?
Thanks for the very quick reply! I appreciate it! Always keen to test and give open source kits a go
👍 1

Chris White

10/28/2020, 3:51 AM
Prefect Cloud was designed so that orchestration happens within Cloud, while execution of individual units of work occurs within your infrastructure. So using Fargate as an example, all of your code will run within the Fargate task that the agent creates, but all of the metadata about your run (task run states, logs, etc.) will be stored in Cloud. If you're interested, you can read more about this design choice here: https://medium.com/the-prefect-blog/the-prefect-hybrid-model-1b70c7fd296

Jins Kadwood

10/28/2020, 3:52 AM
Ah, that's nice! Ok, totally makes sense now. Thanks @Chris White for the simple explainer

Chris White

10/28/2020, 3:52 AM
Anytime!

Billy McMonagle

10/28/2020, 2:18 PM
This response was helpful for me to read as well, thank you @Chris White. If I may ask a follow-up: can you say more about spinning up Fargate flows with more or fewer resources, and how that might be configured? Say I have Flow A, requiring 1 vCPU with appropriate memory, and Flow B requiring 16 vCPUs. Is running two agents with different resource values the only/best way to go, or is there a simple way to override these parameters at the flow level? To be more specific, is the FargateTaskEnvironment the way to do this? I have not experimented much with it, but I find myself wanting the ability to "override" a small number of configuration attributes (e.g., `cpu` and `memory`) while leaving others intact (e.g., `cluster`, `networkConfiguration`, etc.).
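The "override a few attributes, keep the rest" pattern asked about here can be sketched in plain Python, independent of Prefect. This is not an answer from the thread and not a confirmed Prefect API: `BASE_FARGATE_KWARGS`, `fargate_kwargs`, and every value below are hypothetical, and whether the resulting dictionary can be passed to `FargateTaskEnvironment` (and which `cpu` / `memory` combinations Fargate accepts) should be checked against the Prefect and AWS documentation.

```python
from copy import deepcopy

# Hypothetical shared settings that every flow should keep as-is.
BASE_FARGATE_KWARGS = {
    "cluster": "etl-cluster",  # placeholder name
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-placeholder"],  # placeholder ID
            "assignPublicIp": "ENABLED",
        }
    },
    "cpu": "256",     # illustrative default
    "memory": "512",  # illustrative default
}


def fargate_kwargs(**overrides):
    """Return a fresh copy of the base settings with top-level keys overridden."""
    merged = deepcopy(BASE_FARGATE_KWARGS)  # copy so the shared base is never mutated
    merged.update(overrides)
    return merged


# Flow A: 1 vCPU; Flow B: 16 vCPUs (values are illustrative only).
flow_a_kwargs = fargate_kwargs(cpu="1024", memory="2048")
flow_b_kwargs = fargate_kwargs(cpu="16384", memory="32768")
```

Both results share the same `cluster` and `networkConfiguration` and differ only in the two overridden keys, so the shared settings live in one place.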