Hello
@Anna Geller (old account) @Jimmy Le
I’ve been following your tutorials on running serverless flows on AWS Fargate (EKS and ECS) - great content, and thanks for sharing with us!
Anna:
https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30
Jimmy:
https://lejimmy.com/distributed-data-pipelines-with-aws-ecs-fargate-and-prefect-cloud/
I am just a hobbyist Prefect user, and have been using a USD 5/month Ubuntu 20.04 instance to run my flows - scraping data, reshaping it with Pandas, talking to databases - on a schedule, pretty standard stuff. The dependencies, config and my own code are installed on the instance as a package.
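For context, the flows themselves are nothing fancy - roughly this shape (the endpoint, table name, connection string and schedule below are made-up placeholders, and I'm on Prefect 1.x):

```python
import pandas as pd
import requests
from prefect import task, Flow
from prefect.schedules import CronSchedule
from sqlalchemy import create_engine


@task
def scrape():
    # Placeholder endpoint - the real scraping logic lives in my own package.
    resp = requests.get("https://example.com/api/data")
    resp.raise_for_status()
    return resp.json()


@task
def reshape(records):
    # Typical Pandas reshaping step.
    df = pd.DataFrame(records)
    return df.pivot_table(index="date", columns="category", values="value")


@task
def load(df):
    # Placeholder connection string - the real config comes from my package.
    engine = create_engine("postgresql://user:password@db-host/hobby")
    df.to_sql("scraped_data", engine, if_exists="replace")


# Runs daily at 06:00 - the schedule is just an example.
with Flow("daily-scrape", schedule=CronSchedule("0 6 * * *")) as flow:
    load(reshape(scrape()))
```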
Tried your tutorials out of curiosity and got my flows working, but I couldn’t wrap my head around two issues:
• Cost - even just keeping the Fargate agent running as a serverless EKS cluster, at USD 0.10/hour, works out to roughly USD 72/month. Is keeping the agent on all the time really the right approach?
• Image size and transfer between runs - I’m using a base image that contains some pip dependencies plus my own custom package. On flow registration, Prefect builds a new image on top of it to copy the flow code in, so if my base image layer is say 500 MB and my flow code layer is say 50 KB, each serverless flow run still has to pull the full ~500.05 MB, since the image isn’t cached between runs. Every additional flow registered costs the entire size of the base image plus a few KB of flow code, which seems a bit extravagant (rough registration setup sketched below).
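For reference, this is roughly how I register the flows - the heavy dependencies are baked into a prebuilt base image, and registration only adds the small flow-code layer on top (registry, image and project names are placeholders, assuming Prefect 1.x Docker storage):

```python
from prefect import Flow
from prefect.storage import Docker

with Flow("daily-scrape") as flow:
    ...  # tasks as in the sketch above

# Placeholder registry / image names - the real ones live in my config.
flow.storage = Docker(
    registry_url="123456789012.dkr.ecr.eu-west-1.amazonaws.com",
    image_name="daily-scrape",
    image_tag="latest",
    # Prebuilt base image with the pip deps and my package already installed,
    # so registration only builds and pushes the small flow-code layer.
    base_image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/flow-base:latest",
)
flow.register(project_name="hobby-flows")
```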
While I can appreciate the distributed, fault-tolerant, highly available, self-healing and auto-scaling properties of serverless flow runs, does anyone have insight into how to reduce or manage the cost of running them this way?