    CA Lee

    1 year ago
    Hello @Anna Geller (old account) @Jimmy Le, I’ve been following your tutorials on running serverless flows using AWS EKS (Fargate). Great content, and thanks for sharing it with us!
    Anna: https://towardsdatascience.com/distributed-data-pipelines-made-easy-with-aws-eks-and-prefect-106984923b30
    Jimmy: https://lejimmy.com/distributed-data-pipelines-with-aws-ecs-fargate-and-prefect-cloud/
    I am just a hobbyist Prefect user, and have been using a USD 5 / month Ubuntu 20.04 instance to run my flows on a schedule: scraping data, reshaping it with Pandas, communicating with databases. Pretty standard stuff. Deps, config and my own code are installed on the instance as a package.
    I tried out your tutorials out of curiosity and got my flows to work, but couldn’t wrap my head around 2 issues:
    • Cost - even just keeping the agent running on a serverless (Fargate) EKS cluster, at USD 0.10 / hour, would cost about USD 72 / month. Is keeping the agent on all the time the right way to do it?
    • Image size and transfer between runs - I’m using a base image which consists of some pip deps and my own custom package. On flow registration, Prefect builds a new image with the flow code copied in, so if my base image layer is, say, 500 MB and my flow code layer is, say, 50 KB, each serverless run of a flow would still have to pull 500.05 MB, as the image is not persistent between flow runs. Each additional flow registered costs the entire size of the base image plus a few extra KB of flow code, which seems a bit extravagant.
    While I can appreciate the distributed, fault-tolerant, highly available, self-healing & automatically scalable properties of serverless flow runs, does anyone have any insight into how we can reduce / manage the costs of doing so?
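The arithmetic behind the two concerns in this message can be checked with a quick sketch. The USD 0.10 / hour rate and the layer sizes come from the message itself; the 30-day month, the decimal units (1 MB = 1000 KB) and the function names are assumptions made for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost(hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Cost of something billed per hour and left running all month."""
    return round(hourly_rate_usd * hours, 2)


def image_size_mb(base_layer_mb, flow_layer_kb):
    """Total image size when a small flow layer sits on top of a base layer."""
    return round(base_layer_mb + flow_layer_kb / 1000, 3)


print(monthly_cost(0.10))      # 72.0
print(image_size_mb(500, 50))  # 500.05
```

The second figure shows why the flow code layer is negligible next to the base image: almost all of the pulled size is the shared base layer.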
    Jimmy Le

    1 year ago
    I’ll have to report back. I believe ECR pulls within the same region don't count towards data transfer charges. I’m curious whether we can run flows on Fargate Spot to cut compute costs by 60-70%, and what would happen when AWS demands the compute back.
    Fine print: “Data transfer “in” and “out” refers to transfer into and out of Amazon Elastic Container Registry. Data transferred between Amazon Elastic Container Registry and Amazon EC2 within a single region is free of charge (i.e., $0.00 per GB). Data transferred between Amazon Elastic Container Registry and Amazon EC2 in different regions will be charged at Internet Data Transfer rates on both sides of the transfer.”
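On the Fargate Spot question, one possible approach is to pass a capacity provider strategy to the ECS `run_task` call. This is a minimal sketch, not something tested in this thread: the helper name and the cluster name are hypothetical, and it assumes the cluster already has the `FARGATE_SPOT` capacity provider associated with it. The resulting dict could be unpacked into boto3's `ecs.run_task(...)`; passing it through Prefect's `ECSRun(run_task_kwargs=...)` is an assumption to verify against your Prefect version.

```python
def spot_run_task_kwargs(cluster, spot_weight=1):
    """Build run_task keyword arguments that place the task on Fargate Spot.

    Note: ECS rejects a request that sets both launchType and
    capacityProviderStrategy, so launchType is deliberately left out.
    """
    return {
        "cluster": cluster,
        "capacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
        ],
    }


# Hypothetical cluster name, for illustration only.
kwargs = spot_run_task_kwargs("prefect-cluster")
print(kwargs["capacityProviderStrategy"][0]["capacityProvider"])  # FARGATE_SPOT
```

When AWS reclaims Spot capacity, the task gets a two-minute warning before it is stopped, so a flow run on Spot should be safe to retry from the start.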
    CA Lee

    1 year ago
    There would still be charges for storing flows in a private ECR repo, yes?
    Jimmy Le

    1 year ago
    Yup, $0.10/GB after the first 50GB. I have a lifecycle policy in place to auto delete every untagged image after 5 days. I haven’t quite figured out how to auto set this policy yet when a repo is first created.
    If every flow is around 500 MB, you can store ~100 flows for free every month. More if your flows are compatible with the slim or alpine Python tags.
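A lifecycle policy like the one described here, and one way to apply it at the moment a repo is created, could look roughly like the sketch below. The function names and the repository name are hypothetical; `create_repository` and `put_lifecycle_policy` are boto3 ECR client calls, and the second function needs boto3 installed plus AWS credentials with ECR permissions, so it is defined but not called here.

```python
import json


def untagged_expiry_policy(days=5):
    """ECR lifecycle policy that expires untagged images `days` after push."""
    return {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images after {days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }


def create_repo_with_policy(repo_name, days=5):
    """Create the repo and attach the policy in one step (requires AWS access)."""
    import boto3  # imported lazily so the policy helper works without boto3

    ecr = boto3.client("ecr")
    ecr.create_repository(repositoryName=repo_name)
    ecr.put_lifecycle_policy(
        repositoryName=repo_name,
        lifecyclePolicyText=json.dumps(untagged_expiry_policy(days)),
    )


# Inspect the policy document without touching AWS.
print(json.dumps(untagged_expiry_policy(), indent=2))
```

Calling `create_repo_with_policy("my-flow-repo")` (a made-up name) before the first flow registration would mean every new repo starts with the cleanup rule in place.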
    CA Lee

    1 year ago
    That's great to know 👍🏼 I’ve been having trouble with alpine, but slim works for me. Now if only the EKS agent could be free ... Last time I tried Anna’s guide, I was charged $0.10 each hour just to keep the agent on. AFAIK the agent has to be running all the time, otherwise there is nothing to pick up the flows.
    This was the command I used to start the agent and keep it running:
    prefect agent kubernetes install -t {token} --rbac | kubectl apply -f -
    Billy McMonagle

    1 year ago
    @CA Lee Are you using Fargate (ECS) or EKS? Running an EKS cluster is going to cost you about $72/month ($0.10/hour) just for the EKS control plane. Any instances or Fargate workloads are going to be charged on top of that.
    Also - I'd recommend this article for a quick explanation of why you should NOT ever run Python on an Alpine-based image: https://pythonspeed.com/articles/base-image-python-docker-images/
    CA Lee

    1 year ago
    @Billy McMonagle thanks for the article, very helpful. I am completely new to Kubernetes, so I was trying both of the tutorials I listed. It seems I was being charged for the EKS control plane.
    In my case, where my compute needs are fairly low, I could just get by without EKS, yes? Just use ECS Fargate + the ECS agent?
    Billy McMonagle

    1 year ago
    IMO you should avoid Kubernetes unless you know you need it. ECS has its limits but is a good service and will probably save you some cost.
    CA Lee

    1 year ago
    Ahh, it was fairly confusing before, I thought EKS == ECS. I know better now, thanks for clearing it up.
    Billy McMonagle

    1 year ago
    It’s very confusing at first! Good luck!
    Anna Geller (old account)

    1 year ago
    @CA Lee if you create a new AWS account, you can use the free tier for your first 12 months and by following this tutorial, you shouldn't be charged anything really, because S3, EC2 t2.micro and ECS Fargate are free-tier-eligible: https://towardsdatascience.com/serverless-data-pipelines-made-easy-with-prefect-and-aws-ecs-fargate-7e25bacb450c You can additionally set up a billing alarm. Note that as a hobbyist, you can also register your laptop as a local agent, and you can start experimenting with Prefect this way.
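The billing alarm mentioned here can also be created from code. This is a hedged sketch using boto3's CloudWatch `put_metric_alarm`: the helper names, the USD 5 threshold and the SNS topic ARN are made up for illustration, and it assumes billing alerts are enabled in the account's billing preferences and that the SNS topic already exists. Billing metrics are published in `us-east-1`, which is why that region is hard-coded.

```python
def billing_alarm_kwargs(threshold_usd, sns_topic_arn):
    """Keyword arguments for a CloudWatch alarm on estimated monthly charges."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing metrics only update a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Create the alarm (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_kwargs(threshold_usd, sns_topic_arn))


# Hypothetical topic ARN, used only to show the request shape.
example = billing_alarm_kwargs(5, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(example["AlarmName"])  # estimated-charges-over-5-usd
```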
    Jimmy Le

    1 year ago
    If you sign up for YC's free Startup School and provide 4 weekly updates, they'll give you access to a ton of deals from YC alumni. AWS provides $5,000 worth of credits. It took about another week for the credits to show up in my account. https://www.startupschool.org
    I was also able to deploy my ECSAgent on the repl.it platform. It is super easy to use! File system, IDE, database, and terminal all in your browser. https://repl.it/talk/share/Prefect-ECSAgent-to-deploy-AWS-ECS-Fargate-tasks/120958
    CA Lee

    1 year ago
    Thanks @Jimmy Le. So I need to complete at least 4 weeks of videos on Startup School to claim the USD 5k AWS credits?
    Jimmy Le

    1 year ago
    It’s not the videos, it’s just 4 questions you answer every week: What's your MRR? How many customers did you talk to this week? What did you learn? What are your top 3 goals for next week?