Henrietta Salonen

01/11/2022, 2:18 PM

Hi, I’m trying to estimate the cost of running cloud resources for our ETL pipelines in Prefect Cloud. To benchmark my own estimates, I’d be curious to hear (just at a high level) what type of workloads your organization handles and what your monthly infrastructure costs are. If you are using AWS EC2 instances (my plan is to use these for a Docker Agent + DockerRun), I’d particularly love to hear about the costs in this area relative to your task workloads. I’m looking for a cost-efficient solution but want to start simple, although in the future we may move toward a setup like the one @Anna Geller describes in this nice article: https://towardsdatascience.com/how-to-cut-your-aws-ecs-costs-with-fargate-spot-and-prefect-1a1ba5d2e2df, especially if scaling and maintaining the on-demand EC2 setup creates too much overhead.

Anna Geller

01/11/2022, 2:39 PM

Great question! Generally speaking, I think the biggest challenge is estimating the required capacity and choosing the right solution for your workload. Specifically, if you have many jobs that can run at any time, e.g. hourly batch jobs, then using ECS Fargate in a serverless fashion is a great choice because you avoid idle resources and pay only for what you use. But if you have workflows running pretty much all the time (e.g. every 5 minutes or even every minute), it can make sense to keep resources running 24/7, for example on AWS EKS with autoscaling.
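The trade-off above (pay per use vs. always-on) can be turned into a rough back-of-the-envelope model. This is a minimal sketch, not an official AWS or Prefect calculator: the function and class names are hypothetical, and the prices in the usage section are placeholders rather than real AWS rates. It also ignores minimum billing increments, storage, data transfer, and the agent's own footprint.

```python
from dataclasses import dataclass

# ~730 hours per month is a common approximation in cloud cost estimates.
HOURS_PER_MONTH = 730


@dataclass
class BatchWorkload:
    """Hypothetical description of one recurring flow."""
    runs_per_hour: float    # e.g. 1 for hourly batch jobs, 60 for minutely jobs
    minutes_per_run: float
    vcpu: float
    memory_gb: float


def utilization(w: BatchWorkload) -> float:
    """Fraction of the month the workload keeps one worker busy."""
    return w.runs_per_hour * w.minutes_per_run / 60


def serverless_monthly_cost(w: BatchWorkload, usd_per_vcpu_hour: float,
                            usd_per_gb_hour: float) -> float:
    """Pay only while the flow runs (simplified Fargate-style billing)."""
    busy_hours = utilization(w) * HOURS_PER_MONTH
    return busy_hours * (w.vcpu * usd_per_vcpu_hour + w.memory_gb * usd_per_gb_hour)


def always_on_monthly_cost(usd_per_instance_hour: float, instances: int = 1) -> float:
    """Pay for every hour, busy or idle (e.g. an EC2 instance hosting a Docker agent)."""
    return instances * usd_per_instance_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    # Placeholder prices for illustration only; look up current pricing for your region.
    VCPU_HOUR, GB_HOUR, INSTANCE_HOUR = 0.04, 0.0045, 0.05
    workloads = {
        "hourly, 10 min per run": BatchWorkload(1, 10, vcpu=1, memory_gb=2),
        "every minute, 1 min per run": BatchWorkload(60, 1, vcpu=1, memory_gb=2),
    }
    for name, w in workloads.items():
        print(f"{name}: utilization={utilization(w):.0%}, "
              f"serverless=${serverless_monthly_cost(w, VCPU_HOUR, GB_HOUR):.2f}, "
              f"always-on=${always_on_monthly_cost(INSTANCE_HOUR):.2f}")
```

In this simplified model, the pay-per-use option stays cheap while utilization is low (hourly jobs), and its advantage shrinks as utilization approaches 100% (flows running every minute), which is the point where an always-on setup starts to make sense.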

Henrietta Salonen

01/11/2022, 3:04 PM

For now, most of our jobs will be just hourly batch jobs, so a Spot setup, e.g. using Fargate to run ECS tasks on Spot capacity, could work for us. However, I want to keep the infrastructure fairly basic for now since we are just starting to use Prefect; my plan is to do a performance vs. cost assessment in a couple of months, once we have been running some flows in production. In the future we may also want to use Prefect (if possible) for real-time data serving, although I don't think Prefect fully supports event-driven streaming yet. I saw there is an improvement proposal about that: https://docs.prefect.io/core/pins/pin-08-listener-flows.html. In that case we would have to extend our cloud infrastructure anyway.
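A Spot setup trades a lower price for the risk that a run is interrupted and has to be repeated. One rough way to fold that into a cost estimate is sketched below. The function name is hypothetical, and the discount, interruption rate, and lost-work fraction are assumed inputs you would need to measure or look up, not published AWS figures. The model also assumes each affected run is interrupted at most once.

```python
def spot_adjusted_monthly_cost(
    on_demand_monthly_cost: float,
    spot_discount: float,
    interruption_rate: float,
    lost_work_fraction: float = 0.5,
) -> float:
    """Expected monthly cost on Spot capacity under a simple retry model.

    spot_discount: share saved versus on-demand pricing (assumed input)
    interruption_rate: share of runs interrupted once and re-run (assumed)
    lost_work_fraction: average share of a run's work redone after an interruption (assumed)
    """
    for name, value in {
        "spot_discount": spot_discount,
        "interruption_rate": interruption_rate,
        "lost_work_fraction": lost_work_fraction,
    }.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Interrupted runs repeat part of their work, so compute usage grows slightly.
    retry_overhead = 1.0 + interruption_rate * lost_work_fraction
    return on_demand_monthly_cost * (1.0 - spot_discount) * retry_overhead


if __name__ == "__main__":
    # Hypothetical numbers: a $100/month on-demand estimate, a 70% Spot discount,
    # 10% of runs interrupted, half of the work lost on average.
    print(f"${spot_adjusted_monthly_cost(100.0, 0.7, 0.1):.2f}")
```

For short hourly batch jobs the retry overhead in this model stays small, so the discount dominates; the estimate becomes less favorable for long-running jobs where an interruption wastes more work.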

Anna Geller

01/11/2022, 3:23 PM

You’re right that it would be a bit harder in Prefect 1.0, although if you run minutely scheduled jobs that retrieve data from real-time APIs, that would work fine. You can check this blog post discussing real-time streaming with Orion. Sounds like a great plan going forward; LMK if you have any specific questions about that.

Henrietta Salonen

01/12/2022, 2:00 PM

oh, cool, thank you for sharing!

Prefect Community