
Krishnan Chandra

10/21/2022, 3:39 PM
Hey everyone - I’m trying to figure out what would be a good architecture on AWS for running a large number of flows in a burst (e.g., X00,000 flows run once a week, broken into smaller batches). Ideally I’d want the backing infrastructure that these flows run on to be ephemeral, so it seems like I could use any of the following to do this:
• Spinning up more agents temporarily (current plan)
• Kubernetes jobs
• ECS (?)
• Dask + Fargate (?)
  ◦ I know Dask parallelism operates at task level rather than flow level

I’m wondering if anyone here has similar use cases. If so, what works well for you?
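For the "broken into smaller batches" part, a minimal sketch of the chunking step in plain Python might look like the following. It uses only the standard library and does not call Prefect. The `batched` helper, the `job_id` parameter, and the batch size are hypothetical names and values chosen for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next slice; an empty slice means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical example: 10 flow-run parameter sets split into batches of 4.
flow_params = [{"job_id": i} for i in range(10)]
batches = list(batched(flow_params, 4))
```

Each batch could then be submitted to whichever ephemeral infrastructure is chosen, with the batch size tuned to how many concurrent runs that infrastructure can absorb.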

Anna Geller

10/21/2022, 3:43 PM
ECS Fargate using the ECSTask block might be a really good and painless option. Example repo with a blog post and video demo
Fargate now supports up to 120 GB of memory per container, which may obviate the need to move to distributed compute with Dask Cloud Provider
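To check whether a job fits in a single Fargate container, a rough sketch like the one below picks the smallest CPU/memory combination that covers a given memory need. The size table is an assumption based on the AWS documentation around the time of this thread and should be verified against the current docs. `FARGATE_SIZES` and `smallest_fargate_size` are hypothetical names, not part of Prefect or any AWS SDK.

```python
from typing import Dict, List, Optional, Tuple

# Assumed Fargate task sizes: CPU units (1024 = 1 vCPU) -> allowed memory in MiB.
# Verify against the current AWS documentation before relying on these values.
FARGATE_SIZES: Dict[int, List[int]] = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),  # up to 120 GB
}


def smallest_fargate_size(memory_mib_needed: int) -> Optional[Tuple[int, int]]:
    """Return the smallest (cpu_units, memory_mib) pair that fits, or None."""
    for cpu in sorted(FARGATE_SIZES):
        for memory in FARGATE_SIZES[cpu]:
            if memory >= memory_mib_needed:
                return cpu, memory
    # Nothing fits in one container; distributed compute may still be needed.
    return None


# Hypothetical example: a job that needs about 100 GB of memory.
size_for_100_gb = smallest_fargate_size(100 * 1024)
```

A `None` result is the case where a single container is not enough and something like Dask would still be worth considering.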

Krishnan Chandra

10/21/2022, 3:46 PM
Thanks Anna! I actually had your repo open while making this thread 🙂 I’m curious about the memory point though - in my case I’d mainly be going distributed to parallelize compute more than anything else
🙌 1

Anna Geller

10/21/2022, 5:49 PM
I mean that earlier, when Fargate supported only small amounts of memory per container, you had no choice but to go distributed. Now you have the choice to run things on a single but more powerful container (support was added just a couple of weeks ago), all serverless (no Ops) and without the costs of distributed coordination required by Dask

Krishnan Chandra

10/21/2022, 5:51 PM
Ah gotcha. That’s helpful too in case I need any super large jobs in the future
🙌 1

Anna Geller

10/21/2022, 5:54 PM
In case you're interested, this thread discusses the challenges of running Dask on Fargate. It might be easier to avoid unless really necessary

Jimmy Le

10/24/2022, 2:12 PM
super excited to dive into Prefect + ECS again 😄
💯 1
🙌 2