Hey everyone - I’m trying to figure out what would be a good architecture on AWS for running a large number of flows in a burst (ex: X00,000 flows run once a week, broken into smaller batches). Ideally I’d want the backing infrastructure that these flows run on to be ephemeral, so it seems like I could use any of the following to do this:
• Spinning up more agents temporarily (current plan)
• Kubernetes jobs
• ECS (?)
• Dask + Fargate (?)
◦ I know Dask parallelism operates at the task level rather than the flow level
I’m wondering if anyone here has similar use cases. If so, what works well for you?
✅ 1
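For the batching side of this, a rough sketch of kicking off the weekly burst as smaller batches of flow runs against an existing deployment might look like the following. The deployment name, the "ids" parameter shape, and the batch size are all hypothetical, and this assumes a Prefect 2.x setup where run_deployment is available:
```python
# Illustrative sketch only: submit the weekly burst as smaller batches of
# flow runs against an existing deployment. The deployment name, the "ids"
# parameter, and the batch size are hypothetical.
from prefect.deployments import run_deployment

def submit_weekly_burst(all_ids: list[str], batch_size: int = 1000) -> None:
    for i in range(0, len(all_ids), batch_size):
        batch = all_ids[i : i + batch_size]
        run_deployment(
            name="my-flow/weekly-burst",  # hypothetical deployment name
            parameters={"ids": batch},
            timeout=0,  # return immediately instead of waiting for the run
        )
```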
Anna Geller
10/21/2022, 3:43 PM
ECS Fargate using the ECSTask infrastructure block might be a really good and painless option
Example repo with a blog post and video demo
Fargate now supports up to 120 GB of memory for a single container, which may obviate the need to move to distributed compute with Dask cloud provider
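As a rough idea of what that could look like with the ECSTask block (the image name, CPU, and memory values below are placeholders, and this assumes prefect-aws is installed and the AWS credentials/cluster setup already exists):
```python
# Minimal sketch of an ECSTask infrastructure block for Fargate.
# All values are illustrative; adjust sizing and registry to your setup.
from prefect_aws.ecs import ECSTask

ecs_task = ECSTask(
    image="my-registry/my-flow-image:latest",  # hypothetical image
    cpu=4096,         # ECS CPU units (4096 = 4 vCPU)
    memory=30720,     # MiB (30 GB); Fargate now allows much larger values
    launch_type="FARGATE",
)
ecs_task.save("burst-ecs", overwrite=True)  # register as a reusable block
```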
Krishnan Chandra
10/21/2022, 3:46 PM
Thanks Anna! I actually had your repo open while making this thread 🙂
I’m curious about the memory point though - in my case I’d mainly be going distributed to parallelize compute more than anything else
🙌 1
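If the goal is mainly task-level parallelism, a sketch of what that could look like with the DaskTaskRunner pointed at an ephemeral Fargate cluster (the task body, worker count, and parameters are made up; assumes prefect-dask and dask-cloudprovider are installed):
```python
# Rough sketch: task-level parallelism on an ephemeral Dask Fargate cluster.
# The process_item task and n_workers value are illustrative only.
from prefect import flow, task
from prefect_dask import DaskTaskRunner

@task
def process_item(item: str) -> None:
    ...  # per-item work goes here

@flow(
    task_runner=DaskTaskRunner(
        cluster_class="dask_cloudprovider.aws.FargateCluster",
        cluster_kwargs={"n_workers": 10},  # hypothetical per-batch sizing
    )
)
def batch_flow(items: list[str]) -> None:
    for item in items:
        process_item.submit(item)  # tasks fan out across Dask workers
```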
Anna Geller
10/21/2022, 5:49 PM
I mean that earlier, when Fargate supported only small amounts of memory per container, you had no choice but to go distributed. Now you have the choice to run things on a single but more powerful container (support added just a couple of weeks ago), all serverless (no Ops) and without the coordination costs that Dask's distributed setup requires
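For reference, a single large container along those lines might be requested like this (the values are illustrative, and the exact upper limits are worth double-checking against current Fargate documentation):
```python
# Sketch of sizing one large Fargate container instead of going distributed.
# 16384 CPU units / 122880 MiB correspond to the larger Fargate sizes;
# confirm current limits in the AWS docs before relying on them.
from prefect_aws.ecs import ECSTask

big_ecs_task = ECSTask(
    image="my-registry/my-flow-image:latest",  # hypothetical image
    cpu=16384,      # 16 vCPU
    memory=122880,  # ~120 GB in MiB
)
```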
Krishnan Chandra
10/21/2022, 5:51 PM
Ah gotcha. That’s helpful too in case I need any super large jobs in the future
🙌 1
Anna Geller
10/21/2022, 5:54 PM
In case you're interested, this thread discusses the challenges of running Dask on Fargate -- it might be easier to avoid unless really necessary. Sharing in case it's helpful