I'm trying to set up a flow that runs a large number of mapped tasks (100 to 10,000) concurrently, with concurrency limited via tags.
I'm running into an issue where these tasks, which should take 20 seconds at most, get stuck in the running state for a very long time (30 to 45 minutes). Soon the concurrency limit is reached and no further progress is made.
Here is some more context:
• As far as I can tell, all pending tasks submit a request to be run once the concurrency limit is met. Is this by design, or is there some way to control how many tasks are actually in the "pending" state as opposed to just "queued"?
• These tasks can return a lot of data, so I've disabled caching their results in memory. Would it be better to enable caching in this case? In some cases none of the mapped tasks return anything, but in others they return a large list of strings.
• A single lightweight agent (a 0.25 vCPU ECS service) with one work queue controls all deployment runs. Would increasing its compute resources improve the reliability of the flows?
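To make the failure mode concrete, here is a minimal stand-alone sketch of how I understand it. It uses plain asyncio rather than Prefect: the semaphore stands in for the tag-based concurrency limit, and `run_with_limit`, `simulate`, and the durations are hypothetical names and values made up for illustration. The per-task timeout is an assumption about what might free the slots, not something I have confirmed in my flow.

```python
import asyncio


async def run_with_limit(durations, limit, timeout=None):
    """Run one simulated task per duration, at most `limit` at a time.

    Returns "done" or "timed out" for each task, in input order.
    """
    # Stand-in for the tag-based concurrency limit (assumption, not Prefect).
    slots = asyncio.Semaphore(limit)

    async def one(duration):
        # A slot is held for as long as the task counts as "running".
        async with slots:
            try:
                # timeout=None means wait indefinitely, like a stuck task.
                await asyncio.wait_for(asyncio.sleep(duration), timeout)
                return "done"
            except asyncio.TimeoutError:
                # The slot is released on exit, so waiting tasks can proceed.
                return "timed out"

    return await asyncio.gather(*(one(d) for d in durations))


def simulate(durations, limit, timeout=None):
    """Synchronous wrapper around run_with_limit."""
    return asyncio.run(run_with_limit(durations, limit, timeout))


if __name__ == "__main__":
    # One slot, one "stuck" task (5 s) between two quick ones (0.01 s).
    print(simulate([0.01, 5, 0.01], limit=1, timeout=0.1))
```

Without the timeout, the task behind the stuck one waits for the full 5 seconds before it gets a slot, which is a scaled-down version of what I see when every slot is held by a stuck task.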
I would like to know how I can rectify this issue and make the system more reliable and efficient.
Any help is really appreciated!
Thank you for your time!