I have a question about using concurrency limits together with "task-groups".
I'm building an ETL pipeline. This question focuses only on the Extract step.
• The extraction covers different source systems, which need an external tool with a limited resource. I use tags to apply task concurrency limits to it.
◦ The limited resource allows only 4 concurrent connections.
• Let's say we load 5 tables for source A and 5 tables for source B.
◦ For simplicity, each task loads one table and, let's say, takes exactly the same time: 5 seconds.
• I start to load source A.
◦ 0s: submit all 5 A tables
◦ 5s: limit allows the execution of 4 in parallel
◦ 10s: then it will execute the last A table
• I want to load B as soon as possible, but not before the last table of A has started.
At the same time, other tasks are running that do not use the limited resource.
How can I achieve this?
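
To make the ordering I'm after concrete, here is a minimal sketch in plain `asyncio`. It is not the orchestration tool's API: the semaphore only stands in for the tag-based limit of 4, and `load_table`, `submit_source`, and the scaled-down duration are hypothetical names and values I made up for illustration. Each A task signals when it has actually started, and B is submitted only once every A task has signalled.

```python
import asyncio

LIMIT = 4            # concurrent connections the limited resource allows
TASK_SECONDS = 0.05  # stands in for the 5 s per table, scaled down


async def load_table(name, sem, started, order):
    async with sem:               # wait for a free connection slot
        order.append(name)        # record the order in which tables start
        started.set()             # signal that this table has begun loading
        await asyncio.sleep(TASK_SECONDS)  # pretend to load the table


def submit_source(prefix, count, sem, order):
    # Submit all tables of one source at once; the semaphore throttles them.
    events = [asyncio.Event() for _ in range(count)]
    tasks = [
        asyncio.create_task(load_table(f"{prefix}{i}", sem, ev, order))
        for i, ev in enumerate(events)
    ]
    return tasks, events


async def main():
    sem = asyncio.Semaphore(LIMIT)
    order = []
    a_tasks, a_started = submit_source("A", 5, sem, order)
    # Hold back B until every A table has started (not finished).
    await asyncio.gather(*(ev.wait() for ev in a_started))
    b_tasks, _ = submit_source("B", 5, sem, order)
    await asyncio.gather(*a_tasks, *b_tasks)
    return order


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the wait happens inside a coroutine, unrelated tasks that do not touch the limited resource would keep running in the meantime. What I'm looking for is the equivalent of this behaviour with tag-based concurrency limits.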