
Prefect Community

I have a question about concurrency limits in combination with "task-groups".

I'm building an ETL pipeline. This question focuses only on the Extract step.

• The extraction covers different source systems and needs an external tool with a limited resource; I use tags to set task concurrency limits.
    ◦ The limited resource allows only 4 concurrent connections.
• Let's say we load 5 tables for source A and 5 tables for source B.
    ◦ For simplicity, each task loads one table, and let's say each one needs exactly the same time, 5 seconds.
• I start to load source A.
    ◦ 0s: submit all 5 A tables
    ◦ 5s: limit allows the execution of 4 in parallel
    ◦ 10s: then it will execute the last A table
• I want to load B as soon as possible, but not before the last table of A has started.
At the same time, there are other tasks running that do not use the limited resource.

How can I achieve this?
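
To make the ordering I'm after concrete, here is a minimal sketch in plain `asyncio`. It is not Prefect code: the semaphore only stands in for the tag-based concurrency limit, the 5 seconds are scaled down, and all names (`load_table`, `submit_source`, `run_extract`) are made up for illustration.

```python
import asyncio

LIMIT = 4        # concurrent connections allowed by the limited resource
DURATION = 0.05  # stand-in for the 5 s per table, scaled down


async def load_table(name, sem, started, on_start):
    """Hypothetical task: take one slot, record the start, then 'load'."""
    async with sem:
        started.append(name)
        on_start()
        await asyncio.sleep(DURATION)


def submit_source(prefix, n, sem, started):
    """Submit n table loads; the returned event is set once all n have started."""
    all_started = asyncio.Event()
    remaining = n

    def mark_started():
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            all_started.set()

    tasks = [
        asyncio.create_task(load_table(f"{prefix}{i}", sem, started, mark_started))
        for i in range(1, n + 1)
    ]
    return tasks, all_started


async def run_extract():
    sem = asyncio.Semaphore(LIMIT)
    started = []  # order in which tables actually began loading
    a_tasks, a_all_started = submit_source("A", 5, sem, started)
    # Gate: submit B only after the last A table has acquired a slot.
    await a_all_started.wait()
    b_tasks, _ = submit_source("B", 5, sem, started)
    await asyncio.gather(*a_tasks, *b_tasks)
    return started


if __name__ == "__main__":
    print(asyncio.run(run_extract()))
```

In this sketch, B is submitted as soon as the fifth A table has taken a slot, so B tables fill the remaining free slots right away instead of waiting for all of A to finish. Tasks that do not use the limited resource would simply not take the semaphore.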