Qiong Liu
03/12/2024, 6:31 PM
Marvin
03/12/2024, 6:33 PM
The choice between `submit` and `map` for running tasks concurrently in Prefect depends on what you are trying to achieve:
- `map` is a feature offered by Prefect to run the same task with different inputs concurrently, essentially creating a dynamic parallel pipeline. When you use `map`, each mapped task is a standalone instance, allowing for independent retries and alerts. It is well suited to cases where you need to perform the same operation on a list of inputs.
- `submit` isn't explicitly mentioned in the documentation excerpts, but in asynchronous programming generally, a `submit` function is used to run a task in a thread pool or on an executor without waiting for the task to complete.
If your use case involves processing a collection of data in parallel, and each element of the collection can be processed independently, then `map` is likely the recommended approach.
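The distinction can be sketched with Python's standard-library `concurrent.futures`, whose executors offer both verbs. This is an analogy only, not Prefect's API: `double` and `inputs` are hypothetical names, and in Prefect `map` and `submit` are called on tasks inside a flow and follow the configured task runner's behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def double(x: int) -> int:
    """Hypothetical stand-in for any operation that runs independently per input."""
    return x * 2


inputs = [1, 2, 3, 4]  # hypothetical collection to process

with ThreadPoolExecutor(max_workers=4) as pool:
    # map: apply the same function to every input;
    # results are yielded in the same order as the inputs.
    mapped_results = list(pool.map(double, inputs))

    # submit: schedule a single call and get a future back immediately,
    # then block only at the point where the result is needed.
    future = pool.submit(double, 21)
    submitted_result = future.result()
```

In this sketch `map` is the natural fit for the list of inputs, while `submit` suits a one-off call whose result is collected later.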
The documentation on concurrency and task runners suggests using `ConcurrentTaskRunner` to run IO-bound tasks concurrently, or using Prefect integrations such as `DaskTaskRunner` or `RayTaskRunner` for parallel execution.
To see more details and decide which is best for your needs, please consult the links provided below:
- Task Runners in Prefect
- Prefect Blog: Why Not Airflow? (this article uses the example of task mapping in Prefect to illustrate dynamic parallel pipelines).
Qiong Liu
03/12/2024, 6:56 PM
Marvin
03/12/2024, 7:02 PM
The documentation excerpts do not state a specific limit on the number of tasks for the `map` feature. The general concept is to allow dynamic parallel pipelines, where each item in an iterable can be processed as an independent task.
However, while there may not be a specific limit on the number of mapped tasks, you should be aware that practical limits are likely to be imposed by:
- Your system's resources.
- Concurrency limits that you define using task tags, which restrict the number of concurrent runs for tasks with those tags.
- Any limits imposed by the infrastructure running your flow, such as Kubernetes pods, available CPU, memory, or database connections.
It's important to plan your resources accordingly and, if necessary, institute controls such as concurrency limits to avoid overwhelming your system and to ensure smooth execution.
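The effect of such a cap can be illustrated with a standard-library sketch: twenty items are mapped over a pool of ten workers, but a semaphore lets only three run at a time. This is an analogy under stated assumptions, not Prefect's tag-based concurrency limit mechanism; `MAX_CONCURRENT`, `process`, and the sleep duration are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # hypothetical cap, playing the role of a concurrency limit
limiter = threading.Semaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0  # number of items being processed right now
peak = 0    # highest value `active` ever reached


def process(item: int) -> int:
    """Hypothetical unit of work; only MAX_CONCURRENT run at the same time."""
    global active, peak
    with limiter:  # blocks while MAX_CONCURRENT items are already running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for real work
        with lock:
            active -= 1
    return item * item


# The pool has more workers than the cap, so the semaphore is what limits overlap.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process, range(20)))
```

All twenty items still complete; the cap only bounds how many are in flight at once, which is the trade-off to weigh when mapping over a large collection.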
For more information on task concurrency limits, please refer to the Tasks section of the Prefect documentation.