# marvin-in-the-wild
q
@Marvin to run tasks concurrently, is it recommended to use submit or map?
m
The choice between `submit` and `map` for running tasks concurrently in Prefect depends on what you are trying to achieve:
- `map` is a Prefect feature that runs the same task concurrently over different inputs, essentially creating a dynamic parallel pipeline. With `map`, each task run is a standalone instance, which allows independent retries and alerts. It suits cases where you need to perform the same operation on a list of inputs.
- `submit` isn't explicitly mentioned in the documentation excerpts, but in asynchronous programming generally, a `submit` function runs a task in a thread pool or on an executor without waiting for the task to complete. Prefect tasks expose a `.submit()` method that works this way: it hands a single task run to the flow's task runner and returns a future.

If your use case involves processing a collection of data in parallel and each element can be processed independently, then `map` is likely the recommended approach; `submit` fits one-off calls or calls with differing arguments. The documentation on concurrency and task runners suggests using `ConcurrentTaskRunner` to run IO-bound tasks concurrently, or Prefect integrations such as `DaskTaskRunner` or `RayTaskRunner` for parallel execution.

To see more details and decide which is best for your needs, consult:
- Task Runners in Prefect
- Prefect Blog: Why Not Airflow? (this article uses task mapping in Prefect to illustrate dynamic parallel pipelines)
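As a library-independent illustration of the submit-versus-map distinction described above, here is a minimal sketch using Python's standard `concurrent.futures` module rather than Prefect itself. The function `fetch_length` and the input list are hypothetical placeholders; in Prefect the analogous calls would be made on a task inside a flow, and the details may differ between versions.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(word: str) -> int:
    """Hypothetical stand-in for an IO-bound task."""
    return len(word)


words = ["alpha", "beta", "gamma"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # submit: schedule one call and get a future back immediately.
    single_future = pool.submit(fetch_length, "delta")

    # map: schedule the same function once per input;
    # results are yielded in the order of the inputs.
    mapped_results = list(pool.map(fetch_length, words))

    # Block only at the point where the single result is needed.
    single_result = single_future.result()

print(single_result, mapped_results)
```

The pattern is the same in either setting: `submit` is the building block for one call, and `map` is a convenience for fanning the same call out over a collection.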
q
@Marvin is there a limit on task map
m
From reviewing the Prefect documentation and relevant resources, there is no explicitly stated limit on the number of tasks you can map using Prefect's `map` feature. The general concept is to allow dynamic parallel pipelines, where each item in an iterable is processed as an independent task.

However, even without a specific limit on the number of mapped tasks, practical limits are likely to be imposed by:
- Your system's resources.
- Concurrency limits that you define using task tags, which restrict the number of concurrent runs for tasks with those tags.
- Any limits imposed by the infrastructure running your flow, such as Kubernetes pods, available CPU, memory, or database connections.

Plan your resources accordingly and, if necessary, set controls such as concurrency limits to avoid overwhelming your system and to keep execution smooth. For more information on task concurrency limits, refer to the Tasks section of the Prefect documentation.
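To make the idea of a concurrency limit concrete, the sketch below caps how many mapped calls run at once using a semaphore from the Python standard library. It is only an analogy for tag-based concurrency limits, not Prefect's mechanism; `LIMIT`, `limited_task`, and the `state` dictionary are hypothetical names chosen for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LIMIT = 2  # hypothetical cap, playing the role of a concurrency limit
slots = threading.Semaphore(LIMIT)
lock = threading.Lock()
state = {"running": 0, "peak": 0}  # tracks observed concurrency


def limited_task(x: int) -> int:
    """Hypothetical task that may only run while holding a slot."""
    with slots:  # blocks until one of the LIMIT slots is free
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # simulate IO-bound work
        with lock:
            state["running"] -= 1
    return x * 2


# Twenty mapped calls and eight worker threads, but never more than
# LIMIT calls inside the guarded section at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(limited_task, range(20)))

print(results, state["peak"])
```

All twenty inputs are still processed; the cap only bounds how many are in flight at once, which is the trade-off a concurrency limit makes between throughput and load on shared resources.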