Gardner
03/25/2025, 7:53 AMMarvin
03/25/2025, 7:53 AM@task
decorator to define your task.
2. Concurrent Execution: Use the ThreadPoolTaskRunner
for concurrent task execution. This allows tasks to run concurrently in independent threads.
3. Parallel Execution: For truly parallel execution, consider using DaskTaskRunner
, which can execute tasks in parallel using Dask.
4. Execution Method: Use the .submit()
or .map()
methods to submit tasks for concurrent or parallel execution.
Here's a brief example of how you might set this up:
python
from prefect import task, flow
from prefect.task_runners import ThreadPoolTaskRunner
import pandas as pd
@task
def add_column(df: pd.DataFrame) -> pd.DataFrame:
df['new_column'] = df['existing_column'] * 2 # Example transformation
return df
@flow(task_runner=ThreadPoolTaskRunner())
def process_data(df: pd.DataFrame):
return add_column.submit(df)
# Example DataFrame
df = pd.DataFrame({'existing_column': [1, 2, 3]})
process_data(df)
- Documentation: For more details on task runners and concurrent execution, you can refer to the Prefect 3.x documentation on task runners.
This setup allows you to efficiently handle tabular data transformations in Prefect, leveraging parallel execution to speed up the process.Gardner
03/25/2025, 7:55 AMdescription
and we'd like to summarize it into a new column called description_summary
by calling an LLM. We want the LLM calls to happen in parallel or concurrently.Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.
Powered by