Gardner
03/25/2025, 7:53 AMMarvin
03/25/2025, 7:53 AM@task
decorator to define your task.
2. Concurrent Execution: Use the ThreadPoolTaskRunner
for concurrent task execution. This allows tasks to run concurrently in independent threads.
3. Parallel Execution: For truly parallel execution, consider using DaskTaskRunner
, which can execute tasks in parallel using Dask.
4. Execution Method: Use the .submit()
or .map()
methods to submit tasks for concurrent or parallel execution.
Here's a brief example of how you might set this up:
python
from prefect import task, flow
from prefect.task_runners import ThreadPoolTaskRunner
import pandas as pd
@task
def add_column(df: pd.DataFrame) -> pd.DataFrame:
df['new_column'] = df['existing_column'] * 2 # Example transformation
return df
@flow(task_runner=ThreadPoolTaskRunner())
def process_data(df: pd.DataFrame):
return add_column.submit(df)
# Example DataFrame
df = pd.DataFrame({'existing_column': [1, 2, 3]})
process_data(df)
- Documentation: For more details on task runners and concurrent execution, you can refer to the Prefect 3.x documentation on task runners.
This setup allows you to efficiently handle tabular data transformations in Prefect, leveraging parallel execution to speed up the process.Gardner
03/25/2025, 7:55 AMdescription
and we'd like to summarize it into a new column called description_summary
by calling an LLM. We want the LLM calls to happen in parallel or concurrently.