< Marvin> run db sheet job and process results are both task Prefect Community #ask-marvin

<@ULVA73B9P> run_db_sheet_job() and process_result...

Jason Wiener

10/15/2024, 6:28 PM

@Marvin run_db_sheet_job() and process_results() are both tasks, used as below.

Copy code

# Run scheduled jobs
        results = run_db_sheets_job.with_options(persist_result=True, result_storage=result_persistence_location).map(scheduled)

        # Process the results, allowing Futures from mapped jobs to resolve
        to_notify = process_results(results=quote(results))

I want process_results() to wait for all results from run_db_sheets_job() but I see this in the log:

Copy code

12:14:21.520 | INFO    | Flow run 'cocky-degu' - Created task run 'run_db_sheets_job-0' for task 'run_db_sheets_job'
12:14:21.522 | INFO    | Flow run 'cocky-degu' - Submitted task run 'run_db_sheets_job-0' for execution.
12:14:21.825 | INFO    | Flow run 'cocky-degu' - Created task run 'process_results-0' for task 'process_results'
12:14:21.827 | INFO    | Flow run 'cocky-degu' - Executing 'process_results-0' immediately...

process_results() does not complete until all the mapped jobs in run_db_sheets_job() have an outcomes but I would like to understand this better. How do the internal states of results and to_notify evolve over the course of the flow both tasks are embedded in?

Marvin

10/15/2024, 6:28 PM

To understand how the internal states of

results

and

to_notify

evolve in a Prefect flow, we need to get a grasp on how Prefect handles tasks and task dependencies. From your code snippet and logs, it appears that

run_db_sheets_job

is a mapped task, and you want

process_results

to wait for all mapped tasks to complete. Here's how Prefect manages these details: 1. Initialization and Submission: -

run_db_sheets_job.map(scheduled)

triggers the execution of

run_db_sheets_job

for each element in

scheduled

. This creates a

MappedTask

object which handles execution for the number of elements mapped over. 2. Persisting Results: - By using

persist_result=True

, each task result is stored in

result_storage

specified by

result_persistence_location

. 3. Internal State Evolution: - As

run_db_sheets_job

tasks are executed, their results are captured as Future objects. The state of these objects evolves from Pending -> Running -> Completed or Failed, depending on their outcomes. 4. Process Results with Dependency Management: - When you pass

quote(results)

process_results

, you're effectively indicating that

process_results

should wait for the Future objects in

results

to complete before executing. - Prefect's task dependency management ensures

process_results

starts only when all tasks in

results

are completed. This is why you see it executing later, even though it logs its initiation early. 5. Understanding Logs: - The log messages show the creation and submission of tasks. Although

process_results

is created immediately, it doesn't execute (i.e., complete the task logic) until all mapped tasks in

run_db_sheets_job

have resolved. Let's confirm the Prefect version being used and check relevant documentation to ensure the behavior aligns with your specific version. It seems there was an issue when attempting to retrieve the latest Prefect version and the relevant documentation. Let's proceed by researching specific documentation regarding task mapping and dependencies in both Prefect versions. This will help ensure the behavior you're witnessing is expected and aligned with the Prefect version you're using. If you can provide the version number you're working with, it would also help tailor the solution. For now, let me perform a targeted search for these concepts. Based on my research, here's a detailed explanation regarding the evolution of the internal states of

results

and

to_notify

in your Prefect task flow: Prefect 2.x: 1. Mapped Tasks: Prefect uses the

.map()

method to handle execution over iterables. Each mapped task run is independent and has its own lifecycle, transitioning from Pending to Running to Completed or Failed. Prefect ensures that each task in the map waits for its specific upstream dependencies to complete before it runs. 2. Dependency Management: When you use

quote(results)

, Prefect waits for all the mapped tasks within

results

(these are Future objects) to complete before executing

process_results

. It's a built-in mechanism in Prefect's dependency management where mapped task dependencies are automatically tracked. 3. Task Execution Logs: Even though

process_results

might appear in the logs before all

run_db_sheets_job

tasks have finished, the actual execution logic (completion of

process_results

) waits internally until all these tasks resolve. Prefect 3.x: - The behavior remains consistent with 2.x in terms of how mapped tasks are handled, but there are improvements and better support for task runners and deferred execution. - Prefect 3.x documentation emphasizes the efficient use of both concurrent and sequential task execution through improved task runners and deferred tasks. - The concept of transactions in caching or ensuring certain tasks execute together as a single transaction strength to the execution consistency.

Marvin

10/15/2024, 6:28 PM

For your situation, it's crucial to understand that regardless of version,

quote(results)

ensures that

process_results

does not execute its main logic until all mapped tasks complete. The logs may show task creation, but dependency resolution ensures orderly execution. For detailed explanation and examples, you can refer to the Prefect 2.x and Prefect 3.x documentation. If you have further questions or need clarification on specific Prefect implementations, feel free to ask!

4 Views

Open in Slack

Previous Next