KG
03/22/2024, 5:51 PM
```python
def flow():
    _ = taskA()  # create table
    taskB()  # insert into table
```
If taskB requires taskA to complete first, should taskB use `wait_for=[_]`?
Flows defined without `wait_for` appear to be completing the tasks sequentially anyway. Does the flow know to complete taskA before taskB regardless?
Marvin
03/22/2024, 5:52 PM
The `wait_for` argument can be useful. In your example, if `taskA` runs some setup (e.g., creating a table) that `taskB` relies on (e.g., inserting into the table), but `taskB` doesn't need the output of `taskA`, you can use `wait_for` to define this dependency explicitly:
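```python
from prefect import flow

@flow
def my_flow():  # renamed so it doesn't shadow Prefect's `flow` decorator
    task_a = taskA()  # create table
    taskB(wait_for=[task_a])  # insert into table only after taskA completes
```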
By passing `wait_for=[task_a]` to `taskB`, you make it explicit to Prefect that `taskB` should not start until `taskA` has completed. This matters most when a flow might run tasks in parallel: because `taskB` receives no data from `taskA`, Prefect has no way to infer on its own that `taskB` must wait for `taskA` to finish.
Flows that appear to complete tasks sequentially usually do so because the tasks are called directly (e.g. `taskA()`), and a direct call blocks until that task run finishes; tasks only run concurrently when they are submitted to a task runner with `.submit()`. However, it is better practice to state dependencies explicitly when execution order is crucial, so the order stays correct regardless of how the flow is configured.
For more details and examples on task dependencies in flows, you can refer to the Prefect documentation and the basic flow example provided.
KG
03/22/2024, 5:54 PM
Kevin Grismore
03/22/2024, 5:55 PM
KG
03/22/2024, 5:55 PM
Kevin Grismore
03/22/2024, 5:56 PM
`wait_for` is useful, should you ever need it