Feliks Krawczyk (08/29/2019, 1:57 AM)
Sorry, this might be a silly question, but I'm still coming from the Airflow world. In your documentation:
from prefect import task, Flow

@task
def create_cluster():
    cluster = create_spark_cluster()
    return cluster

@task
def run_spark_job(cluster):
    submit_job(cluster)

@task
def tear_down_cluster(cluster):
    tear_down(cluster)


with Flow("Spark") as flow:
    # define data dependencies
    cluster = create_cluster()
    submitted = run_spark_job(cluster)
    tear_down_cluster(cluster)

    # wait for the job to finish before tearing down the cluster
    tear_down_cluster.set_upstream(submitted)
Wouldn’t you also need to add submitted.set_upstream(cluster) to ensure this works? Or is there some hidden logic that means, when you define a flow, it runs the lines in order, so submitted runs only after cluster? If that is the case, how do you get tasks to run in parallel?

Chris White (08/29/2019, 1:58 AM)
no worries -> Prefect can actually infer your dependencies based on how you call things in the functional API. So in this case, submitted = run_spark_job(cluster) sets cluster as an upstream dependency of submitted automatically.
it doesn’t run the lines in order, it runs your DAG in topological order, similar to Airflow -> so no task is allowed to start until all of its upstream tasks have finished
check out
flow.visualize()
on the flow above to see what it looks like and how the dependencies have been set
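For instance, here is a minimal Prefect 1.x-style sketch (not taken from this thread; the LocalDaskExecutor import assumes a release newer than this 2019 conversation, and the task names are made up) showing that tasks with no edge between them can run concurrently, while the inferred edges can be printed without rendering a graph:

from prefect import task, Flow
from prefect.executors import LocalDaskExecutor  # assumes a 1.x release with this module path

@task
def extract_a():
    return [1, 2, 3]

@task
def extract_b():
    return [4, 5, 6]

@task
def combine(a, b):
    return a + b

with Flow("parallel-example") as flow:
    a = extract_a()        # no upstream tasks
    b = extract_b()        # no upstream tasks, so independent of extract_a
    total = combine(a, b)  # these calls infer the edges a -> combine and b -> combine

# Print the inferred edges as a text alternative to flow.visualize()
for edge in flow.edges:
    print(edge.upstream_task.name, "->", edge.downstream_task.name)

# With a parallel executor, extract_a and extract_b can run at the same time;
# combine still waits for both of its upstream tasks to finish.
flow.run(executor=LocalDaskExecutor())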

Feliks Krawczyk (08/29/2019, 2:02 AM)
Oh gotcha! Sorry, I totally missed the functional part of run_spark_job(cluster).
Makes sense 👍

Chris White (08/29/2019, 2:02 AM)
yup no worries!

Feliks Krawczyk (08/29/2019, 2:02 AM)
Thank you for clarifying!

Chris White (08/29/2019, 2:02 AM)
anytime!