Jonas Hanfland
08/18/2020, 2:03 PM
`flow.visualize()` generates.
In the graph that's generated, I would like `Add new columns` to be dependent on `Extract customers`, while `Extract customers` should itself be dependent on `Unnest passport data`.
In the code, how do I add the `Unnest passport data` dependency to the existing `.set_dependencies()` block without causing duplicates? Thx in advance
Jim Crist-Harif
08/18/2020, 2:16 PM
`extract_customers()`), this creates a copy of the task, creating a new instance with the set dependencies. When you call `set_upstream` or `set_dependencies`, this applies those methods to the existing instance. It looks like you probably want to call `unnest_verifications` and `extract_customers` once each first to create the task instances your flow will use, then call the `set_upstream`/`set_dependencies` methods on those instances rather than on the task functions themselves:
extract_customers_task = extract_customers()
extract_customers_task.set_upstream(...)
Also, since you're using the functional API already, you might avoid calling `set_upstream`/`set_dependencies` manually at all and instead make use of the `upstream_tasks` kwarg when calling the task originally. This might look like:
unnest_verifications_task = unnest_verifications()
extract_customers_task = extract_customers(upstream_tasks=[unnest_verifications_task])
...
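To illustrate why calling a task more than once leads to duplicate nodes, here is a toy model in plain Python. `ToyTask`, `ToyFlow`, and both `build_*` functions are hypothetical stand-ins invented for this sketch; they are not Prefect classes and do not reflect how Prefect is implemented. They only mimic the behaviour described above: calling a task makes a copy, while `set_upstream` acts on the existing instance.

```python
import copy


class ToyFlow:
    """Hypothetical container recording task instances and edges."""

    def __init__(self):
        self.tasks = []     # task instances, in registration order
        self.edges = set()  # (upstream, downstream) pairs

    def add_edge(self, upstream, downstream):
        # Register either endpoint if the flow has not seen it yet.
        for t in (upstream, downstream):
            if not any(t is existing for existing in self.tasks):
                self.tasks.append(t)
        self.edges.add((upstream, downstream))


class ToyTask:
    """Hypothetical task: calling it copies, set_upstream does not."""

    def __init__(self, name):
        self.name = name

    def __call__(self, flow, upstream_tasks=()):
        # Calling the task creates a fresh copy and registers that copy.
        instance = copy.copy(self)
        flow.tasks.append(instance)
        for upstream in upstream_tasks:
            flow.add_edge(upstream, instance)
        return instance

    def set_upstream(self, upstream, flow):
        # Applies to this existing instance; no copy is made.
        flow.add_edge(upstream, self)


unnest = ToyTask("Unnest passport data")
extract = ToyTask("Extract customers")
add_cols = ToyTask("Add new columns")


def build_with_duplicates():
    flow = ToyFlow()
    # extract(...) is called twice, so two separate copies are registered.
    add_cols(flow, upstream_tasks=[extract(flow)])
    extract(flow, upstream_tasks=[unnest(flow)])
    return flow


def build_without_duplicates():
    flow = ToyFlow()
    # Each task is called exactly once and the returned instance is reused.
    unnest_task = unnest(flow)
    extract_task = extract(flow, upstream_tasks=[unnest_task])
    add_cols(flow, upstream_tasks=[extract_task])
    return flow
```

In this toy model, `build_with_duplicates()` registers `Extract customers` twice, whereas `build_without_duplicates()` yields three tasks joined by two edges.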
There's no harm in using the methods manually, but many users find treating tasks like function calls to be a clearer way of marking dependencies between tasks.
Jonas Hanfland
08/18/2020, 2:38 PM