Jonas Hanfland
08/18/2020, 2:03 PM
... flow.visualize() generates.
In the graph that's generated I would like `Add new columns` to be dependent on `Extract customers`, while `Extract customers` should itself be dependent on `Unnest passport data`.
In the code, how do I add the `Unnest passport data` dependency to the existing `.set_dependencies()` block without causing duplicates? Thx in advance
Jim Crist-Harif
08/18/2020, 2:16 PM
... `extract_customers()`), this creates a copy of the task, creating a new instance with the set dependencies. When you call `set_upstream` or `set_dependencies`, this applies those methods to the existing instance. It looks like you probably want to call `unnest_verifications` and `extract_customers` once each first to create the task instances your flow will use, then call the `set_upstream`/`set_dependencies` methods on those instances rather than on the task functions themselves:
extract_customers_task = extract_customers()
extract_customers_task.set_upstream(...)
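To make the copy-on-call behaviour concrete, here is a toy model in plain Python. It is not Prefect's implementation: `ToyTask` and `ToyFlow` are hypothetical stand-ins that only assume what is described above, namely that calling a task yields a new copy while `set_upstream` acts on the instance it is called on.

```python
import copy


class ToyFlow:
    """Hypothetical stand-in for a flow: a list of tasks plus a set of edges."""

    def __init__(self):
        self.tasks = []
        self.edges = set()


class ToyTask:
    """Hypothetical stand-in for a task; NOT Prefect's real Task class."""

    def __init__(self, name):
        self.name = name
        self.flow = None

    def __call__(self, flow):
        # Calling the task makes a copy and registers that copy with the flow.
        instance = copy.copy(self)
        instance.flow = flow
        flow.tasks.append(instance)
        return instance

    def set_upstream(self, upstream):
        # Acts on this existing instance; no copy is made.
        self.flow.edges.add((upstream, self))


unnest_verifications = ToyTask("unnest_verifications")
extract_customers = ToyTask("extract_customers")

# Calling the task again each time a dependency is set yields separate copies.
dup_flow = ToyFlow()
extract_customers(dup_flow)
extract_customers(dup_flow).set_upstream(unnest_verifications(dup_flow))
duplicate_names = [t.name for t in dup_flow.tasks]

# Calling each task once and reusing the instances yields one node per task.
flow = ToyFlow()
unnest_task = unnest_verifications(flow)
extract_task = extract_customers(flow)
extract_task.set_upstream(unnest_task)
names = [t.name for t in flow.tasks]
```

In the first flow `extract_customers` appears twice, which is the duplicate node from the question; in the second it appears once with a single edge pointing at it.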
Also, since you're using the functional API already, you might avoid calling `set_upstream`/`set_dependencies` manually at all and instead make use of the `upstream_tasks` kwarg when calling the task originally. This might look like:
unnest_verifications_task = unnest_verifications()
extract_customers_task = extract_customers(upstream_tasks=[unnest_verifications_task])
...
There's no harm in using the methods manually, but many users find treating tasks like function calls to be a clearer way of marking dependencies between tasks.
Jim Crist-Harif
08/18/2020, 2:19 PM
Jonas Hanfland
08/18/2020, 2:38 PM