Jonas Hanfland

08/18/2020, 2:03 PM
Hey guys, I'm trying to implement some basic task dependencies, but some of my tasks appear multiple times in the generated graph. I would like `Add new columns` to be dependent on `Extract customers`, while `Extract customers` should itself be dependent on `Unnest passport data`. In the code, how do I add the `Unnest passport data` dependency to the existing block without causing duplicates? Thx in advance

Jim Crist-Harif

08/18/2020, 2:16 PM
Hi Jonas, looks like you're running into the same issue in two places. When you call a task, this creates a copy of the task, creating a new instance with the set dependencies. When you call a method such as `set_upstream`/`set_dependencies`, this applies that method to the existing instance. It looks like you probably want to call each task once first to create the task instances your flow will use, then call the `set_upstream`/`set_dependencies` methods on those instances rather than on the task functions themselves:
extract_customers_task = extract_customers()

Also, since you're using the functional API already, you might avoid calling `set_upstream`/`set_dependencies` manually at all and instead make use of the `upstream_tasks` kwarg when calling the task originally. This might look like:
unnest_verifications_task = unnest_verifications()
extract_customers_task = extract_customers(upstream_tasks=[unnest_verifications_task])
There's no harm in using the methods manually, but many users find treating tasks like function calls to be a clearer way of marking dependencies between tasks.
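To illustrate the copy-on-call behaviour described above, here is a toy model in plain Python. It is not the library's actual code: the `Task` and `Flow` classes below are hypothetical stand-ins that only mimic the two rules from this thread (calling a task registers a new copy, while `set_upstream` acts on the existing instance). The task names are the ones from the question.

```python
import copy


class Flow:
    """Toy graph (hypothetical): task instances plus (upstream, downstream) edges."""

    def __init__(self):
        self.tasks = []
        self.edges = set()

    def add_task(self, task):
        # Instances compare by identity, so two copies count as two nodes.
        if task not in self.tasks:
            self.tasks.append(task)

    def add_edge(self, upstream, downstream):
        self.add_task(upstream)
        self.add_task(downstream)
        self.edges.add((upstream, downstream))


class Task:
    """Toy task (hypothetical) that mimics copy-on-call."""

    def __init__(self, name, flow):
        self.name = name
        self.flow = flow

    def __call__(self, upstream_tasks=()):
        # Calling a task creates a NEW copy and registers that copy.
        instance = copy.copy(self)
        self.flow.add_task(instance)
        for upstream in upstream_tasks:
            self.flow.add_edge(upstream, instance)
        return instance

    def set_upstream(self, task):
        # A method call applies to this existing instance; no copy is made.
        self.flow.add_edge(task, self)


def names(flow):
    return sorted(task.name for task in flow.tasks)


def build_duplicated():
    flow = Flow()
    unnest = Task("Unnest passport data", flow)
    extract = Task("Extract customers", flow)
    add_columns = Task("Add new columns", flow)
    add_columns(upstream_tasks=[extract()])  # first copy of extract
    extract().set_upstream(unnest())         # second, separate copy of extract
    return flow


def build_fixed():
    flow = Flow()
    unnest = Task("Unnest passport data", flow)
    extract = Task("Extract customers", flow)
    add_columns = Task("Add new columns", flow)
    # Call each task once, keep the instance, and reuse it everywhere.
    unnest_task = unnest()
    extract_task = extract(upstream_tasks=[unnest_task])
    add_columns(upstream_tasks=[extract_task])
    return flow


duplicated = build_duplicated()
fixed = build_fixed()
```

In this toy model `duplicated` ends up with four nodes, two of them named `Extract customers`, while `fixed` has the intended three nodes and two edges. The only difference is that the second version calls each task once and passes the resulting instances around.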
Also, in the future, if possible, could you copy-paste code snippets into Slack rather than a screenshot? It makes it easier to reply with code, rather than having to retype all the stuff you took a picture of.

Jonas Hanfland

08/18/2020, 2:38 PM
Amazing. Thank you so much for your quick and detailed answer! The tips are also greatly appreciated and I will try to remember to paste my code next time.