Thanks for all the help so far. Got another question! 😄 Hopefully I can explain this okay …
I'm using the feature_engineering.py example as a template. In the flow there are a number of functions (not tasks), and a class is instantiated too. The last call is a task, but it isn't being attached as a downstream dependency after the class instantiation.
I'm expecting the following three to run in order. Not sure that `upstream_task` will fix this, because the `load(clean)` task needs to run after the `DataFrame()` bit.
```
clean = impute.map(data_cols, replacement_dict=unmapped({np.nan: 0}))  # task
clean = DataFrame(clean, column_names)  # function call, not a task
load(clean)  # task
```
Carl
02/25/2021, 10:18 AM
Here’s what the flow looks like, and the red shows how I want it to work :)
ale
02/25/2021, 10:43 AM
I think the problem is that the first `clean` is the result of a Task, while the second is not. Since `load` depends on `clean`, which is no longer a task result after the second assignment, you don't get a dependency between `impute` and `load`.
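A rough sketch of one way to restore the ordering under Prefect 1.x's functional API: keep a handle on the `impute` result and pass it explicitly via `upstream_tasks` when calling `load`. The `impute` and `load` bodies below are placeholders, not the real tasks from the flow.
```
import numpy as np
from prefect import Flow, task, unmapped

@task
def impute(col, replacement_dict):
    # Placeholder body: swap NaNs for the replacement value.
    return [replacement_dict.get(v, v) for v in col]

@task
def load(data):
    print("loading", data)

data_cols = [[np.nan, 1.0], [2.0, 3.0]]

with Flow("ordering-demo") as flow:
    imputed = impute.map(data_cols, replacement_dict=unmapped({np.nan: 0}))
    # DataFrame(imputed, column_names) would go here; it is plain Python,
    # so Prefect cannot see it. Passing upstream_tasks restores the ordering:
    load("selected-columns", upstream_tasks=[imputed])
```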
Carl
02/25/2021, 10:49 AM
Yes, that's correct. However, `DataFrame` is a custom Task class, so it should be picked up, yeah?
```
class DataFrame:
    """A utility class to provide convenient syntax for grabbing columns as a Task."""

    def __init__(self, cols: "Task", colnames: "Task"):
        self.cols = cols
        self.colnames = colnames

    def __getitem__(self, key: str):
        # `get` is assumed to be a task defined elsewhere in the flow
        return get(self.cols, self.colnames, key)
```
ale
02/25/2021, 10:55 AM
If you want `DataFrame` to be a custom Task, then you have to extend `Task`, I guess
ale
02/25/2021, 10:56 AM
Otherwise Prefect does not know that `DataFrame` is a task
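A rough sketch of what extending `Task` could look like (the class name and data here are illustrative, not from the actual flow):
```
from prefect import Flow, Task

class GetColumn(Task):
    """A Task subclass: the column lookup itself becomes a node in the flow."""
    def run(self, cols, colnames, key):
        return cols[colnames.index(key)]

get_column = GetColumn()

with Flow("subclass-demo") as flow:
    # Calling the instance inside the flow context registers it as a task,
    # so downstream tasks built on its result get a real dependency edge.
    col_h = get_column(["h-values", "t-values"], ["H", "T"], "H")
```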
Carl
02/25/2021, 11:03 AM
Hmmm, that makes sense. My question then is: how is it working in the example (line 274)?
Amanda Wee
02/25/2021, 11:23 AM
`DataFrame` is not a task, but its `__getitem__` method means that, say, `clean["H"]` is a task
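A small self-contained sketch of that behaviour, assuming `get` is an `@task`-decorated function as in the feature_engineering.py example (the data is made up):
```
from prefect import Flow, task

@task
def get(cols, colnames, key):
    return cols[colnames.index(key)]

@task
def load(col):
    print(col)

class DataFrame:
    """Plain Python wrapper; only its __getitem__ produces tasks."""
    def __init__(self, cols, colnames):
        self.cols = cols
        self.colnames = colnames

    def __getitem__(self, key):
        # Called inside the flow context, this adds a `get` task to the flow.
        return get(self.cols, self.colnames, key)

with Flow("getitem-demo") as flow:
    clean = DataFrame(["h-values", "t-values"], ["H", "T"])
    load(clean["H"])  # load depends on the `get` task, not on DataFrame itself
```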
Carl
02/25/2021, 11:32 AM
@Amanda Wee - ahh that’s starting to make sense now. Thank you for pointing that out. Sorry for all my basic questions 🙂