Nate Atkins
05/17/2020, 1:22 AM
The target parameter on tasks, together with the addition of upstream edges to the trigger signature, almost gets me the ability to rebuild the current task's target when an upstream is updated. I did a little trickery: if the current task's target needs to be rebuilt, I delete the target cache file and then raise a RETRY signal. When the task retries, it can't find the cache file and so runs the task.
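A minimal sketch of that trick, assuming Prefect's edge-keyed trigger signature and a local target file; TARGET_PATH and the mtime-based staleness check are illustrative stand-ins, not Prefect API:
import os
from prefect.engine.signals import RETRY

TARGET_PATH = "/data/cache/current_task.json"  # illustrative location

def upstream_is_newer(state):
    # Placeholder staleness check using file mtimes; the real check
    # compares Results directly (see later in the thread).
    location = getattr(state._result, "location", None)
    return bool(location) and os.path.getmtime(location) > os.path.getmtime(TARGET_PATH)

def rebuild_on_upstream_update(upstream_states):
    # upstream_states maps each upstream Edge to its finished State.
    if os.path.exists(TARGET_PATH) and any(
        upstream_is_newer(state) for state in upstream_states.values()
    ):
        os.remove(TARGET_PATH)           # delete the stale target file...
        raise RETRY("Upstream updated")  # ...so the retry rebuilds it
    return True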
The only problem I have: if the upstream task didn't run and update, and the current task doesn't need to run, what do I raise/return from the trigger to get the task to use the cached results?
True: the task will run.
False: the flow will fail.
SUCCESS: there are no cached results to pass on to the next task.
Chris White
With the result attribute you can simplify your code with the exists interface; moreover, if you find exists is True, you can use the result to read() from the target location and gather the data.
- finally, all Prefect Signals are capable of carrying result information; so in this case, you might do (extreme pseudocode):
for edge, state in upstream_states.items():
    data = state._result.read(target_location)
    ...
raise SUCCESS(result=data)
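A slightly more concrete version of that pseudocode, still a sketch: it assumes the trigger can reach the downstream task's configured Result through any upstream edge, and that result.location is already resolved to the target path.
from prefect.engine.signals import SUCCESS

def use_cached_target(upstream_states):
    # Every key is an Edge into the guarded task, so any edge's
    # downstream_task carries the Result configured for the target.
    edge = next(iter(upstream_states))
    result = edge.downstream_task.result
    if result.exists(result.location):
        # The target file is there: read it and let SUCCESS carry the
        # cached value on to downstream tasks.
        raise SUCCESS(result=result.read(result.location))
    return True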
Nate Atkins
05/17/2020, 1:59 PM
# Use the cached file.
with downstream_target.open("rb") as fp:
    data = cloudpickle.load(fp)
raise SUCCESS(result=data)
It would have taken me quite a while to find that solution. The target parameter really simplified this. Great addition. Now I think I have the make semantics I've been searching for over the last few weeks. Now to try it out in some real flows.
Nate Atkins
05/17/2020, 2:08 PM
# Use the cached file.
result = edge.downstream_task.result
data = result.read(result.location)
raise SUCCESS(result=data)
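For completeness, a trigger like that gets attached where the task is defined. This wiring is a sketch: the task name and paths are illustrative, and local runs may also need checkpointing enabled.
from prefect import task
from prefect.engine.results import LocalResult

@task(
    target="current_task.json",             # filename for the cached target
    result=LocalResult(dir="/data/cache"),  # where the target is persisted
    trigger=use_cached_target,              # custom trigger from above
)
def build_report(upstream_data):
    ...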
Brad
05/17/2020, 9:36 PM
Nate Atkins
05/17/2020, 11:59 PM
The trigger leverages an extension I made to the Result to make results comparable. My LocalJsonResult implements __lt__ so the trigger can easily see if a cache is out of date and know that it needs to run the task that builds it.
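The thread doesn't show the class itself, so as a guess at its shape: a Result subclass whose __lt__ compares target file mtimes (the actual comparison inside LocalJsonResult may well differ).
import os
from prefect.engine.results import LocalResult

class LocalJsonResult(LocalResult):
    # Comparable Result sketch: "older file" sorts first, so a trigger
    # can ask whether its cached target predates an upstream's result.
    def __lt__(self, other):
        return os.path.getmtime(self.location) < os.path.getmtime(other.location)
With that in place, a trigger can write checks like edge.downstream_task.result < state._result to decide whether the cache is stale.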
The outstanding piece I have is a task that generates multiple targets. I don't see anything that keeps me from passing a list of filenames as target, as long as my trigger and result are smart enough to deal with a list of files (rough sketch below).
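One speculative way the list version could look, reusing the raise-SUCCESS convention from earlier in the thread; the factory shape and passing the file list along as the result are assumptions:
import os
from prefect.engine.signals import SUCCESS

def multi_target_trigger(targets):
    # Factory: returns a trigger closed over a list of target paths.
    def trigger(upstream_states):
        if all(os.path.exists(t) for t in targets):
            # Everything is already built, so skip the task and hand
            # the file list along as the cached "result".
            raise SUCCESS(result=targets)
        return True  # at least one file is missing: run the task
    return trigger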
Let me know what you think about this. I still keep thinking that I'm somewhere in the weeds with my approach, but remember that data science workflows tend to be more incremental than ETL workflows. I at least feel better that I've gotten all the weird negative engineering stuff out in just a couple dozen lines of code in the trigger and result. Before, it was all a giant mess in my tasks, or a bunch of extra tasks.