
Rui Loureiro

10/03/2019, 6:16 PM
Hey all, is there any way for a Task to access the result of another Task (in the same flow) aside from passing the upstream task to the downstream task's kwargs?

Chris White

10/03/2019, 6:19 PM
Hi @Rui Loureiro, no, there is not. That would introduce a dependency that Prefect doesn’t know anything about and isn’t able to track or enforce.
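To illustrate why the dependency has to go through the arguments, here is a toy runner. This is not Prefect's engine, and `Node` and `run` are hypothetical names. A node learns about its upstream work only through the nodes passed to it, so the runner can always compute those first. A hidden read of another node's result would bypass `upstream`, and the runner could no longer guarantee the ordering.

```python
from typing import Any, Callable, Dict, List


class Node:
    """Hypothetical task node: a function plus the upstream nodes it takes as arguments."""

    def __init__(self, fn: Callable[..., Any], *upstream: "Node") -> None:
        self.fn = fn
        self.upstream = list(upstream)


def run(node: Node, results: Dict[int, Any], order: List[str]) -> Any:
    # Each node is computed at most once. Its dependencies are exactly its
    # arguments, so upstream nodes are always evaluated before it.
    key = id(node)
    if key not in results:
        args = [run(up, results, order) for up in node.upstream]
        results[key] = node.fn(*args)
        order.append(node.fn.__name__)
    return results[key]


def load() -> List[int]:
    return [1, 2, 3]


def double(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def total(xs: List[int]) -> int:
    return sum(xs)


# Wiring: every dependency is declared by passing the upstream node in.
data = Node(load)
doubled = Node(double, data)
summed = Node(total, doubled)

order: List[str] = []
result = run(summed, {}, order)
print(result, order)  # 12 ['load', 'double', 'total']
```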

Rui Loureiro

10/03/2019, 6:21 PM
Ok, that does make sense. Thanks for the ultra fast response!

Chris White

10/03/2019, 6:21 PM
anytime!

josh

10/03/2019, 6:24 PM
Hey @Rui Loureiro, I’m interested in your use case where you need to retrieve the result of another Task without passing it in to downstream tasks!

Rui Loureiro

10/04/2019, 11:03 AM
Hey @josh. We have an internal tool that computes statistics of a dataframe. We currently use Dask’s DataFrame API and build on top of it, creating our own operations. Our two major objectives are:
1. Creating efficient computation graphs
2. Caching operations, to avoid recomputing time-expensive operations
After struggling with implementing dependencies and caching in Dask, we are looking into Prefect. Consider the following example:
def specific_df_operation(df):
    # One (expensive) per-column result for every column of the dataframe
    foo = []
    for col in df.columns:
        foo.append(specific_col_operation(df[col]))

    # do something with foo
specific_df_operation depends on computing specific_col_operation for every column in the dataframe. However, it is not really feasible to pass all of these dependencies as arguments to specific_df_operation. I'm not sure whether Prefect has a way to accomplish this.
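One common way to avoid a separate argument per column is to fan out over the columns and hand the downstream step the whole list of per-column results as a single argument. In Prefect Core this is what task mapping (`Task.map`) is for; check the mapping docs for your version. The sketch below is plain Python rather than Prefect code. It assumes a dataframe modelled as a dict of column name to values, and `map_task`, the mean statistic, and the max aggregate are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a dataframe: column name -> values,
# so the sketch runs without pandas or Dask installed.
Frame = Dict[str, List[float]]


def specific_col_operation(values: Sequence[float]) -> float:
    # Hypothetical per-column statistic: the column mean.
    return sum(values) / len(values)


def specific_df_operation(col_stats: List[float]) -> float:
    # Hypothetical aggregate over all per-column results: the largest mean.
    return max(col_stats)


def map_task(fn: Callable, items: List) -> List:
    # Fan-out: one call per item. A workflow engine could run these calls
    # in parallel; the resulting list is ONE upstream dependency.
    return [fn(item) for item in items]


df: Frame = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0], "c": [0.5]}
col_stats = map_task(specific_col_operation, [df[name] for name in df])
result = specific_df_operation(col_stats)
print(col_stats, result)  # [2.0, 15.0, 0.5] 15.0
```

The downstream step takes one argument however many columns there are, so the dependency stays explicit without growing its signature.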

Dylan

10/04/2019, 1:45 PM
Hi Rui, for caching expensive operations, check out the Prefect docs on Persistence and Caching: https://docs.prefect.io/core/concepts/persistence.html
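As a rough idea of what input-based caching means, here is a plain-Python sketch: a result is reused when the same inputs were seen less than a given time ago, and recomputed otherwise. Prefect Core's tasks expose their own caching options (in the 0.x releases, `cache_for` together with a `cache_validator`), so treat this only as an illustration of the concept. `cached_for` and `expensive_stat` are hypothetical names, and inputs must be hashable (hence a tuple for the column).

```python
import time
from typing import Any, Callable, Dict, Tuple


def cached_for(seconds: float) -> Callable:
    """Hypothetical decorator: reuse a result if the same inputs were seen
    less than `seconds` ago; otherwise recompute and store it."""

    def decorator(fn: Callable) -> Callable:
        store: Dict[Tuple, Tuple[float, Any]] = {}

        def wrapper(*args: Any) -> Any:
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # cached value is still valid for these inputs
            value = fn(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


calls = []


@cached_for(seconds=60)
def expensive_stat(column: Tuple[float, ...]) -> float:
    calls.append(column)  # record every real computation
    return sum(column) / len(column)


print(expensive_stat((1.0, 2.0, 3.0)))  # 2.0 (computed)
print(expensive_stat((1.0, 2.0, 3.0)))  # 2.0 (served from the cache)
print(len(calls))  # 1
```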