Scott Moreland
03/04/2021, 9:16 PM

```python
@cache(subsample=100, sdf_key='sdf_large')
@task
def some_large_spark_dataframe():
    "intensive ETL process here"
    ...
    return sdf

@task
def downstream_task(sdf_large):
    "some intensive computation on sdf_large"
    ...
    return sdf

with Flow("example-flow") as flow:
    sdf_large_sample = read_from_cache('sdf_large')
    downstream_task(sdf_large_sample)
```
...but I've had difficulty stacking decorators with the `task` decorator. Moreover, there are the usual challenges of the task result not being available until evaluation time. Any recipes you'd recommend?

Zanie
...Task type, and as long as this is on the inside of the `task` decorator, you can ignore that it is deferred.

Scott Moreland
03/04/2021, 11:35 PM
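The decorator-stacking pattern Scott describes can be sketched in plain Python. This is a minimal, hypothetical sketch: it uses an in-memory dict and plain functions rather than Prefect tasks and a real result store, and the `cache`/`read_from_cache` names are assumptions taken from the snippet above, not an existing API.

```python
from functools import wraps

# Hypothetical in-memory store keyed by a user-supplied string;
# a real pipeline would use a persistent result backend instead.
_CACHE = {}

def cache(subsample=100, sdf_key=None):
    """Decorator factory: run the wrapped function, stash a small
    sample of its result under sdf_key, and return the full result."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # Subsample: here just the first `subsample` items of a list;
            # a Spark version might use sdf.limit(subsample) instead.
            _CACHE[sdf_key] = result[:subsample]
            return result
        return wrapper
    return decorator

def read_from_cache(sdf_key):
    return _CACHE[sdf_key]

@cache(subsample=3, sdf_key='sdf_large')
def some_large_dataframe():
    # Stand-in for the intensive ETL process.
    return list(range(10))

full = some_large_dataframe()
sample = read_from_cache('sdf_large')
print(full)    # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sample)  # -> [0, 1, 2]
```

Note the ordering question Zanie's reply touches on: if `@cache` sits *below* `@task` (closer to the function), it wraps the plain function and runs on concrete values at execution time, which is why the deferred nature of the task can be ignored there.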