Scott Moreland
03/04/2021, 9:16 PM

@cache(subsample=100, sdf_key='sdf_large')
@task
def some_large_spark_dataframe():
    "intensive ETL process here"
    ...
    return sdf

@task
def downstream_task(sdf_large):
    "some intensive computation on sdf_large"
    ...
    return sdf

with Flow() as flow:
    sdf_large_sample = read_from_cache('sdf_large')
    downstream_task(sdf_large_sample)
...but I've had difficulty stacking decorators with the task decorator, plus the usual challenge that the task result isn't available until evaluation time. Any recipes you'd recommend?

Zanie
The decorated function becomes a Task type, and as long as your decorator is on the inside of the task decorator, you can ignore that it is deferred.

Scott Moreland
03/04/2021, 11:35 PM
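A runnable sketch of the ordering Zanie describes: the caching decorator goes on the inside of the task decorator, where it wraps the plain function rather than the deferred Task. Every name here (task, cache, read_from_cache, the in-memory _CACHE) is a plain-Python stand-in for illustration, not the real Prefect API.

```python
import functools

_CACHE = {}  # key -> cached subsample (stand-in for a real result store)

def task(fn):
    """Stand-in for a task decorator: a real orchestrator would defer
    execution; here it just forwards the call."""
    @functools.wraps(fn)
    def deferred(*args, **kwargs):
        return fn(*args, **kwargs)
    return deferred

def cache(subsample, sdf_key):
    """Hypothetical caching decorator: stores the first `subsample` rows
    of the wrapped function's result under `sdf_key`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            _CACHE[sdf_key] = result[:subsample]  # crude subsample
            return result
        return wrapper
    return decorator

def read_from_cache(sdf_key):
    """Hypothetical helper to retrieve a cached subsample."""
    return _CACHE[sdf_key]

@task                                     # outermost: the deferring decorator
@cache(subsample=3, sdf_key='sdf_large')  # inside: wraps the plain function
def some_large_dataframe():
    return list(range(10))  # stands in for an expensive Spark result

full = some_large_dataframe()
sample = read_from_cache('sdf_large')
```

With the order reversed (cache outermost, as in the original snippet), the caching decorator would receive the already-wrapped, deferred object instead of the plain function, which is the stacking difficulty described above.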