Hi Prefect community! Does anybody have any examples of
big_future = client.scatter(big_dask_dataframe)
and passing
future = client.submit(func, big_future)
as an output from one task to be used as an input in another task? I found this UserWarning at the bottom of the "prefect-etl" article in the dask docs (https://examples.dask.org/applications/prefect-etl.html) as well.
Was wondering if anybody has encountered this issue as well? And whether there's a solution to this. Thank you in advance!
z
Zanie
05/25/2021, 5:43 PM
I'm not sure passing futures around like this will work, I'd be curious to hear how it goes.
Thanks for the response Michael. I am working on an ETL pipeline with Dask Dataframes. I wanted to persist the DataFrame after reading a parquet file from S3, scatter the DataFrame into memory across the workers, then pass the future to downstream ETL tasks.
Chris L.
05/28/2021, 5:44 AM
But futures (to my knowledge) arent serialisable? So I’ll stick with calling dask.compute once inside the very last ETL task in the pipeline.
Bring your towel and join one of the fastest growing data communities. Welcome to our second-generation open source orchestration platform, a completely rethought approach to dataflow automation.