Jackson Maxfield Brown
06/15/2020, 6:07 PM

ids = [some_iterable_of_data_ids_to_retrieve]
with Flow("my-flow") as flow:  # Flow requires a name in Prefect 0.x
big_mixed_result = get_data.map(ids)
filtered_result = filter_data(big_mixed_result)
processed = process_data.map(filtered_result)
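For context, a plain-Python sketch of what this mapped pipeline computes; the task names come from the snippet above, while the task bodies and the filtering rule are hypothetical stand-ins:

```python
# Hypothetical stand-ins for the three Prefect tasks above.
def get_data(i):
    # Pretend each id fetches one record.
    return {"id": i, "value": i * 10}

def filter_data(records):
    # Illustrative filtering rule: keep records whose value is a multiple of 20.
    return [r for r in records if r["value"] % 20 == 0]

def process_data(record):
    return record["value"] + 1

ids = [1, 2, 3, 4]
big_mixed_result = [get_data(i) for i in ids]            # get_data.map(ids)
filtered_result = filter_data(big_mixed_result)          # filter_data(...)
processed = [process_data(r) for r in filtered_result]   # process_data.map(...)
print(processed)  # prints [21, 41]
```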
I run the flow using a distributed.LocalCluster / DaskExecutor, and when it hits the process_data task I get:

"UserWarning: Large object ... detected in task graph... consider using Client.scatter"

Prefect / Dask tries to continue but fails and restarts the Flow after the workers hit their memory limit.
I guess what is confusing to me is that I am doing a map operation on the task, so I wouldn't expect any large object to be transferred between workers. I would have assumed that the map call only sends each small element per iteration, but I guess that's not the case?
One idea I was considering was using a dask.bag instead of a List[Dict], to maybe split the memory across workers? I don't really know, any and all ideas welcome.

Kyle Moon-Wright
06/15/2020, 6:42 PM

Jim Crist-Harif
06/15/2020, 6:56 PM

Jackson Maxfield Brown
06/15/2020, 7:00 PM