Berty
06/18/2021, 6:39 PMclient.scatter
?
python3.8/site-packages/distributed/worker.py:3373: UserWarning: Large object of size 126.73 MB detected in task graph:
{'task': <Task: blah>, 'state': None, 'ups ... _parent': True}
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and
keep data on workers
future = client.submit(func, big_data) # bad
big_future = client.scatter(big_data) # good
future = client.submit(func, big_future) # good
warnings.warn(
Killed
python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 48 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Berty
06/18/2021, 6:48 PMdata = get_dataframe() # returns 250k rows
with flow('testing large set', executor=DaskExecutor()) as f:
my_simple_task.map(data.pk.values, data.more_data.values)
Berty
06/18/2021, 6:59 PMBerty
06/18/2021, 7:28 PMKevin Kho
Berty
06/18/2021, 8:47 PMBerty
06/18/2021, 8:49 PMmy_simple_task.map([250,000 values], [250,000 values])
this has the same issueKevin Kho
Kevin Kho
if __name__ == '__main__'
and retry?Berty
06/18/2021, 10:01 PMif __name__
block. The task just fires off an api request and save the response json to disk.Kevin Kho