# prefect-community
p
Hey, I’m passing a fairly large object from one task and then mapping a following task over an iterable, with that big object passed as unmapped.
Consider scattering large objects ahead of time with client.scatter to reduce scheduler burden and keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good
I was wondering, what would be the Prefect pattern here to scatter the object ahead of time?
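(For context, a minimal sketch of the pattern being described, assuming Prefect 1.x task mapping on a DaskExecutor; the task names and the big object are hypothetical:)

    from prefect import task, Flow, unmapped
    from prefect.executors import DaskExecutor

    @task
    def load_big_object():
        # stand-in for whatever builds the large object
        return {"weights": list(range(1_000_000))}

    @task
    def get_items():
        return [1, 2, 3]

    @task
    def process(item, big):
        # every mapped run receives the same unmapped big object
        return item * len(big["weights"])

    with Flow("map-with-unmapped") as flow:
        big_data = load_big_object()
        results = process.map(get_items(), big=unmapped(big_data))

    if __name__ == "__main__":
        flow.run(executor=DaskExecutor())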
k
Even if it worked, it’s not best practice, since your client/scheduler needs to upload this large object to each of your workers. It would be better if this object were somewhere like S3 and the workers independently pulled it from there. Also, scatter specifically fails with autoscaling
p
Thanks for the reply.
Would I then have a step in the task that would pull the object from a bucket?
k
Yes. That way, it’s the worker doing the work.
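(A hedged sketch of that approach, assuming Prefect 1.x and boto3; the bucket name, key, and pickle serialization are hypothetical:)

    import pickle

    import boto3
    from prefect import task, Flow, unmapped

    @task
    def process(item, bucket, key):
        # the worker running this task pulls the large object itself
        body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
        big_data = pickle.loads(body.read())
        return item * len(big_data["weights"])

    with Flow("pull-big-object-on-worker") as flow:
        results = process.map(
            [1, 2, 3],
            bucket=unmapped("my-bucket"),     # hypothetical bucket
            key=unmapped("big_data.pkl"),     # hypothetical key
        )

This way only the small bucket/key strings travel between tasks, and each mapped run downloads the object on whichever worker executes it, instead of the scheduler shipping it around.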