Philipp Eisen

01/19/2022, 12:02 AM
Hey, I’m returning a fairly large object from one task and then mapping a downstream task over an iterable, with that big object passed in as unmapped. Dask then warns:
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and 
keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good
I was wondering what the Prefect pattern would be here to scatter the object ahead of time?

Kevin Kho

01/19/2022, 12:30 AM
Even if it worked, it’s not best practice, since your client/scheduler needs to upload this large object to each of your workers. It would be better if the object lived somewhere like S3 and the workers independently pulled it from there. Also, scatter specifically fails with autoscaling.

Philipp Eisen

01/19/2022, 9:36 AM
Thanks for the reply.
Would I then have a step in the task that pulls the object from a bucket?

Kevin Kho

01/19/2022, 2:34 PM
Yes. That way, it’s the worker doing the work.
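A rough sketch of that pattern, not a definitive implementation: the mapped task receives only a small location string and loads the object itself, caching it so each worker process reads it once. The names `load_big_object` and `process_item` are hypothetical, and a local temp file stands in for the bucket here; with S3 you would download the bytes in `load_big_object` instead.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def load_big_object(location: str):
    """Load the large object once per worker process and cache it.

    Assumption: `location` is a local file path standing in for a bucket
    key. With S3 you would fetch the bytes from the bucket here instead.
    """
    return pickle.loads(Path(location).read_bytes())


def process_item(item, location: str):
    """Body of the mapped task: it is passed a short string, not the object."""
    big = load_big_object(location)
    return big[item]


# Stand-in for the upstream task that writes the object to shared storage.
big_data = {i: i * i for i in range(1000)}
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fh:
    pickle.dump(big_data, fh)
    location = fh.name

# Stand-in for the mapped run: every call reuses the cached object.
results = [process_item(i, location) for i in [2, 3, 4]]
```

In a flow you would presumably decorate `process_item` as a task and map it over the iterable with the location passed as unmapped, so only the string travels through the scheduler. Only unpickle data you wrote yourself.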