# prefect-community
c
My general pipeline flow is "fetch data" -> do various operations on data -> persist results. My first instinct was to split those stages up into separate (potentially mapped) tasks, but I'm worried about the overhead of passing large data blobs between tasks (especially if I move to running things on a cluster of some kind in future, where those tasks could end up running on different machines). Am I right to be worried about that? Is there a best practice?
b
Hey Christopher, I'm going to defer to Ryan's post about this, since he provided some very useful information on passing large data objects between tasks with Prefect versions 2.6.0 and up. Please reach out here if you have any additional questions on best practices. 😄
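One common pattern for this (a minimal sketch, not the only approach): have each task write its output to shared storage and return only a lightweight reference, such as a path, so large blobs never travel between tasks directly. The `@task`/`@flow` decorators are standard Prefect 2.x API; the `/tmp/pipeline` location and the fetch/transform logic are placeholder assumptions for illustration.

```python
# Sketch: pass references between tasks instead of large data blobs.
import json
from pathlib import Path

from prefect import flow, task

STORAGE = Path("/tmp/pipeline")  # assumed shared location; use S3/GCS on a cluster


@task
def fetch_data() -> str:
    """Fetch raw data, persist it, and return only its path."""
    STORAGE.mkdir(parents=True, exist_ok=True)
    raw_path = STORAGE / "raw.json"
    raw_path.write_text(json.dumps({"values": [1, 2, 3]}))  # stand-in for a real fetch
    return str(raw_path)


@task
def transform(raw_path: str) -> str:
    """Load the referenced data, transform it, and return the result's path."""
    data = json.loads(Path(raw_path).read_text())
    data["values"] = [v * 2 for v in data["values"]]
    out_path = STORAGE / "transformed.json"
    out_path.write_text(json.dumps(data))
    return str(out_path)


@task
def persist_results(result_path: str) -> None:
    """Final stage: move results to wherever they need to live."""
    print(f"Persisting results from {result_path}")


@flow
def pipeline():
    raw = fetch_data()
    transformed = transform(raw)
    persist_results(transformed)


if __name__ == "__main__":
    pipeline()
```

If you do move to a cluster, swap the local directory for object storage (S3, GCS, etc.) so every worker can resolve the reference. Prefect 2.6+ can also persist task results automatically (e.g. `persist_result=True` and `result_storage` on `@task`/`@flow`), which is what Ryan's post covers in more depth.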