quick q on mapped task inputs (<@U011EKN35PT> ment...
# prefect-community
b
quick q on mapped task inputs (@Jim Crist-Harif mentioned here that this might that this might get resolved in the mapping refactor but we're still seeing it) is passing
task.map(sequence, x=unmapped(some_large_result))
a bad idea? we're seeing
UserWarning: Large object of size 874.55 MB detected in task graph
and very very slow creation of mapped tasks; I would have expected that these would be re-using the dask futures already present on the cluster, not rehydrating the outputs inside the
FlowRunner
loop, but I guess that's not how it works. is this expected behavior and if so is there any known workaround..?
j
Yeah, as written this will be inefficient if
some_large_result
is not the result of a task (e.g. if it's a constant). If it is the result of a task, we should be able to make the efficient, but the flow runner might be doing something dumb.
b
thanks @Jim Crist-Harif! in our case it is another upstream task, not a constant. I'm not exactly sure where the problematic behavior is but it does seem like the flow runner is
.submit
ting the actual result bytes for every task
j
I wouldn't be surprised. There's lots of low-hanging fruit in the flow runner pipeline in terms of passing around completed results. Since prefect doesn't do any high-level graph heuristics (it just walks a topological sort), it doesn't distinguish between tasks whose results are still needed and tasks whose results can be dropped. I can create an issue for this specific behavior if you don't want to. I suspect we'll need to reconfigure how we walk the graph and submit tasks to avoid reserializing the results around.
b
if you don't mind that would be great, I expect your description would be quite a bit more helpful/accurate 🙂
👍 1
j
Thinking more, I think this wouldn't be that hard to do. Might hack on this in the next week or so.
Thanks for the issue report! Would be good to get this fixed.
🎉 1