quick q on mapped task inputs < Jim Crist Harif> mentioned

quick q on mapped task inputs (<@U011EKN35PT> ment...

Brett Naul

07/07/2020, 9:19 PM

quick q on mapped task inputs (@Jim Crist-Harif mentioned here that this might that this might get resolved in the mapping refactor but we're still seeing it) is passing

task.map(sequence, x=unmapped(some_large_result))

a bad idea? we're seeing

UserWarning: Large object of size 874.55 MB detected in task graph

and very very slow creation of mapped tasks; I would have expected that these would be re-using the dask futures already present on the cluster, not rehydrating the outputs inside the

FlowRunner

loop, but I guess that's not how it works. is this expected behavior and if so is there any known workaround..?

Jim Crist-Harif

07/07/2020, 9:20 PM

Yeah, as written this will be inefficient if

some_large_result

is not the result of a task (e.g. if it's a constant). If it is the result of a task, we should be able to make the efficient, but the flow runner might be doing something dumb.

Brett Naul

07/07/2020, 9:24 PM

thanks @Jim Crist-Harif! in our case it is another upstream task, not a constant. I'm not exactly sure where the problematic behavior is but it does seem like the flow runner is

.submit

ting the actual result bytes for every task

Jim Crist-Harif

07/07/2020, 9:26 PM

I wouldn't be surprised. There's lots of low-hanging fruit in the flow runner pipeline in terms of passing around completed results. Since prefect doesn't do any high-level graph heuristics (it just walks a topological sort), it doesn't distinguish between tasks whose results are still needed and tasks whose results can be dropped. I can create an issue for this specific behavior if you don't want to. I suspect we'll need to reconfigure how we walk the graph and submit tasks to avoid reserializing the results around.

Brett Naul

07/07/2020, 9:27 PM

if you don't mind that would be great, I expect your description would be quite a bit more helpful/accurate 🙂

👍 1

Jim Crist-Harif

07/07/2020, 9:52 PM

https://github.com/PrefectHQ/prefect/issues/2927

Jim Crist-Harif

07/07/2020, 9:52 PM

Thinking more, I think this wouldn't be that hard to do. Might hack on this in the next week or so.

Jim Crist-Harif

07/07/2020, 9:52 PM

Thanks for the issue report! Would be good to get this fixed.

🎉 1

Open in Slack

Previous Next