
Paco Ibañez

12/29/2022, 8:18 PM
Hello again! I have an orchestrator flow that calls two other flows using run_deployment. The first flow returns a pandas dataframe that is persisted to an Azure block. What would be the most efficient way to pass the first flow's result to the second flow? I was looking at this, but it looks like it would require a lot of serialization/deserialization. The main flow is something like this:
run_deployment("transformation_1", ...) # returns a df as result that is persisted to azure storage
run_deployment("transformation_2", ...) # needs the dataframe returned by `transformation_1` as input
Should I manually save the df to storage in `transformation_1` and pass the URL to `transformation_2`?
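The "save it yourself and pass the URL" idea from the question can be sketched roughly as below. This is only an illustration under stated assumptions: plain functions stand in for the two deployments, a temporary local directory stands in for the Azure storage block, and a small CSV stands in for the dataframe. None of it is Prefect API.

```python
import csv
import tempfile
from pathlib import Path


def transformation_1(storage_dir: Path) -> str:
    """Hypothetical first step: build a table, persist it once, return its location."""
    rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
    target = storage_dir / "transformation_1_result.csv"
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)  # the only serialization in the pipeline
    return str(target)  # a small string reference, not the data itself


def transformation_2(input_path: str) -> int:
    """Hypothetical second step: load the table from the reference and use it."""
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))  # the only deserialization in the pipeline
    return sum(int(row["value"]) for row in rows)


def orchestrator() -> int:
    """The orchestrator only forwards the reference and never loads the table."""
    with tempfile.TemporaryDirectory() as tmp:
        reference = transformation_1(Path(tmp))
        return transformation_2(reference)


total = orchestrator()
```

The data is written once and read once; the orchestrator handles nothing larger than a path string.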

Zanie

12/29/2022, 9:35 PM
Since the flows run in different processes, you’re going to need to serialize/deserialize the data.
Saving to storage and passing a reference is the same thing that happens when using result persistence, and it would still require serialization.

Paco Ibañez

12/29/2022, 9:49 PM
Wouldn't there be an additional deserialization/serialization when calling `state.result()` in the orchestrator flow to then pass it to the second `run_deployment`?

Zanie

12/29/2022, 10:11 PM
Ah, I see what you’re saying.
You should be able to pass the state itself to the second deployment, then retrieve the result from within it.
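To make the difference concrete, here is a toy model of that suggestion. It is not Prefect's `State` class: `ResultHandle`, `first_flow`, `second_flow`, and the `LOADS` counter are all hypothetical names, and pickle on a local temp file stands in for persisted results. The point is only that when the handle is forwarded and `.result()` is called inside the second step, the orchestrator performs no deserialization of its own.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Counts how many times a persisted result is deserialized (for illustration).
LOADS = {"count": 0}


@dataclass(frozen=True)
class ResultHandle:
    """Hypothetical stand-in for a state that only references a stored result."""
    path: str

    def result(self):
        # Deserialization happens lazily, only where this method is called.
        LOADS["count"] += 1
        return pickle.loads(Path(self.path).read_bytes())


def first_flow(storage_dir: Path) -> ResultHandle:
    """Persist the data once and hand back a lightweight handle."""
    data = {"a": [1, 2, 3]}
    target = storage_dir / "first_flow.pkl"
    target.write_bytes(pickle.dumps(data))
    return ResultHandle(str(target))


def second_flow(upstream: ResultHandle) -> int:
    """Resolve the handle inside the consumer, where the data is needed."""
    table = upstream.result()
    return sum(table["a"])


def orchestrator() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        handle = first_flow(Path(tmp))
        # The handle is forwarded as-is; .result() is never called here.
        return second_flow(handle)


answer = orchestrator()
```

If the orchestrator had called `handle.result()` itself and passed the loaded data along, the data would have been deserialized there and serialized again for the second step, which is the extra round trip raised in the question.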

Paco Ibañez

12/30/2022, 6:53 PM
thanks!