Hello again! I have an orchestrator flow that calls two other flows using run_deployment. The first flow returns a pandas dataframe that is persisted to an Azure block. What would be the most efficient way to pass the first flow's result to the second flow? I was looking at this, but it looks like it would require a lot of serialization/deserialization.
The main flow is something like this:
run_deployment("transformation_1", ...) # returns a df as result that is persisted to azure storage
run_deployment("transformation_2", ...) # needs the dataframe returned by `transformation_1` as input
Should I manually save the df to storage in `transformation_1` and pass the URL to `transformation_2`?
Zanie
12/29/2022, 9:35 PM
Since the flows are being run in different processes you’re going to need to serialize / deserialize the data.
Zanie
12/29/2022, 9:36 PM
Saving to storage and passing a reference is the same thing that happens when using result persistence, and it would still require serialization.
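For readers following along, here is a minimal sketch of the "save to storage and pass a reference" pattern discussed above. It is not Prefect code: plain functions stand in for the two deployments, a temporary local directory stands in for Azure storage, and CSV stands in for whatever dataframe format you would actually use. The names `STORAGE`, `input_ref`, and `orchestrator` are hypothetical. The point is only that the orchestrator handles a small string reference, while the data is serialized once by the producer and deserialized once by the consumer.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for remote storage (e.g. an Azure container).
STORAGE = Path(tempfile.mkdtemp())


def transformation_1() -> str:
    """Produce a small table, persist it, and return only a reference to it."""
    rows = [{"id": "1", "value": "10"}, {"id": "2", "value": "20"}]
    path = STORAGE / "transformation_1.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(rows)  # the single serialization step
    return str(path)


def transformation_2(input_ref: str) -> int:
    """Load the table from the reference and compute something from it."""
    with open(input_ref, newline="") as f:
        rows = list(csv.DictReader(f))  # the single deserialization step
    return sum(int(row["value"]) for row in rows)


def orchestrator() -> int:
    """Pass only the reference between steps; never load the data here."""
    ref = transformation_1()
    return transformation_2(ref)
```

In a real setup the reference would be passed as a parameter to the second deployment, and the orchestrator would avoid loading the dataframe itself.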
Paco Ibañez
12/29/2022, 9:49 PM
Wouldn't there be an additional deserialization/serialization when calling `state.result()` in the orchestrator flow to then pass it to the second `run_deployment`?
Zanie
12/29/2022, 10:11 PM
Ah I see what you’re saying
Zanie
12/29/2022, 10:12 PM
You should be able to pass the state itself to the second deployment then retrieve the result from within it