Paco Ibañez

12/29/2022, 8:18 PM

Hello again! I have an orchestrator flow that calls two other flows using run_deployment. The first flow returns a pandas DataFrame that is persisted to an Azure block. What would be the most efficient way to pass the first flow's result to the second flow? I was looking at this, but it looks like it would require a lot of serialization/deserialization. The main flow is something like this:

run_deployment("transformation_1", ...) # returns a df as result that is persisted to azure storage
run_deployment("transformation_2", ...) # needs the dataframe returned by `transformation_1` as input

Should I manually save the df to storage in `transformation_1` and pass the URL to `transformation_2`?

Zanie

12/29/2022, 9:35 PM

Since the flows are being run in different processes you’re going to need to serialize / deserialize the data.

Zanie

12/29/2022, 9:36 PM

Saving to storage and passing a reference is the same thing that happens when using result persistence, and it would still require serialization.
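To make the "save to storage and pass a reference" idea concrete, here is a minimal sketch using only the Python standard library. It is not Prefect code: the two functions merely stand in for the `transformation_1` and `transformation_2` flows, a temporary directory stands in for Azure storage, a list of dicts stands in for the DataFrame, and the file name is hypothetical. The point is only that the orchestrator handles a short string, while the data is serialized once and deserialized once.

```python
import pickle
import tempfile
from pathlib import Path


def transformation_1(storage_dir: Path) -> str:
    """Stand-in for the first flow: build data, persist it, return a reference."""
    # A list of dicts stands in for the pandas DataFrame (assumption).
    rows = [{"id": 1, "value": 10.0}, {"id": 2, "value": 20.0}]
    target = storage_dir / "transformation_1_result.pkl"  # hypothetical name
    target.write_bytes(pickle.dumps(rows))  # the single serialization
    return str(target)


def transformation_2(result_path: str) -> float:
    """Stand-in for the second flow: resolve the reference, then use the data."""
    rows = pickle.loads(Path(result_path).read_bytes())  # the single deserialization
    return sum(row["value"] for row in rows)


def orchestrator(storage_dir: Path) -> float:
    """Only the reference (a string) passes through the orchestrator."""
    reference = transformation_1(storage_dir)
    return transformation_2(reference)


with tempfile.TemporaryDirectory() as tmp:
    print(orchestrator(Path(tmp)))  # prints 30.0
```

In a real deployment the reference would presumably be a blob URL or storage key rather than a local path, and the read and write would go through whatever storage client is in use.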

Paco Ibañez

12/29/2022, 9:49 PM

Wouldn't there be an additional deserialization/serialization when calling `state.result()` in the orchestrator flow to then pass it to the second `run_deployment`?
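The extra round trip being asked about can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement of Prefect: `CountingStore` is a hypothetical in-memory stand-in for remote storage that counts pickle operations, and the two functions model "load the result in the orchestrator and hand the data on" versus "hand on only the reference".

```python
import pickle


class CountingStore:
    """Hypothetical in-memory storage that counts (de)serializations."""

    def __init__(self):
        self.blobs = {}
        self.dumps = 0
        self.loads = 0

    def write(self, key, obj):
        self.blobs[key] = pickle.dumps(obj)
        self.dumps += 1
        return key

    def read(self, key):
        self.loads += 1
        return pickle.loads(self.blobs[key])


def via_orchestrator(store):
    """Orchestrator materializes the result, then passes the data itself on."""
    key_1 = store.write("t1_result", list(range(5)))  # first flow persists its result
    data = store.read(key_1)                          # orchestrator loads the result
    key_2 = store.write("t2_param", data)             # data re-serialized as a parameter
    return sum(store.read(key_2))                     # second flow deserializes it


def via_reference(store):
    """Orchestrator passes only the key; the second flow loads the data."""
    key_1 = store.write("t1_result", list(range(5)))  # first flow persists its result
    return sum(store.read(key_1))                     # second flow loads it directly


a, b = CountingStore(), CountingStore()
via_orchestrator(a)
via_reference(b)
print(a.dumps, a.loads)  # prints 2 2
print(b.dumps, b.loads)  # prints 1 1
```

In this model, passing the reference halves the number of serialization steps, which matches the concern raised in the question.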

Zanie

12/29/2022, 10:11 PM

Ah I see what you’re saying

Zanie

12/29/2022, 10:12 PM

You should be able to pass the state itself to the second deployment then retrieve the result from within it

Paco Ibañez

12/30/2022, 6:53 PM

thanks!


Prefect Community
