# ask-community
s
hey everyone, I have a question about passing data between flows. My scenario is that I need to pass a pandas DataFrame from one flow to another. Right now I'm converting the DataFrame to JSON and passing it in as a Parameter. It works well, but I've only tried it with relatively small datasets. I'm wondering: what are the limits of scale for this approach, and should I be handling this differently?
e
Hey, I haven't tested this myself, but I don't think this is a good approach. You are effectively sending your serialized dataframe to the Prefect server, which (probably) shouldn't be burdened with big payloads. Instead, serialize your dataframe and upload it somewhere accessible to both flows, like AWS S3 or another file store. Your first flow then passes the file id to the second flow, and the second flow uses that id to download the file and read it back into a dataframe.
👍 1
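A minimal sketch of the hand-off pattern described above. Note this is illustrative, not Prefect code: it uses a local temp directory as a stand-in for S3 and JSON in place of a columnar format. In a real deployment you would upload with `boto3` (or `s3fs`) and serialize with something like `df.to_parquet`; the function names here are hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Stand-in for a shared object store such as an S3 bucket.
SHARED_DIR = Path(tempfile.gettempdir()) / "shared-flow-storage"
SHARED_DIR.mkdir(exist_ok=True)

def upload_records(records):
    """First flow: serialize the data to shared storage, return a file id."""
    file_id = uuid.uuid4().hex
    (SHARED_DIR / f"{file_id}.json").write_text(json.dumps(records))
    # Only this short string needs to travel between flows as a Parameter.
    return file_id

def download_records(file_id):
    """Second flow: fetch the data back using the file id."""
    return json.loads((SHARED_DIR / f"{file_id}.json").read_text())

records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
file_id = upload_records(records)
assert download_records(file_id) == records
```

The key point is that the Parameter carries only a small identifier; the bulk data never passes through the Prefect API.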
s
Gotcha, that makes sense. Thanks very much for your help!
😊 1
k
Hey @Steve s, Emre is right here. API call payloads have a hard limit, and Parameters are sent through the API. The limit is 5MB, so bigger dataframes will throw an error.
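If you want to know how long you can get away with the current approach, you can check the serialized size against that limit before passing it. A rough sketch, assuming the 5MB figure from the reply above (the helper name is made up for illustration):

```python
import json

# 5 MB payload limit mentioned above for Parameters sent through the API.
API_PAYLOAD_LIMIT = 5 * 1024 * 1024

def fits_in_parameter(records):
    """Return True if the JSON-serialized data stays under the API limit."""
    payload = json.dumps(records).encode("utf-8")
    return len(payload) <= API_PAYLOAD_LIMIT

small = [{"x": i} for i in range(100)]
print(fits_in_parameter(small))  # a few hundred bytes, well under the limit
```

For a DataFrame you would serialize the same way you do today (e.g. `df.to_json()`) and measure that string's byte length instead.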
s
I see, great to know that hard limit. That gives me a good idea of how long I can get away with this before revising. Much appreciated!