# prefect-community
c
Hi, any suggestions on how to pass a dataframe as a parameter to a flow? This breaks in latest Prefect:
from prefect import flow
import pandas as pd
import numpy as np


@flow
def do_nothing(df):
    return None


if __name__ == "__main__":
    df = pd.DataFrame(np.random.random((10_000, 20)))
    do_nothing(df)
Traceback in thread
Traceback (most recent call last):
  File "/test_big_df.py", line 13, in <module>
    do_nothing(df)
  File "/lib/python3.9/site-packages/prefect/flows.py", line 384, in __call__
    return enter_flow_run_engine_from_flow_call(
  File "/lib/python3.9/site-packages/prefect/engine.py", line 158, in enter_flow_run_engine_from_flow_call
    return anyio.run(begin_run)
  File "/lib/python3.9/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/lib/python3.9/site-packages/prefect/client.py", line 103, in with_injected_client
    return await fn(*args, **kwargs)
  File "/lib/python3.9/site-packages/prefect/engine.py", line 215, in create_then_begin_flow_run
    flow_run = await client.create_flow_run(
  File "/lib/python3.9/site-packages/prefect/client.py", line 609, in create_flow_run
    flow_run_create_json = flow_run_create.dict(json_compatible=True)
  File "/lib/python3.9/site-packages/prefect/orion/utilities/schemas.py", line 257, in dict
    return json.loads(self.json(*args, **kwargs))
  File "/lib/python3.9/site-packages/prefect/orion/utilities/schemas.py", line 227, in json
    return super().json(*args, **kwargs)
  File "pydantic/main.py", line 505, in pydantic.main.BaseModel.json
  File "/lib/python3.9/site-packages/prefect/orion/utilities/schemas.py", line 123, in orjson_dumps
    return orjson.dumps(v, default=default).decode()
TypeError: Dict key must be str
A workaround would be something like this, but I would rather just pass a DataFrame directly:
from prefect import flow
import pandas as pd
import numpy as np


@flow
def do_nothing(values, index, columns):
    return None


if __name__ == "__main__":
    df = pd.DataFrame(np.random.random((10_000, 20)))
    do_nothing(df.values, df.index, df.columns)
a
You can read a dataframe in one task and pass it as a dependency to another task.
Passing it as a parameter won't work, since parameters must be JSON serializable. Alternatively, pass a reference to your file or S3 object.
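A minimal sketch of the "pass a reference" suggestion, assuming a local CSV file stands in for the file or S3 object. The names `save_frame` and `count_rows` are hypothetical, and `count_rows` is left undecorated so the sketch runs without Prefect installed; in a real project it would carry the `@flow` decorator.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_frame(df: pd.DataFrame, directory: str) -> str:
    """Write the frame to disk and return its location as a plain string."""
    path = Path(directory) / "frame.csv"
    df.to_csv(path, index=False)
    return str(path)


# Hypothetical flow body: it receives only a string, which is JSON serializable,
# and loads the data itself. Add @flow from prefect when using it for real.
def count_rows(csv_path: str) -> int:
    df = pd.read_csv(csv_path)
    return len(df)


if __name__ == "__main__":
    frame = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
    with tempfile.TemporaryDirectory() as tmp:
        print(count_rows(save_frame(frame, tmp)))
```

The parameter that crosses the flow boundary is just a path string, so the size and dtypes of the DataFrame no longer matter to the serializer.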
c
I want to use subflows
Works better for my use case
What's the actual reason for parameters to be JSON serializable?
a
Deployments and parametrization via API calls: parameters are sent to the API as JSON when the flow run is created.
e
I have the same issue. The error message could be more helpful, “Parameters for Flows must be json serializable, please check your parameters, see blah for documentation”
@Ching How did you adjust your df to make it parameterizable?
a
What error did you get exactly?
e
It took me a while to try a few different formats.
df.to_json()
didn’t work. I finally got
df.to_csv()
to work. And then to recreate:
df = pd.read_csv(StringIO(csv_str))
The error was
TypeError: Dict key must be str
I got it when I tried to pass the df, and I also got it when I tried to pass
df.to_json()
which is weird
Oh, scratch that, to_json did work; I just tried it again.
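For anyone landing here later, the string round trip described above could look roughly like this. It is only a sketch: `frame_to_param` and `param_to_frame` are hypothetical helper names, and the `orient="split"` choice is an assumption made so that the index and column labels survive the trip.

```python
from io import StringIO

import pandas as pd


def frame_to_param(df: pd.DataFrame) -> str:
    """Serialize the frame to a JSON string that can be passed as a flow parameter."""
    return df.to_json(orient="split")


def param_to_frame(payload: str) -> pd.DataFrame:
    """Rebuild the frame inside the flow from the JSON string."""
    return pd.read_json(StringIO(payload), orient="split")


if __name__ == "__main__":
    frame = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
    payload = frame_to_param(frame)   # plain str, safe to pass to a flow
    restored = param_to_frame(payload)
    print(restored)
```

Note that `to_json` rounds floats (its `double_precision` argument defaults to 10 decimal places), so a frame of random floats will not come back bit-for-bit identical. For large frames, passing a file reference is likely the better fit.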
a
looks like a pandas-specific thing; thanks for sharing more info. I added this issue https://github.com/PrefectHQ/prefect/issues/6962
e
Thanks Anna! I added a little script to recreate