Is it possible to map a pandas DataFrame?
# prefect-community
m
Is it possible to map a pandas DataFrame?
z
Hi @Matthias! It looks like the answer is no for now: DataFrames don't play nicely with how we index mapped objects.
m
Hmm, alright. I could not find a nice way. My workaround is to return a dict from the DataFrame, but that leads Dask to fail due to “Large object of size 4.33 MB”.
d
Really? How many rows / cols do you have? Are you using the pandas.DataFrame.to_dict function for your transformation?
j
Dask can pass around large dataframes (and 4.33 MiB isn't that large). So you should be able to return a dataframe fine from a task.
What are you trying to accomplish though?
If you want to map across the rows, I might use df.to_dict(orient='records') or df.to_records()
In [14]: import pandas as pd
    ...: from prefect import Flow, task
    ...:

In [15]: @task
    ...: def create():
    ...:     return pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).to_records()
    ...:

In [16]: @task
    ...: def transform(row):
    ...:     return row.a + row.b
    ...:

In [17]: with Flow("test") as flow:
    ...:     data = create()
    ...:     res = transform.map(data)
    ...:

In [18]: flow.run()
m
I will get back to this tomorrow. Basically I am doing exactly what you are showing here.
I am not able to reproduce the issue, and it is not at all related to pandas. The error only happens if I map data, reduce and transform the results, and then map that result again.
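For reference, the shape of that pipeline (map, reduce, transform, map again) can be sketched in plain Python, without Prefect or Dask. This is only an illustration under assumptions: the function names (add_columns, combine, expand, halve, pickled_size) are hypothetical, the rows mirror what df.to_dict(orient='records') gives for the example DataFrame above, and the pickle size is just a rough proxy for how much data the second map would have to ship to workers.

```python
import pickle
from functools import reduce


def add_columns(row):
    """First mapped step (hypothetical): combine the two fields of one record."""
    return row["a"] + row["b"]


def combine(total, value):
    """Reduce step (hypothetical): fold the mapped results into one number."""
    return total + value


def expand(total):
    """Transform step (hypothetical): build a new list for the second map."""
    return [total * k for k in range(1, 4)]


def halve(value):
    """Second mapped step (hypothetical)."""
    return value / 2


def pickled_size(obj):
    """Rough size in bytes of an object once serialized with pickle."""
    return len(pickle.dumps(obj))


# Same records as df.to_dict(orient="records") for the example DataFrame.
rows = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]

first = [add_columns(r) for r in rows]    # map
total = reduce(combine, first)            # reduce
second_input = expand(total)              # transform
final = [halve(v) for v in second_input]  # map again

# The intermediate fed into the second map is the object worth measuring.
print("bytes passed to second map:", pickled_size(second_input))
print("final:", final)
```

If the intermediate going into the second map is large, checking its serialized size like this may help narrow down where the "Large object" message comes from.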