# prefect-community
j
Hey all, I was wondering if anyone has run into the following error while trying to run a Prefect flow (version 1) using the pandas package: `Segmentation fault (core dumped)`. I have run this using different versions of pandas (1.5.1 and 1.3.5) and still get the same error. I am able to import pandas and run a command such as `pd.show_versions()` without issues, but whenever I try to run anything that would create a DataFrame I get the segfault. I'm including my currently installed versions and the flow I am trying to run below. Thanks in advance!
INSTALLED VERSIONS
------------------
commit : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.294-220.533.amzn2.x86_64
Version : #1 SMP Thu Sep 29 01:01:23 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.3.5
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : 2.9.5 (dt dec pq3 ext lo64)
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2022.11.0
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.0
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
numba : None
import os
import prefect
import sys
import pandas as pd
from prefect import task, Flow
from prefect.storage import S3
from prefect.run_configs import ECSRun
from prefect.run_configs import LocalRun

# create logger
logger = prefect.utilities.logging.get_logger()

@task
def say_hello():
    pd.show_versions()  # prints the report itself and returns None
    data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
    df = pd.DataFrame.from_dict(data)
    logger.info(df.head())
    logger.info("Hello, Cloud!")



flow = Flow("jrose_flow", tasks=[say_hello])

kwargs = {}
kwargs["cluster"] = f"arn:aws:ecs:us-west-2:XXXXXXXXXXX:cluster/prefect-agent-dev"

flow.run_config = ECSRun(task_role_arn="arn:aws:iam::XXXXXXXXXXX:role/prefect-dev-rpt-services-role",
                         execution_role_arn="arn:aws:iam::XXXXXXXXXXX:role/prefect-dev-rpt-services-role",
                         task_definition_arn="arn:aws:ecs:us-west-2:XXXXXXXXXXX:task-definition/rn-rpt-services-dev",
                         run_task_kwargs = kwargs)

flow.storage = S3(bucket="bucket_name", key="prefecttest/rpt/jrose_flow.py",
                  stored_as_script=False)
b
Hi Jeff! I had to do a bit of poking around the internet to see where this error could come from. It didn't strike me as a Prefect-related error at first. So far, what I've been able to gather is that the `Segmentation fault` error arises when a process tries to access memory that it does not have access to, or memory that doesn't exist.
I was reading through this webpage, maybe there is something here that could help?
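One way to narrow down where a segfault like this originates is Python's standard-library `faulthandler` module, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. Here is a minimal sketch, assuming it is placed at the top of the flow script. The helper name `enable_crash_tracebacks` is hypothetical and is not part of Prefect or pandas.

```python
import faulthandler
import sys


def enable_crash_tracebacks():
    """Hypothetical helper: make a segfault print a Python traceback to stderr."""
    if faulthandler.is_enabled():
        return True
    try:
        # all_threads=True dumps every thread's stack, not only the crashing one
        faulthandler.enable(file=sys.stderr, all_threads=True)
    except (AttributeError, ValueError, RuntimeError, OSError):
        # stderr may be missing or have no file descriptor in some environments
        return False
    return faulthandler.is_enabled()


enable_crash_tracebacks()
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler`. The traceback only shows which Python call was active at the crash; it does not explain the underlying native-library fault.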
j
Thanks! I think this might be due to the pandas version I have been using
It appears to be working when I switch to a different version
b
Sweet! Thanks for sharing that here. If someone else runs into the same issue they'll certainly appreciate this thread.
j
Just an FYI, I switched from pandas 1.5.1 -> 1.3.5 and that fixed it (while using Python 3.8.10)
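If the fix really is a specific pandas version, a small guard run before the flow can make a version mismatch visible instead of leaving it to surface as a crash. This is a rough sketch, assuming 1.3.5 is the known-good version as reported in the last message. `check_version` is a hypothetical helper, not a Prefect or pandas API.

```python
from importlib.metadata import PackageNotFoundError, version


def check_version(package, expected):
    """Hypothetical helper: True if the installed package matches the expected version."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Package is not installed in this environment at all
        return False
    return installed == expected


if not check_version("pandas", "1.3.5"):
    print("Warning: pandas is not at the version reported to work in this thread")
```

This reads the installed package metadata rather than importing pandas, so the check itself cannot trigger the crash.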