Mateo Merlo

05/30/2022, 2:24 PM
Hey, I have a Secret named AWS_CREDENTIALS in Prefect Cloud (1.0) with this format:
{
  "ACCESS_KEY": "abcdef",
  "SECRET_ACCESS_KEY": "ghijklmn"
}
If I'm using pandas to read a file in S3:
df = pd.read_csv(f"s3://{s3_bucket_name}/{filename}")
Do I need to pass the credentials as a parameter to read_csv, or are they read automatically from Prefect Cloud? Currently I'm getting this error: "botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden". Thanks!
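For reference, here is a minimal sketch of passing the credentials to pandas directly. pandas itself knows nothing about Prefect Secrets. Since pandas 1.2, however, `read_csv` accepts a `storage_options` dict that it forwards to s3fs for `s3://` URLs, so the Secret's fields can be mapped onto s3fs's `key`/`secret` arguments. The helper names `s3_storage_options` and `read_s3_csv` are hypothetical. The sketch assumes the Secret value arrives as a dict with the `ACCESS_KEY`/`SECRET_ACCESS_KEY` keys shown above. In a Prefect 1.0 flow, that dict would typically come from a `PrefectSecret("AWS_CREDENTIALS")` task.

```python
def s3_storage_options(creds: dict) -> dict:
    """Map the AWS_CREDENTIALS Secret's keys onto s3fs.S3FileSystem arguments."""
    # s3fs uses `key` for the access key ID and `secret` for the secret access key
    return {"key": creds["ACCESS_KEY"], "secret": creds["SECRET_ACCESS_KEY"]}


def read_s3_csv(creds: dict, s3_bucket_name: str, filename: str):
    # Imported lazily so the mapping above works without pandas/s3fs installed
    import pandas as pd

    # pandas (>= 1.2) forwards storage_options to s3fs for "s3://" URLs
    return pd.read_csv(
        f"s3://{s3_bucket_name}/{filename}",
        storage_options=s3_storage_options(creds),
    )
```

Explicit credentials take precedence over whatever s3fs would otherwise find through the default AWS credential chain (environment variables, ~/.aws, instance role).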

Anna Geller

05/30/2022, 2:34 PM
Try creating a boto3 session that uses those credentials.
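Here is a minimal sketch of that suggestion, assuming the Secret value is the dict shown in the question. It builds a `boto3.Session` from the dict, fetches the object with the S3 client's `get_object`, and hands the streaming body to `read_csv`, which accepts any file-like object. A `boto3.Session` object is not picked up by `pd.read_csv("s3://...")`, which goes through s3fs, so the file is read through the session's client instead. The helper names are hypothetical.

```python
def boto3_session_kwargs(creds: dict) -> dict:
    """Map the AWS_CREDENTIALS Secret's keys onto boto3.Session arguments."""
    return {
        "aws_access_key_id": creds["ACCESS_KEY"],
        "aws_secret_access_key": creds["SECRET_ACCESS_KEY"],
    }


def read_csv_with_session(creds: dict, s3_bucket_name: str, filename: str):
    # Lazy imports: boto3 and pandas are only needed when actually reading
    import boto3
    import pandas as pd

    session = boto3.Session(**boto3_session_kwargs(creds))
    s3 = session.client("s3")
    # get_object returns a dict whose "Body" is a file-like StreamingBody
    obj = s3.get_object(Bucket=s3_bucket_name, Key=filename)
    return pd.read_csv(obj["Body"])
```

If the 403 persists with explicit credentials, the IAM user behind them likely lacks `s3:GetObject` on that key.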

Mateo Merlo

05/30/2022, 2:54 PM
I will try it. Thanks!

Volker L

06/06/2022, 12:03 PM
I highly recommend using pyarrow datasets when working with Parquet datasets/files in an AWS S3 bucket. Optionally, if you want to run queries on your Parquet datasets/files, you can use DuckDB. Here is a short working example:
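import duckdb
import pyarrow.dataset as ds
from pyarrow.fs import S3FileSystem
# or, instead of pyarrow.fs:
import s3fs

con = duckdb.connect()

# "eu-central-1" is the AWS Frankfurt region; use your bucket's region
fs = S3FileSystem(access_key="my_access_key", secret_key="my_secret_key", region="eu-central-1")
# or
fs = s3fs.S3FileSystem(anon=False, key="my_access_key", secret="my_secret_key")

# With an explicit filesystem the path is "bucket/prefix" (no "s3://");
# the first directory level below it is parsed as the "exchange" field
history = ds.dataset("findata/forex_pros/history/D", format="parquet", partitioning=["exchange"], filesystem=fs)

# Make the dataset queryable in SQL under the name "history"
con.register("history", history)
aapl = con.execute("SELECT * FROM history WHERE symbol_id=6408").df()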
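The example above hard-codes the keys. To reuse the AWS_CREDENTIALS Secret from the question instead, its fields can be mapped onto the `access_key`/`secret_key` arguments of `pyarrow.fs.S3FileSystem`. This sketch assumes the Secret value is a dict with `ACCESS_KEY`/`SECRET_ACCESS_KEY`. The names `pyarrow_s3_kwargs` and `query_dataset` are hypothetical, and the default region is only a placeholder.

```python
def pyarrow_s3_kwargs(creds: dict, region: str = "eu-central-1") -> dict:
    """Map the AWS_CREDENTIALS Secret's keys onto pyarrow.fs.S3FileSystem arguments."""
    # Placeholder region (AWS Frankfurt); pass the bucket's actual region
    return {
        "access_key": creds["ACCESS_KEY"],
        "secret_key": creds["SECRET_ACCESS_KEY"],
        "region": region,
    }


def query_dataset(creds: dict, path: str, sql: str):
    # Lazy imports: duckdb and pyarrow are only needed when the query runs
    import duckdb
    import pyarrow.dataset as ds
    from pyarrow.fs import S3FileSystem

    fs = S3FileSystem(**pyarrow_s3_kwargs(creds))
    history = ds.dataset(path, format="parquet", partitioning=["exchange"], filesystem=fs)
    con = duckdb.connect()
    # Expose the dataset to SQL under the table name "history"
    con.register("history", history)
    return con.execute(sql).df()
```

Usage would look like `query_dataset(creds, "findata/forex_pros/history/D", "SELECT * FROM history WHERE symbol_id=6408")`. DuckDB scans the Arrow dataset lazily, so filters in the SQL can avoid materializing the whole dataset in pandas first.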

Mateo Merlo

06/06/2022, 12:24 PM
Thanks @Volker L !!

Volker L

06/06/2022, 12:30 PM
You're welcome. I hope this short introduction is helpful. Contact me if you need more input.