Hedgar
03/23/2022, 6:37 PM
Anna Geller
import awswrangler as wr
import pandas as pd
from prefect import task, Flow

@task
def extract_data_to_df():
    return pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"]})

@task
def load_to_s3(df):
    wr.s3.to_csv(df, "s3://prefectdata/csv/file1.csv", index=False)

with Flow("s3-csv-flow") as flow:
    dataframe = extract_data_to_df()
    load_to_s3(dataframe)
I'm using awswrangler a lot, so if you have any questions about it, LMK
Hedgar
03/23/2022, 7:47 PM
18-03-2022_18:13.csv, 21-03-2022_19:13.csv
etc. What can I do differently in my wr.s3.to_csv() call?
Anna Geller
datetime.utcnow()
as part of a file name? It's up to you how you structure your data
Hedgar
03/23/2022, 8:27 PM
18-03-2022.csv
When I read the wr docs I saw something like setting dataset=True
done something similar?
Anna Geller
Hedgar
03/23/2022, 8:48 PM
Anna Geller
flow.register("project")
or via CLI:
prefect register --project xxx -p yourflow.py
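
For the timestamped file names discussed earlier in the thread, here is a minimal sketch, assuming the goal is one CSV per run named like 18-03-2022_18:13.csv. The idea is to build the name inside the task, so the timestamp is resolved when the task runs rather than when the flow is built and registered (with Prefect 1's pickle-based storage, values computed outside a task are likely to be frozen at registration). The helper `build_s3_path` and its `prefix` and `now` parameters are hypothetical names for illustration, not part of awswrangler or Prefect.

```python
from datetime import datetime, timezone
from typing import Optional


def build_s3_path(prefix: str, now: Optional[datetime] = None) -> str:
    """Return a CSV path such as <prefix>/18-03-2022_18:13.csv.

    Hypothetical helper for illustration only.
    """
    # Resolve the timestamp at call time (i.e. at run time) unless one is
    # passed in explicitly. This is a timezone-aware stand-in for the
    # datetime.utcnow() call mentioned in the thread.
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{prefix.rstrip('/')}/{now:%d-%m-%Y_%H:%M}.csv"


# Intended use inside the task from the flow above, so that every run
# computes a fresh name:
#
# @task
# def load_to_s3(df):
#     wr.s3.to_csv(df, build_s3_path("s3://prefectdata/csv"), index=False)

print(build_s3_path("s3://prefectdata/csv", datetime(2022, 3, 18, 18, 13)))
# prints: s3://prefectdata/csv/18-03-2022_18:13.csv
```

If the path is instead computed at module level or inside the `with Flow(...)` block, it is evaluated once when the flow is built, which would match a file name that stays the same while the data changes.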
Hedgar
03/23/2022, 9:02 PM
flow.register("project")
is permanently the last line in my .py file. The flow runs on schedule once I run prefect agent local start,
but it still produces the file name with the frozen date, even though the data content is fresh
Anna Geller