Hey guys, is there any working example of how to upload CSV files from a local machine to S3 using awswrangler? In a Prefect task, of course
Anna Geller
03/23/2022, 7:01 PM
I don't think you need a Prefect task for that - it's just a single line of code 😄 here is how you can do this:
import awswrangler as wr
import pandas as pd
from prefect import task, Flow

@task
def extract_data_to_df():
    return pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"]})

@task
def load_to_s3(df):
    wr.s3.to_csv(df, "s3://prefectdata/csv/file1.csv", index=False)

with Flow("s3-csv-flow") as flow:
    dataframe = extract_data_to_df()
    load_to_s3(dataframe)
I'm using awswrangler a lot so if you have any questions about it, LMK
Hedgar
03/23/2022, 7:47 PM
@Anna Geller Oh great! I recently discovered it and am like “dude, what took you so long”. Meanwhile, from your code this would upload the same file every time, i.e. file1.csv. What if I'm uploading different files every day and I want those files to bear the day's date and time? What can I do differently in my wr.s3.to_csv() call?
Anna Geller
03/23/2022, 8:03 PM
use datetime.utcnow() as part of a file name? it's up to you how you structure your data
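A minimal sketch of that suggestion, assuming a hypothetical helper named dated_csv_path and the dd-mm-YYYY naming style mentioned in this thread. It uses the timezone-aware datetime.now(timezone.utc) rather than datetime.utcnow(), which gives the same UTC wall-clock time; the bucket prefix is taken from the example flow above and the exact timestamp format is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def dated_csv_path(prefix: str, now: Optional[datetime] = None) -> str:
    """Build '<prefix>/<dd-mm-YYYY_HH-MM-SS>.csv' from a UTC timestamp."""
    # Read the clock at call time unless the caller supplies a fixed value.
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%d-%m-%Y_%H-%M-%S")
    return f"{prefix.rstrip('/')}/{stamp}.csv"


# Hypothetical usage inside the upload task from the flow above:
#   wr.s3.to_csv(df, dated_csv_path("s3://prefectdata/csv"), index=False)
```

The optional now argument is only there so the name can be checked with a fixed timestamp; in normal use it would be left out so the clock is read on every call.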
Hedgar
03/23/2022, 8:27 PM
Yes I know, I have done that, but the date doesn't reflect the current day, rather the date of the very first run: 18-03-2022.csv
When I read the wr docs I saw something like setting dataset=True. Have you done something similar?
Anna Geller
03/23/2022, 8:28 PM
can you share your code?
Anna Geller
03/23/2022, 8:29 PM
you need to put this into a task, otherwise the date will be frozen at flow registration
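The difference can be illustrated without Prefect at all. The sketch below is only a simulation under stated assumptions: build_frozen, build_fresh and FakeClock are hypothetical names, and the "build" step stands in for whatever runs once when the flow is defined or registered, while calling the returned function stands in for a task run.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def build_frozen(clock: Clock) -> Callable[[], str]:
    # The timestamp is computed once, while the "flow" is being built,
    # and the task body only reuses that stored string.
    stamp = clock().strftime("%d-%m-%Y")

    def task() -> str:
        return f"{stamp}.csv"

    return task


def build_fresh(clock: Clock) -> Callable[[], str]:
    def task() -> str:
        # The clock is read inside the task body, so each run gets a new name.
        return f"{clock().strftime('%d-%m-%Y')}.csv"

    return task


class FakeClock:
    """Hypothetical test clock: advances one day on every call."""

    def __init__(self) -> None:
        self.current = datetime(2022, 3, 18, tzinfo=timezone.utc)

    def __call__(self) -> datetime:
        value = self.current
        self.current += timedelta(days=1)
        return value
```

With the fake clock, two runs of the frozen variant produce the same file name, while two runs of the fresh variant produce consecutive dates. In a real flow the same idea applies by calling the datetime function inside the body of the @task function rather than at module level.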
Hedgar
03/23/2022, 8:48 PM
@Anna Geller Yes, you are right. I had the variable outside the task when I registered the flow, but I recently moved it into the upload-to-S3 task and nothing seems to change. Do I need to change the version of the flow, and how can I do this?
Anna Geller
03/23/2022, 8:52 PM
you can do either flow.register("project") or via CLI:
prefect register --project xxx -p yourflow.py
Hedgar
03/23/2022, 9:02 PM
@Anna Geller flow.register("project") is permanently the last line in my .py file. The flow runs on schedule once I start the agent with prefect agent local start, but it's still printing the frozen date in the file name, even though the data content is fresh
Anna Geller
03/23/2022, 9:04 PM
I can't say anything really without seeing the actual code