Hedgar

03/23/2022, 6:37 PM
Hey guys, is there a working example of how to upload CSV files from a local machine to S3 using awswrangler? In a Prefect task, of course.

Anna Geller

03/23/2022, 7:01 PM
I don't think you need a dedicated Prefect task for that, it's just a single line of code 😄 Here is how you can do this:
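import awswrangler as wr
import pandas as pd
from prefect import task, Flow  # Prefect 1.x API


@task
def extract_data_to_df():
    # Build a small example DataFrame
    return pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"]})


@task
def load_to_s3(df):
    # Write the DataFrame to S3 as a single CSV object, without the pandas index.
    # awswrangler uses the default boto3 session/credentials here.
    wr.s3.to_csv(df, "s3://prefectdata/csv/file1.csv", index=False)


with Flow("s3-csv-flow") as flow:
    dataframe = extract_data_to_df()
    load_to_s3(dataframe)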
I'm using awswrangler a lot, so if you have any questions about it, LMK

Hedgar

03/23/2022, 7:47 PM
@Anna Geller Oh great! I recently discovered it and I'm like “dude, what took you so long”. Meanwhile, your code would always upload the same file, i.e. file1.csv. What if I'm uploading different files every day and I want each file name to carry that day's date and time, e.g. 18-03-2022_18:13.csv, 21-03-2022_19:13.csv, etc.? What can I do differently in my wr.s3.to_csv() call?

Anna Geller

03/23/2022, 8:03 PM
use datetime.utcnow() as part of the file name? it's up to you how you structure your data
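A minimal sketch of that idea, assuming the day/time format from the question and the prefectdata bucket from the example above. build_s3_path is a hypothetical helper (not part of awswrangler or Prefect) that turns the current UTC time into an S3 path; it uses datetime.now(timezone.utc), the timezone-aware equivalent of datetime.utcnow(). What matters is where it gets called: from inside the task body, so every flow run computes a fresh timestamp.

```python
from datetime import datetime, timezone
from typing import Optional


def build_s3_path(prefix: str, now: Optional[datetime] = None) -> str:
    """Return a path such as <prefix>/18-03-2022_18-13.csv.

    Hypothetical helper for illustration. `now` defaults to the current
    UTC time; pass an explicit value to make the result deterministic.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Day-month-year and hour-minute, as in the file names from the question.
    # A dash replaces ":" because AWS lists the colon among object-key
    # characters that might require special handling.
    return f"{prefix.rstrip('/')}/{now:%d-%m-%Y_%H-%M}.csv"


if __name__ == "__main__":
    print(build_s3_path("s3://prefectdata/csv"))
```

Inside the load_to_s3 task from the earlier example, the call would become wr.s3.to_csv(df, build_s3_path("s3://prefectdata/csv"), index=False). Because the task body executes at run time, each scheduled run should get its own file name.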

Hedgar

03/23/2022, 8:27 PM
Yes I know, I have done that, but the date doesn't reflect the current day; it stays at the date of the very first run, e.g. 18-03-2022.csv. When I read the wr docs I saw something like setting dataset=True. Have you done something similar?

Anna Geller

03/23/2022, 8:28 PM
can you share your code?
you need to put this into a task, otherwise the date will be frozen at flow registration
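To see why the placement matters, here is a plain-Python analogy with no Prefect involved, so it only illustrates the principle: a timestamp computed once while the "flow" object is built (like module-level code that runs at registration) is captured and reused, whereas a timestamp computed inside the function body (like code inside a task) is recomputed on every call. The names build_flow_frozen, build_flow_fresh and the injected clock are hypothetical, added only to make the behaviour deterministic.

```python
from datetime import datetime
from typing import Callable

Clock = Callable[[], datetime]


def file_name(ts: datetime) -> str:
    return ts.strftime("%d-%m-%Y_%H-%M") + ".csv"


def build_flow_frozen(clock: Clock) -> Callable[[], str]:
    # Evaluated ONCE, while the "flow" is being built; analogous to a
    # variable outside any task that is fixed when the flow is registered.
    name = file_name(clock())

    def run() -> str:
        return name  # every run reuses the captured value

    return run


def build_flow_fresh(clock: Clock) -> Callable[[], str]:
    def run() -> str:
        # Evaluated on EVERY run; analogous to computing the name inside a task.
        return file_name(clock())

    return run


if __name__ == "__main__":
    # Fake clock: each call returns the next timestamp from the list
    ticks = iter([
        datetime(2022, 3, 18, 18, 13),  # consumed when the frozen flow is built
        datetime(2022, 3, 21, 19, 13),
        datetime(2022, 3, 22, 19, 13),
    ])
    clock = ticks.__next__
    frozen = build_flow_frozen(clock)
    fresh = build_flow_fresh(clock)
    print(frozen(), frozen())  # the same name twice
    print(fresh(), fresh())    # a new name on each call
```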
h

Hedgar

03/23/2022, 8:48 PM
@Anna Geller Yes, you are right. I had the variable outside the task when I registered the flow, but I recently moved it into the upload-to-S3 task and nothing seems to change. Do I need to change the version of the flow, and how can I do this?

Anna Geller

03/23/2022, 8:52 PM
you can do either
flow.register("project")
or via CLI:
prefect register --project xxx -p yourflow.py

Hedgar

03/23/2022, 9:02 PM
@Anna Geller flow.register("project") is permanently the last line in my .py file. The flow runs on schedule once I start prefect agent local start, but the file name still carries the frozen date, even though the data content is fresh.

Anna Geller

03/23/2022, 9:04 PM
I can't say anything really without seeing the actual code