Hey guys any working example of how to upload csv files from Prefect Community #ask-community

Hedgar

03/23/2022, 6:37 PM

Hey guys, is there a working example of how to upload CSV files from a local machine to S3 using awswrangler? As a Prefect task, of course.

Anna Geller

03/23/2022, 7:01 PM

I don't think you need a Prefect task for that - it's just a single line of code 😄 here is how you can do this:

import awswrangler as wr
import pandas as pd
from prefect import task, Flow


@task
def extract_data_to_df():
    return pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"]})


@task
def load_to_s3(df):
    wr.s3.to_csv(df, "s3://prefectdata/csv/file1.csv", index=False)


with Flow("s3-csv-flow") as flow:
    dataframe = extract_data_to_df()
    load_to_s3(dataframe)

I'm using awswrangler a lot so if you have any questions about it, LMK

Hedgar

03/23/2022, 7:47 PM

@Anna Geller Oh great! I recently discovered it and I'm like “dude, what took you so long”. Meanwhile, your code would always upload the same file, i.e. file1.csv. What if I'm uploading different files every day and I want those files to bear the day's date and time, e.g.

18-03-2022_18:13.csv, 21-03-2022_19:13.csv

etc.? What should I do differently in my wr.s3.to_csv() call?

Anna Geller

03/23/2022, 8:03 PM

Use datetime.utcnow() as part of the file name? It's up to you how you structure your data.
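
A minimal sketch of that idea, assuming the day-month-year_hour:minute layout from the question above. The helper name build_s3_path is hypothetical, and datetime.now(timezone.utc) stands in for datetime.utcnow(), which is deprecated in recent Python versions. The returned string could be passed as the path argument of wr.s3.to_csv:

```python
from datetime import datetime, timezone
from typing import Optional


def build_s3_path(prefix: str, now: Optional[datetime] = None) -> str:
    """Return an S3 path whose file name carries the given (or current) UTC time."""
    # Resolve the timestamp at call time, not at import time.
    if now is None:
        now = datetime.now(timezone.utc)
    # Produces file names such as 18-03-2022_18:13.csv
    file_name = now.strftime("%d-%m-%Y_%H:%M") + ".csv"
    return f"{prefix.rstrip('/')}/{file_name}"
```

Calling build_s3_path("s3://prefectdata/csv") inside the task body, rather than at module level, would give a new name on each run.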

Hedgar

03/23/2022, 8:27 PM

Yes, I know, I have done that, but the date doesn't reflect the current day; it stays at the date of the very first run:

18-03-2022.csv

When I read the wr docs I saw something about setting dataset=True. Have you done something similar?

Anna Geller

03/23/2022, 8:28 PM

can you share your code?

Anna Geller

03/23/2022, 8:29 PM

You need to put the datetime call into a task; otherwise the date will be frozen at flow registration.
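
To illustrate the difference in plain Python, without Prefect: a value computed at module level is evaluated once, when the file is executed, while a value computed inside a function is evaluated on every call. This sketch uses a hypothetical fake clock that advances one day per reading, so the effect shows up without waiting. All names here are made up for illustration.

```python
import itertools
from datetime import datetime, timedelta, timezone


def make_fake_clock(start: datetime):
    """Return a clock that advances one day per call (a stand-in for real time passing)."""
    counter = itertools.count()

    def clock() -> datetime:
        return start + timedelta(days=next(counter))

    return clock


clock = make_fake_clock(datetime(2022, 3, 18, 18, 13, tzinfo=timezone.utc))

# Computed once, when this file is executed: the value is frozen from here on.
frozen_name = clock().strftime("%d-%m-%Y") + ".csv"


def upload_frozen() -> str:
    # Reuses the module-level value, so every call sees the same date.
    return frozen_name


def upload_fresh() -> str:
    # Reads the clock inside the function body, so every call sees a new date.
    return clock().strftime("%d-%m-%Y") + ".csv"
```

Here upload_frozen() keeps returning 18-03-2022.csv, while successive upload_fresh() calls return 19-03-2022.csv and then 20-03-2022.csv. A task body behaves like upload_fresh in this respect, since it executes when the flow runs.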

Hedgar

03/23/2022, 8:48 PM

@Anna Geller Yes, you are right. I had the variable outside the task when I registered the flow. I recently moved it into the upload-to-S3 task, but nothing seems to change. Do I need to change the version of the flow, and how can I do that?

Anna Geller

03/23/2022, 8:52 PM

You can do either flow.register("project") or via CLI:

prefect register --project xxx -p yourflow.py

Hedgar

03/23/2022, 9:02 PM

@Anna Geller flow.register("project") is permanently the last line in my .py file. The flow runs on schedule once I start the agent with

prefect agent local start

but it still writes the file name with the frozen date, even though the data content is fresh.

Anna Geller

03/23/2022, 9:04 PM

I can't say anything really without seeing the actual code
