Anyone know how to resolve this error(s) that prevents my flow from running? | Prefect Community #ask-community

Jason Motley

11/18/2021, 12:22 AM

Anyone know how to resolve this error(s) that prevents my flow from running?

- cloudpickle: (flow built with '1.6.0', currently running with '2.0.0')\n  - prefect: (flow built with '0.15.6', currently running with '0.15.9')\n  - python: (flow built with '3.8.8', currently running with '3.7.11')")

Kevin Kho

11/18/2021, 12:27 AM

In general, the agent's versions of Python, prefect, and cloudpickle need to match the versions you registered the Flow with. This is because the flow is serialized at registration, and deserialization on the agent needs to match that serialization.
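A quick way to check this is to print the three versions in both environments (where the flow was registered and where the agent runs) and compare them. The sketch below is only an illustration: the helper names `environment_versions` and `version_mismatches` are made up for this example and are not part of Prefect, and `importlib.metadata` needs Python 3.8 or newer.

```python
import platform
from importlib import metadata


def environment_versions(packages=("prefect", "cloudpickle")):
    """Collect the interpreter version plus the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


def version_mismatches(built_with, running_with):
    """Return {name: (built, running)} for every entry that differs."""
    return {
        name: (built, running_with.get(name))
        for name, built in built_with.items()
        if built != running_with.get(name)
    }


# Values copied from the error message at the top of the thread.
built = {"cloudpickle": "1.6.0", "prefect": "0.15.6", "python": "3.8.8"}
running = {"cloudpickle": "2.0.0", "prefect": "0.15.9", "python": "3.7.11"}
print(version_mismatches(built, running))

# Run this on both machines and compare the two outputs.
print(environment_versions())
```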

Jason Motley

11/18/2021, 12:27 AM

Just fixed it by changing the flow name (go figure)... onto another quick question

Jason Motley

11/18/2021, 12:28 AM

TypeError: can't pickle SSLContext objects

Kevin Kho

11/18/2021, 12:32 AM

The Prefect `@task` has `checkpoint=True` enabled by default so that when your Flow fails, it can be restarted from failure. `checkpoint=True` serializes the return value of your task. I think you will get this error if you return some sort of connection or client. For that specific task, you can turn off checkpointing with `checkpoint=False`.
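To see why returning a connection trips the checkpoint, here is a small sketch using only the standard library. It uses `pickle` as a stand-in for the cloudpickle serializer named in the first error, and a bare `ssl.SSLContext` as a stand-in for whatever a TLS-enabled database client holds internally. `is_picklable` is a hypothetical helper, not a Prefect function.

```python
import pickle
import ssl


def is_picklable(value):
    """Return True if `value` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(value)
    except Exception:
        return False
    return True


# Plain data serializes fine, so a task returning it can be checkpointed.
plain_result = {"rows": [(1, "a"), (2, "b")]}
print(is_picklable(plain_result))

# An SSL context cannot be pickled, so a task that returns an object
# holding one would need checkpoint=False.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
print(is_picklable(context))
```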

Jason Motley

11/18/2021, 12:32 AM

Similar to this?

@task(log_stdout=True, checkpoint=False)

Kevin Kho

11/18/2021, 12:33 AM

Yes exactly

Jason Motley

11/18/2021, 12:34 AM

Well, I got a little further to a different error 🙂 Really appreciate the help, I'm going to see if I can get past this one

👍 1

Jason Motley

11/18/2021, 4:08 PM

@Kevin Kho Any ideas on this? Is this an issue in setting up the database? It is coming from my "load" task: `pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting`

Kevin Kho

11/18/2021, 4:09 PM

Could you show me the task definition?

Kevin Kho

11/18/2021, 4:09 PM

Looks like this error comes from pandas though? But I can take a look

Jason Motley

11/18/2021, 4:10 PM

The connection task or the load?

Kevin Kho

11/18/2021, 4:10 PM

Either works

Jason Motley

11/18/2021, 4:10 PM

@task(log_stdout=True)
def load(connection: any, if_exists: str, db_table: str, dataframe: pd.DataFrame) -> pd.DataFrame:
    res = dataframe.to_sql(name=db_table, con=connection, if_exists=if_exists, chunksize=None, index=False)
    print(res)

Jason Motley

11/18/2021, 4:11 PM

For the connection I'm hiding some sensitive data

Jason Motley

11/18/2021, 4:11 PM

@task(log_stdout=True, checkpoint=False)
def db_connection(credentials: dict) -> any:
    ssl_args = {these are fine}
    user = credentials['username']
    password = credentials['password']
    url = 'confidential-url'
    port = 3306
    connection = pmy.connect(user=user,password=password,host=url,port=port,connect_timeout=10,ssl=ssl_args,
    charset='utf8')
    return connection

Kevin Kho

11/18/2021, 4:11 PM

You can post it. Just replace the stuff with like “XXXXXX”

Kevin Kho

11/18/2021, 4:11 PM

Ah ok

Jason Motley

11/18/2021, 4:12 PM

A few resources said to use sqlalchemy but didn't really explain how to incorporate it

Kevin Kho

11/18/2021, 4:13 PM

This will show you.

Kevin Kho

11/18/2021, 4:13 PM

I don’t think there is anything wrong on the Prefect side of things. It just seems that pymysql gives headaches when used directly with pandas.
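One likely explanation for the message, offered as an assumption rather than something confirmed in this thread: when pandas is handed a raw DBAPI connection that is not SQLAlchemy, it treats it like SQLite and sends a query with a `?` placeholder, while pymysql fills in parameters with Python `%` formatting. A string with no `%s` in it cannot absorb the argument, and plain Python reproduces the same wording:

```python
# The query text is copied from the error message earlier in the thread.
query = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
params = ("my_table",)  # hypothetical table name

try:
    # %-style interpolation: the query has no %s to receive the parameter.
    query % params
except TypeError as exc:
    message = str(exc)

print(message)
```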

Jason Motley

11/18/2021, 4:13 PM

Replace the `pmy.connect` portion with `create_engine`?

Kevin Kho

11/18/2021, 4:15 PM

Yes, and then pass the engine to your `to_sql` call
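A minimal sketch of that change, assuming pandas and SQLAlchemy are both installed. It uses an in-memory SQLite URL so it runs without a database server; for MySQL the URL would instead take the `mysql://...` form that appears later in this thread. The table name `demo_table` and the sample rows are invented for the example.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite stands in for the real MySQL server.
engine = create_engine("sqlite://")

frame = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Pass the engine (not a raw DBAPI connection) as `con`.
frame.to_sql(name="demo_table", con=engine, if_exists="append", index=False)

# Confirm the table now exists.
table_names = inspect(engine).get_table_names()
print(table_names)
```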

Jason Motley

11/18/2021, 4:16 PM

Cool, thank you!

Kevin Kho

11/18/2021, 4:16 PM

Actually….just wanna make sure you know we have a MySQL task?

Jason Motley

11/18/2021, 4:17 PM

I read through that documentation but didn't see an actual example haha

Kevin Kho

11/18/2021, 4:17 PM

This says your current approach is deprecated

Jason Motley

11/18/2021, 4:18 PM

Good to know for our internal documentation 😉

Kevin Kho

11/18/2021, 4:19 PM

I don’t have a MySQL db to test but it would be like this:

mysql = MySQLFetch()
with Flow("...") as flow:
    a = mysql(db_name, ...)

to get stuff from the database

Jason Motley

11/18/2021, 4:24 PM

I'd be replacing my "db_connection" with MySQLFetch, I assume?

Kevin Kho

11/18/2021, 4:33 PM

Fetch opens a connection, runs the query, and closes all in one

Jason Motley

11/18/2021, 4:34 PM

Is there an example lying around that I can model it off?

Kevin Kho

11/18/2021, 4:34 PM

I haven’t seen, but the code snippet above there is how to use it in your flow

Jason Motley

11/18/2021, 4:35 PM

Cool, I'll see what I can do!

Jason Motley

11/18/2021, 4:35 PM

Appreciate the help

👍 1

Jason Motley

11/18/2021, 4:52 PM

One more dumb question - in your example above does the MySQLFetch function as its own task?

Jason Motley

11/18/2021, 4:53 PM

# In[28]:
@task(log_stdout=True, checkpoint=False)
def db_connection(credentials: dict) -> any:
    db = MySQLFetch(bla bla)
    return MySQLFetch

Kevin Kho

11/18/2021, 4:54 PM

It is its own task, so no need to wrap it like that unless you want to modify it in some way

Kevin Kho

11/18/2021, 4:54 PM

in that case, you can do

@task(log_stdout=True, checkpoint=False)
def db_connection(credentials: dict) -> any:
    df = MySQLFetch(bla bla).run()
    return df

Note this doesn't return a connection. It already returns the data.

Jason Motley

11/18/2021, 4:56 PM

ahhhh. It functions as an "extract" task, functionally.

Jason Motley

11/18/2021, 5:02 PM

Do I need to be importing it earlier? I'm getting an error on using MySQLFetch:

Kevin Kho

11/18/2021, 5:07 PM

from prefect.tasks.mysql import MySQLFetch

I think

Kevin Kho

11/18/2021, 5:07 PM

Docs for that are here

Jason Motley

11/18/2021, 5:08 PM

I'll never get over the python syntax after learning R.... thank you!!

Kevin Kho

11/18/2021, 5:12 PM

That is true. You just get everything with `library(…)`

Jason Motley

11/18/2021, 5:27 PM

TypeError: can't concat list to bytes

Kevin Kho

11/18/2021, 5:54 PM

Could you show me what you are doing?

Jason Motley

11/18/2021, 5:55 PM

Sure! So, I'm trying to combine the MySQLFetch task with a standard ETL task/flow setup so that when I end up making transformations, I can include those in the "transform" task

Jason Motley

11/18/2021, 5:56 PM

@task(log_stdout=True)
def db_connection(credentials: dict) -> any:
    import sqlalchemy as db
    ssl_args = {"XXX"}} ## for local agent testing
    user = credentials['username']
    password = credentials['password']
    url = 'XXX'
    port = XXX
    connection = db.create_engine(
        f'mysql://{user}:{password}@{url}:{port}/?charset=utf8', 
        connect_args=ssl_args
    )
    return connection 

@task(log_stdout=True, checkpoint=False)
def extract(credentials: dict) -> any:
    ssl_args = {"XXXX"}} ## for local agent testing
    query = "QUERY HERE"
    df = MySQLFetch(user=['username'], 
    password=['password'],
    host='XXX',
    port=XXXX,
    ssl=ssl_args,
    charset='utf8',
    query=query).run()
    return df

# This is where we will do any necessary date transformations
@task(log_stdout=True)
def transform(df: pd.DataFrame) -> pd.DataFrame:
    # print(dataframe)
    return df

@task(log_stdout=True)
def load(connection: any, if_exists: str, db_table: str, dataframe: pd.DataFrame) -> pd.DataFrame:
    res = dataframe.to_sql(name=db_table, con=connection, if_exists=if_exists, chunksize=None, index=False)
    print(res)

# In[29]:


# Flow
flow = Flow("Flow Name Here")
with flow:
    df = extract(credentials = PrefectSecret("SECRET"))
    sink = transform(df=df)
    load(connection=connection, if_exists='append', db_table='XXXXX', dataframe=sink)

Kevin Kho

11/18/2021, 6:31 PM

Was in a call. I am not sure where this error is coming from. Do you have a clue? Is it in the checkpointing of load?

Jason Motley

11/18/2021, 6:48 PM

It's in `db_connection`

Kevin Kho

11/18/2021, 6:54 PM

Maybe turn checkpointing off for that one

Jason Motley

11/18/2021, 7:02 PM

Led me back to the "can't concat list to bytes" error in the extract task unfortunately.

Kevin Kho

11/18/2021, 7:06 PM

Oh oof. Why are your username and password in a Python list?

Kevin Kho

11/18/2021, 7:06 PM

Should it be `credentials['username']`?
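The difference is easy to miss: `['username']` is a one-element list literal, while `credentials['username']` looks the value up in the secret. The sketch below uses a made-up `credentials` dict and a bytes concatenation as a rough stand-in for what a database driver does when assembling its login data (an assumption about where the message originates), which yields the same error text:

```python
credentials = {"username": "demo_user", "password": "XXXX"}  # hypothetical values

wrong = ["username"]             # a list containing the literal string
right = credentials["username"]  # the actual user name, a str

packet_prefix = b"\x00"  # stand-in for bytes a driver has already assembled

try:
    packet_prefix + wrong
except TypeError as exc:
    error_text = str(exc)

print(error_text)

# With the real value, encoding and concatenation work as expected.
packet = packet_prefix + right.encode("utf8")
print(packet)
```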

Jason Motley

11/18/2021, 7:18 PM

That took care of the error

Jason Motley

11/18/2021, 7:18 PM

Just down to 1 remaining problem which I believe is in the "connection" task

Jason Motley

11/18/2021, 7:18 PM

TypeError: connection() missing 1 required positional argument: 'credentials'

Jason Motley

11/18/2021, 7:19 PM

I think I should rework my "load" task to just use the results of the extract task?

Kevin Kho

11/18/2021, 7:21 PM

I can’t see what connection is inside the flow block?

Jason Motley

11/18/2021, 7:22 PM

# Flow
flow = Flow("XXX")
with flow:
    df = extract(credentials = PrefectSecret("XXXX"))
    sink = transform(df=df)
    load(connection=connection, credentials = PrefectSecret("XXX"), if_exists='append', db_table='XXXt', dataframe=sink)

Jason Motley

11/18/2021, 7:22 PM

And this is the connection

Jason Motley

11/18/2021, 7:23 PM

@task(log_stdout=True, checkpoint=False)
def connection(credentials: dict) -> any:
    import sqlalchemy as db
    ssl_args = {"XXX"}} ## for local agent testing
    user = credentials['username']
    password = credentials['password']
    url = 'XXX'
    port = XXXX
    connection = db.create_engine(
        f'mysql://{user}:{password}@{url}:{port}/?charset=utf8', 
        connect_args=ssl_args
    )
    return connection

Kevin Kho

11/18/2021, 7:25 PM

Something is wrong here because `load(connection=connection)` is using the task. I think you need to create the connection, then pass it. You are passing a task.
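A plain-Python analogy of the problem, with no Prefect involved: handing over the function itself means it eventually gets called with no arguments, while calling it first hands over its result. In the flow block, the equivalent fix would be to call the connection task with its credentials and pass that result to `load`. The function bodies and values below are stand-ins invented for the illustration.

```python
def connection(credentials: dict) -> str:
    """Stand-in for the real task: returns a fake engine description."""
    return f"engine-for-{credentials['username']}"


def load(connection, db_table: str) -> str:
    """Stand-in loader that mimics a runner resolving an un-called task."""
    if callable(connection):
        # Nothing was bound to `credentials`, so this call fails.
        connection = connection()
    return f"loaded {db_table} using {connection}"


creds = {"username": "demo_user", "password": "XXXX"}  # hypothetical values

# Wrong: passes the function object itself.
try:
    load(connection=connection, db_table="demo_table")
except TypeError as exc:
    error_text = str(exc)
print(error_text)

# Right: call it first, then pass the result along.
conn = connection(creds)
result = load(connection=conn, db_table="demo_table")
print(result)
```

Renaming the function (for example to something like `make_connection`) also avoids having a task and a keyword argument share one name.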

Kevin Kho

11/18/2021, 7:26 PM

Also it would be best practice to rename the function I think

Jason Motley

11/18/2021, 7:28 PM

AttributeError: 'tuple' object has no attribute 'to_sql'

Jason Motley

11/18/2021, 7:28 PM

Seems like I need to convert the requested data prior to writing it back?

Kevin Kho

11/18/2021, 7:31 PM

Let me see the return type of MySQLFetch. we probably just need to convert to pandas

Jason Motley

11/18/2021, 7:31 PM

Sounds good, thanks

Kevin Kho

11/18/2021, 7:33 PM

Are you trying to write to a SQL database with `to_sql`?

Jason Motley

11/18/2021, 7:34 PM

yes

Kevin Kho

11/18/2021, 7:36 PM

I can’t immediately tell what the return type is. Can you try logging the output? I suspect it’s `List[Tuple]` but not 100% sure

Kevin Kho

11/18/2021, 7:38 PM

Do this for a quick test:

results = MySQLFetch(...).run()
print(results)
print(type(results))

Kevin Kho

11/18/2021, 7:38 PM

No need to be in a Flow or task for this. Just normal Python

Jason Motley

11/18/2021, 7:51 PM

I'm getting some errors but I believe it's a List[Tuple]

Kevin Kho

11/18/2021, 7:52 PM

You need to convert it to the DataFrame like this to get the `to_sql` method
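As a general sketch of that conversion, assuming the fetch really does return a list of row tuples and that pandas is installed: `pd.DataFrame` can take the rows directly along with column names. The rows and column names below are invented for illustration.

```python
import datetime

import pandas as pd

# Hypothetical result shaped like List[Tuple]: one tuple per row.
results = [
    (1, "alice", datetime.date(2021, 11, 17)),
    (2, "bob", datetime.date(2021, 11, 18)),
]

# Column names are supplied separately; the tuples carry values only.
df = pd.DataFrame(results, columns=["id", "name", "signup_date"])

print(df.shape)
print(hasattr(df, "to_sql"))
```

If the fetch returns a single row as one bare tuple, wrapping it in a list (`[row]`) before building the DataFrame keeps it as one row instead of treating each value as a row.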

Jason Motley

11/18/2021, 7:59 PM

I used the from_items method and got this:

("type object 'DataFrame' has no attribute 'from_items'")

Kevin Kho

11/18/2021, 8:01 PM

That does look deprecated

Kevin Kho

11/18/2021, 8:01 PM

Method 2 will probably fit though, I think

Jason Motley

11/18/2021, 8:40 PM

TypeError("'datetime.date' object is not iterable")

Kevin Kho

11/18/2021, 8:51 PM

Could you show me the print output?

Jason Motley

11/18/2021, 8:53 PM

of the table or the full error log?

Kevin Kho

11/18/2021, 8:59 PM

The table output so we can figure out how to put it in pandas

Kevin Kho

11/18/2021, 8:59 PM

or just part of it to give me a clue

Jason Motley

11/18/2021, 9:25 PM

sorry was debugging haha

Jason Motley

11/18/2021, 9:39 PM

Kevin Kho

11/18/2021, 9:52 PM

I would prefer the print in the terminal so I have a feel for the data type

Jason Motley

11/18/2021, 10:17 PM

I actually managed to get past that part!

Kevin Kho

11/18/2021, 10:19 PM

thumbs up

Jason Motley

11/18/2021, 10:21 PM

One last error here, on my load:

Jason Motley

11/18/2021, 10:21 PM

@task(log_stdout=True)
def load(connection: any, if_exists: str, db_table: str, dataframe: pd.DataFrame, credentials: dict) -> pd.DataFrame:
    res = df.to_sql(name=db_table, con=connection, if_exists=if_exists, chunksize=None, index=False)
    ## res = dataframe.to_sql(name=db_table, con=connection, credentials = PrefectSecret("XXX"), if_exists=if_exists, chunksize=None, index=False)
    print(res)

Jason Motley

11/18/2021, 10:21 PM

The error:

'FunctionTask' object has no attribute 'to_sql'")

Jason Motley

11/18/2021, 10:22 PM

"df" is the DataFrame here.

Kevin Kho

11/18/2021, 10:23 PM

`FunctionTask` is something you wrapped in a `@task` decorator. This is happening because `df` is not something being passed into this function. I think you want to use `dataframe.to_sql` or change your input to `df`

Jason Motley

11/18/2021, 10:28 PM

odd, same error when making both changes

Kevin Kho

11/18/2021, 10:29 PM

No no, you need one or the other but not both. The variable name needs to match the input argument
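The underlying Python behaviour can be shown without Prefect: when a function body uses a name that is not one of its parameters, Python looks it up at module level, where in this case `df` refers to a task rather than to data. `FakeTask` and `FakeFrame` below are stand-ins invented for the sketch.

```python
class FakeTask:
    """Stand-in for the task object that `df` names at module level."""


class FakeFrame:
    """Stand-in for a pandas DataFrame; only to_sql matters here."""

    def to_sql(self, name):
        return f"wrote to {name}"


# At module level (as in the flow block), `df` refers to a task, not data.
df = FakeTask()


def load_buggy(dataframe, db_table):
    # `df` is not a parameter, so Python falls back to the module-level name.
    return df.to_sql(db_table)


def load_fixed(dataframe, db_table):
    # Uses the argument that was actually passed in.
    return dataframe.to_sql(db_table)


try:
    load_buggy(FakeFrame(), "demo_table")
except AttributeError as exc:
    print(exc)

print(load_fixed(FakeFrame(), "demo_table"))
```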

Jason Motley

11/18/2021, 10:29 PM

oh sorry that's what I meant haha
