https://prefect.io logo
c

Choenden Kyirong

04/09/2023, 6:01 PM
Hey everyone, I’m attempting to insert hundreds of thousands of rows into a postgres db using
prefect_sqlalchemy.SqlAlchemyConnector
with
execute_many(insert_query, seq_of_parameters)
where
seq_of_parameters
has all the rows. This seems to take an extremely long time- a bit of a newbie here and i’m wondering if there are any approaches or advice for doing something like this? Thanks!
r

redsquare

04/09/2023, 8:04 PM
I would look at Copy for bulk inserts https://www.psycopg.org/psycopg3/docs/basic/copy.html
c

Choenden Kyirong

04/09/2023, 8:33 PM
@redsquare using
prefect_sqlalchemy.SqlAlchemyConnector
, do we have access to those functions?
Copy code
db_block=SqlAlchemyConnector.load("<name>")
conn = db_block.get_client(client_type="connection")
cursor = conn.cursor()
I’m trying to do this in order to access those functions from
conn.cursor()
However, i’m getting the following error:
Copy code
AttributeError: '_trans_ctx' object has no attribute 'cursor'
Or should I not use Prefect’s sqlalchemy for this?
r

redsquare

04/09/2023, 8:38 PM
Its not part of prefect alchemy yet - not sure if they will bring something like that in
for 100's k of rows I would not use inserts
c

Choenden Kyirong

04/09/2023, 8:44 PM
hmmm okok. Thanks for the material. I also came across this really great post: https://hakibenita.com/fast-load-data-python-postgresql#create-a-table-in-the-database
r

redsquare

04/09/2023, 8:46 PM
copy ftw
c

Choenden Kyirong

04/09/2023, 8:47 PM
yeah, tried to find a way to use a cursor from prefect sqlalchemy, doesn’t seem available at the moment. I guess just using the library itself is the way to go for this?
@redsquare actually, there’s way. I was just digging through some documentation and found a way to get access to this- the function call to
copy_from()
works!
Copy code
database_block = SqlAlchemyConnector.load("<name>")    
  with database_block.get_client(client_type="connection") as conn:
        engine = database_block.get_client(client_type="engine")
        connection = engine.raw_connection()
        cursor = connection.cursor()
        logger = get_run_logger()
        cursor.copy_from(<filename>)
there may be a slicker way to do it- will look into that. but for now, testing this works.