https://prefect.io logo
Title
r

Royzac

04/01/2022, 1:56 PM
Does Pandas have a bulk update(not insert) functionality with sqlalchemy?
k

Kevin Kho

04/01/2022, 1:59 PM
I don’t know a lot on this, but update sounds hard (and slow). We’ll see what others have to say. I’ve only dealt with inserts
r

Royzac

04/01/2022, 2:14 PM
Why do you think that? Here's an example for sqlalchemy but it has ridiculous hang(that never seems to end) session = Session() session.bulk_update_mappings(ConversionRate, merged_table_dict) session.commit() session.close()
k

Kevin Kho

04/01/2022, 2:19 PM
I was personally in a similar situation and we tried a bulk upsert (pure SQL) and it was taking forever. I forget exact numbers but it felt like the insert was 30 minutes and the upsert was 20 hours. So similar to your ridiculous hang.
But I dont know much here so dont take my word for it
k

Kyle McChesney

04/01/2022, 3:39 PM
Not sure about pandas, but I’ve generally had good luck using CSV import for bulk performance. You can CSV load into a temp table and then do UPDATE FROM on the temp table to the new table. Something like this: https://www.virtual-dba.com/blog/postgresql-updating-data-using-csv-file-with-dbeaver/ (ignore all the GUI stuff but the queries are the same)
k

Kevin Kho

04/01/2022, 3:40 PM
^ This sounds like the solution we ended up with at my last job where the temp table insertion was a bulk insert and this was way more performant
r

Royzac

04/01/2022, 4:17 PM
@Kyle McChesney I've used that method before, and would be an option, but it's not very seamless. I use pandas rd_sql, then manipulation, create temp table(to_sql) with it, then update to regular table with the data from the temp table.
k

Kyle McChesney

04/01/2022, 4:22 PM
yeah it is quite a painful process to walk through
r

Royzac

04/01/2022, 5:08 PM
Curious, how computationally expensive is it to merge values and pandas.to_sql('replace') vs bulk update vs iterated update? @Kyle McChesney @Kevin Kho