# random
r
Does Pandas have a bulk update (not insert) functionality with SQLAlchemy?
k
I don’t know a lot about this, but a bulk update sounds hard (and slow). We’ll see what others have to say; I’ve only dealt with inserts.
r
Why do you think that? Here’s an example with SQLAlchemy, but it hangs for a ridiculous amount of time (and never seems to finish):

session = Session()
session.bulk_update_mappings(ConversionRate, merged_table_dict)
session.commit()
session.close()
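For reference, fleshed out into a self-contained form it would look roughly like this; the model definition, columns, and connection URL are placeholders, not our actual setup. One thing worth noting: bulk_update_mappings matches rows by primary key, so every dict in the list has to include it.

```python
# A minimal sketch around the bulk_update_mappings call above; the model,
# columns, and engine URL are assumptions, not from this thread.
from sqlalchemy import create_engine, Column, Integer, Float, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class ConversionRate(Base):
    __tablename__ = "conversion_rate"  # assumed table name
    id = Column(Integer, primary_key=True)
    currency = Column(String)          # assumed columns
    rate = Column(Float)

engine = create_engine("postgresql://user:pass@localhost/db")  # placeholder URL
Session = sessionmaker(bind=engine)

# Each dict must carry the primary key: SQLAlchemy emits
# UPDATE ... WHERE id = :id per row, it does not insert missing rows.
merged_table_dict = [
    {"id": 1, "rate": 1.08},
    {"id": 2, "rate": 0.92},
]

session = Session()
session.bulk_update_mappings(ConversionRate, merged_table_dict)
session.commit()
session.close()
```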
k
I was personally in a similar situation: we tried a bulk upsert (pure SQL) and it took forever. I forget the exact numbers, but it felt like the insert took 30 minutes and the upsert took 20 hours. So, similar to your ridiculous hang.
But I don’t know much here, so don’t take my word for it.
k
Not sure about pandas, but I’ve generally had good luck using CSV import for bulk performance. You can load the CSV into a temp table and then do an UPDATE ... FROM against the target table. Something like this: https://www.virtual-dba.com/blog/postgresql-updating-data-using-csv-file-with-dbeaver/ (ignore all the GUI stuff, the queries are the same). A sketch of the pattern is below.
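Roughly, assuming PostgreSQL with psycopg2; the table and column names here are made up:

```python
# A sketch of the CSV -> temp table -> UPDATE FROM pattern; assumes
# PostgreSQL and psycopg2, with placeholder table/column names and DSN.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
with conn, conn.cursor() as cur:
    # 1. Staging table with the same shape as the target table.
    cur.execute("CREATE TEMP TABLE rates_staging (LIKE conversion_rate)")

    # 2. Bulk-load the CSV; COPY is far faster than row-by-row INSERTs.
    with open("rates.csv") as f:
        cur.copy_expert(
            "COPY rates_staging FROM STDIN WITH (FORMAT csv, HEADER true)", f
        )

    # 3. One set-based UPDATE joined against the staging table.
    cur.execute(
        """
        UPDATE conversion_rate AS t
        SET rate = s.rate
        FROM rates_staging AS s
        WHERE t.id = s.id
        """
    )
conn.close()
```

The win comes from doing one set-based UPDATE joined on the staging table instead of thousands of per-row statements.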
k
^ This sounds like the solution we ended up with at my last job, where the temp-table load was a bulk insert; it was way more performant.
r
@Kyle McChesney I've used that method before, and it would be an option, but it's not very seamless. I use pandas read_sql, then do the manipulation, create a temp table from it with to_sql, then update the regular table with the data from the temp table.
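Something like this, roughly; the table names, columns, and the transformation step are placeholders:

```python
# A sketch of the read_sql -> manipulate -> to_sql(temp) -> UPDATE FROM
# pipeline described above; names and the transformation are placeholders.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/db")  # placeholder URL

# 1. Pull the rows to update into a DataFrame.
df = pd.read_sql("SELECT id, rate FROM conversion_rate", engine)

# 2. Whatever manipulation is needed (placeholder transformation).
df["rate"] = df["rate"] * 1.05

# 3. Write the result to a staging table in one bulk insert.
df.to_sql("rates_staging", engine, if_exists="replace", index=False)

# 4. Set-based update from the staging table, then drop it.
with engine.begin() as conn:
    conn.execute(text(
        """
        UPDATE conversion_rate AS t
        SET rate = s.rate
        FROM rates_staging AS s
        WHERE t.id = s.id
        """
    ))
    conn.execute(text("DROP TABLE rates_staging"))
```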
k
Yeah, it is quite a painful process to walk through.
r
Curious: how computationally expensive is merging the values and doing to_sql with if_exists='replace', vs. a bulk update, vs. an iterated row-by-row update? @Kyle McChesney @Kevin Kho