Thread: Batch updates to one column across multiple rows using python-pgsql
Hi guys,

I have to update a certain column of a table at many rows (on the order of thousands).

For now, I am using a prepared statement and executing the queries like:

    query = "UPDATE mytable SET myvar = $1 WHERE myvar2 = $2"
    db.execute(query, (var, var1))  # db is my connection object

But I still feel I can make it faster by using the db.executemany() method; the thing is, I am not able to get its syntax right. Can anyone please guide me on how to do batch updates in the fastest possible manner? Also, will executemany() really be faster? If so, how does it result in faster batch updates?

--
Regards,
Abhijeet
http://blog.abhijeetr.com
On Wed, Aug 29, 2012 at 6:58 AM, Abhijeet R <abhijeet.1989@gmail.com> wrote:
> I have to update a certain column of a table at many rows (on the order
> of thousands).
>
> For now, I am using a prepared statement and executing the queries like:
>
> query = "UPDATE mytable SET myvar = $1 WHERE myvar2 = $2"
> db.execute(query, (var, var1))  # db is my connection object

Are you able to reformulate the multiple individual UPDATEs into a single bulk UPDATE statement? That will almost certainly be the fastest way.

> But, I still feel like I can make it faster by using the db.executemany()
> method. The thing is I am not able to get the syntax of it.

That's more of a question for your specific database adapter (python-pgsql, according to your message subject) than for this list. But if you are able to construct a list of tuples, e.g.

    list_to_insert = [(var, var1) for (var, var1) in ... ]

then you should be able to use executemany() like this:

    db.executemany(query, list_to_insert)

BTW, is there a reason you're using python-pgsql instead of, say, psycopg2? It seems like the former may be a dead project.

> Can anyone please guide me as to how do I do batch updates in the fastest
> possible manner? Also, will executemany() really be faster? If so, how
> does that result in faster batch updates?

Again, the fastest way to perform these UPDATEs would be to reformulate your individual queries into a single bulk UPDATE. If you really are setting each value of "myvar" to something different, not a value derived from some other column within its row, and you can't reasonably express that logic in a single bulk UPDATE query, then I'd bet the next fastest way would be to use a COPY statement to load a temporary table with your (myvar, myvar2) pairs as rows, then perform a single:

    UPDATE mytable
    SET myvar = my_temp_table.myvar
    FROM my_temp_table
    WHERE mytable.myvar2 = my_temp_table.myvar2;

Josh
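To make the executemany() call shape concrete, here is a minimal runnable sketch. It uses the stdlib sqlite3 module so it works without a database server; psycopg2 follows the same DB-API 2.0 signature (only the placeholder style differs: %s instead of ?). The table and values here are illustrative, not from the thread's actual schema.

```python
# Sketch of DB-API 2.0 executemany(): one parameterized statement,
# many parameter tuples, executed in a single call.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (myvar TEXT, myvar2 TEXT)")
cur.executemany("INSERT INTO mytable VALUES (?, ?)",
                [("old", "a"), ("old", "b"), ("old", "c")])

# The list of tuples pairs each new myvar with its matching myvar2:
updates = [("new-a", "a"), ("new-b", "b")]
cur.executemany("UPDATE mytable SET myvar = ? WHERE myvar2 = ?", updates)
conn.commit()

print(sorted(cur.execute("SELECT myvar2, myvar FROM mytable")))
# -> [('a', 'new-a'), ('b', 'new-b'), ('c', 'old')]
```

Note that DB-API adapters are free to implement executemany() as a simple loop over execute(), so it is a convenience more than a guaranteed speedup; that is why the single bulk UPDATE is still the faster route.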
Hi Josh,

The reason I was using python-pgsql was that it was the first adapter to come up in a Google search. After your message, I looked at the documentation of psycopg2, and it looks more promising.

As the myvar value is different for every row, I will use the COPY-then-UPDATE approach you suggested.

Thanks for your suggestions.

On Wed 29 Aug 2012 11:12:11 PM IST, Josh Kupershmidt wrote:
> If you really are setting each value of "myvar" to something different
> [...] then I'd bet the next fastest way would be to use a COPY statement
> to load a temporary table with your (myvar, myvar2) pairs as rows

--
Regards,
Abhijeet
http://blog.abhijeetr.com
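For reference, the load-then-join pattern Josh describes can be sketched as follows. Since COPY and UPDATE ... FROM are PostgreSQL features, this runnable version uses the stdlib sqlite3 module: executemany() stands in for the COPY bulk load (with psycopg2 one would use cursor.copy_from()), and a correlated subquery stands in for UPDATE ... FROM. Table names follow the thread; the values are illustrative.

```python
# Load-then-join batch update: bulk-load (myvar, myvar2) pairs into a
# temp table, then fix up the main table with one single UPDATE.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mytable (myvar TEXT, myvar2 TEXT)")
cur.executemany("INSERT INTO mytable VALUES (?, ?)",
                [("old", "a"), ("old", "b"), ("old", "c")])

# Step 1: bulk-load the new values into a temporary table
# (in PostgreSQL this is where COPY / cursor.copy_from() would go).
cur.execute("CREATE TEMP TABLE my_temp_table (myvar TEXT, myvar2 TEXT)")
pairs = [("new-a", "a"), ("new-b", "b")]
cur.executemany("INSERT INTO my_temp_table VALUES (?, ?)", pairs)

# Step 2: one UPDATE joining against the temp table
# (in PostgreSQL: UPDATE mytable SET myvar = t.myvar
#                 FROM my_temp_table t WHERE mytable.myvar2 = t.myvar2).
cur.execute("""
    UPDATE mytable
    SET myvar = (SELECT t.myvar FROM my_temp_table t
                 WHERE t.myvar2 = mytable.myvar2)
    WHERE myvar2 IN (SELECT myvar2 FROM my_temp_table)
""")
conn.commit()

print(sorted(cur.execute("SELECT myvar2, myvar FROM mytable")))
# -> [('a', 'new-a'), ('b', 'new-b'), ('c', 'old')]
```

The win over per-row UPDATEs is that the server sees one bulk load plus one statement, instead of thousands of round trips and statement executions.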