Thread: how do I update or insert efficently in postgres
I need to do the follwoing logic for a db I am building: if row exists update some fields else insert all fields I have come across this befor and have used select to drive the choice, if I could get the row update else insert. The db I worked on had a few thousand rows so it was fast enough. This table will have around 1 million rows to start out with and I was wondering if there was any way to do this better. I am touching each row twice and would like to get that down to once if possable. If that is not possable would it be better to move the whole thing inside of 1 explicit transaction? Any other ideas I have missed? Thank you marc
Maqrc, > if row exists update some fields > else insert all fields > > I have come across this befor and have used select to drive the > choice, if I could get the row update else insert. The db I worked > on > had a few thousand rows so it was fast enough. This table will have > around 1 million rows to start out with and I was wondering if there > was any way to do this better. I am touching each row twice and > would > like to get that down to once if possable. If that is not possable > would it be better to move the whole thing inside of 1 explicit > transaction? Any other ideas I have missed? Not really. If you have to check for existance, that's going to be a seperate query from the UPDATE/INSERT no matter how you cut it. MySQL's REPLACE probably just hides this double-check from you. Using an integer surrogate key which is indexed and VACUUMed regularly is about all the performance increase you can get. Except maybe moving the table and the WAL_FILES both to seperate disks. If you have a significant budget for your project, there could be ways around this ... you'd just have to hire yourself a PostgreSQL hacker. They could theoretically write modifications to PostgreSQL that would do the following: Scan the index for a pointer to the existing record. If a pointer is found, use it to do the update without a second index scan. If the pointer is not found, do an insert without a unique index check. However, the above could be costly to implement and put you out of line for further PostgreSQL upgrades. So throwing hardware at the problem is probably a better idea. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
marc@oscar.eng.cv.net (Marc Spitzer) writes: > I need to do the follwoing logic for a db I am building: > if row exists update some fields > else insert all fields > I have come across this befor and have used select to drive the > choice, if I could get the row update else insert. Which case do you think will be more common? If UPDATE is the more common scenario then it's a win to do UPDATE set modifiable-fields = whatever WHERE key = whateverif (zero rows updated) INSERT ... Alternatively you can do INSERT ...if (fail due to duplicate key) UPDATE ... if you think INSERT is the more common case. (This all assumes you have a unique key for the table, but if you don't, then what do you mean by "the row already exists"?) Neither of these are perfect, however. The former has a race condition if two clients might try to insert the same key at about the same time. You can improve it to BEGIN;UPDATE set modifiable-fields = whatever WHERE key = whateverif (zero rows updated){ INSERT ... if (fail due toduplicate key) { ABORT; loop back to BEGIN; }}COMMIT; but this is kinda ugly. (Of course, if you could have two clients independently inserting/updating the same row at the same time, you have problems anyway: which one should win, and why? I think the coding difficulty may tell you you have a design problem.) As for the INSERT-then-UPDATE approach, you have the same problem that you have to ABORT and start a new transaction if the INSERT fails. This is uncool if you really want the whole thing to be part of a larger transaction. But as long as you've guessed right about which case is more common, you have only one query most of the time. regards, tom lane
Re: how do I update or insert efficently in postgres
From
marc@oscar.eng.cv.net (Marc Spitzer)
Date:
In article <3557.1005685484@sss.pgh.pa.us>, Tom Lane wrote: > marc@oscar.eng.cv.net (Marc Spitzer) writes: >> I need to do the follwoing logic for a db I am building: >> if row exists update some fields >> else insert all fields > >> I have come across this befor and have used select to drive the >> choice, if I could get the row update else insert. > > Which case do you think will be more common? If UPDATE is the more > common scenario then it's a win to do > > UPDATE set modifiable-fields = whatever WHERE key = whatever > if (zero rows updated) > INSERT ... > > Alternatively you can do > > INSERT ... > if (fail due to duplicate key) > UPDATE ... > > if you think INSERT is the more common case. (This all assumes you > have a unique key for the table, but if you don't, then what do you > mean by "the row already exists"?) > > Neither of these are perfect, however. The former has a race condition > if two clients might try to insert the same key at about the same time. > You can improve it to > > BEGIN; > UPDATE set modifiable-fields = whatever WHERE key = whatever > if (zero rows updated) > { > INSERT ... > if (fail due to duplicate key) > { > ABORT; > loop back to BEGIN; > } > } > COMMIT; > > but this is kinda ugly. (Of course, if you could have two clients > independently inserting/updating the same row at the same time, you > have problems anyway: which one should win, and why? I think the > coding difficulty may tell you you have a design problem.) > > As for the INSERT-then-UPDATE approach, you have the same problem > that you have to ABORT and start a new transaction if the INSERT > fails. This is uncool if you really want the whole thing to be part > of a larger transaction. > > But as long as you've guessed right about which case is more common, > you have only one query most of the time. > > regards, tom lane > after the first pass it will be mostly updates and I know I have a flawed design, but I have only 1 batch job that loads information so there should be no major problems there, yes I know famious last words. And I need to get something finished quickly to start generating reports off of it. Thanks marc > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
What if you inserted all the records into a temporary table. Delete all matching records (which would have been updated)in the current table (via a join). Then copy all the entries from the temporary table into your existing table. Delete the temp table etc. Llew > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Marc Spitzer > Sent: 13 November 2001 18:19 > To: pgsql-sql@postgresql.org > Subject: [SQL] how do I update or insert efficently in postgres > > > I need to do the follwoing logic for a db I am building: > > if row exists update some fields > else insert all fields > > I have come across this befor and have used select to drive > the choice, if I could get the row update else insert. The > db I worked on had a few thousand rows so it was fast enough. > This table will have around 1 million rows to start out with > and I was wondering if there was any way to do this better. > I am touching each row twice and would like to get that down > to once if possable. If that is not possable would it be > better to move the whole thing inside of 1 explicit > transaction? Any other ideas I have missed? > > Thank you > > marc > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > http://www.postgresql.org/users-lounge/docs/faq.html