Thread: most idiomatic way to "update or insert"?
So I have some data that I want to put into a table. If the row already exists (as defined by the primary key), I would like to update the row. Otherwise, I would like to insert the row. I've been doing something like delete from foo where name = 'xx'; insert into foo values('xx',1,2,...); but I've been wondering if there's a more idiomatic or canonical way to do this. Many TIA, Mark -- Mark Harrison Pixar Animation Studios
Mark, It's not canonical by any means, but what I do is: update foo set thing='stuff' where name = 'xx' and thing<>'stuff'; insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where not exists (select 1 from foo where name='xx')); I believe if you put these on the same line it will be a single transaction. It has the benefit of not updating the row if there aren't real changes. It's plenty quick too, if name is indexed. Thanks, Peter Darley -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mark Harrison Sent: Wednesday, August 04, 2004 4:26 PM To: pgsql-general@postgresql.org Subject: [GENERAL] most idiomatic way to "update or insert"? So I have some data that I want to put into a table. If the row already exists (as defined by the primary key), I would like to update the row. Otherwise, I would like to insert the row. I've been doing something like delete from foo where name = 'xx'; insert into foo values('xx',1,2,...); but I've been wondering if there's a more idiomatic or canonical way to do this. Many TIA, Mark -- Mark Harrison Pixar Animation Studios ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
You could always do a SELECT and if the row exists then UPDATE otherwise INSERT. Or INSERT and if you get an error then UPDATE.
Duane
-----Original Message-----
From: Mark Harrison [mailto:mh@pixar.com]
Sent: Wednesday, August 04, 2004 4:26 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?
So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.
I've been doing something like
delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);
but I've been wondering if there's a more idiomatic or canonical
way to do this.
Many TIA,
Mark
--
Mark Harrison
Pixar Animation Studios
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
I'll mention that often I do exactly what you're doing. I find deleting all existing records and then inserting what I want to appear to be cleaner than handling the various cases that can arise if you don't. This happens most often when I have a list of items and have a UI that allows the user to edit the entire list and commit a whole new list in one action. It's much easier to simply delete the old list and insert the entire new list in a single query than to try to figure out which rows to delete and which to insert. -- greg
I don't think that works - there's a race condition if you do not do any locking. Why: Before a transaction that inserts rows is committed, other transactions are not aware of the inserted rows, so the select returns no rows. So: You can either create a unique index and catch insert duplicate failures. Or: lock the relevant tables, then do the select ... update/insert or insert ... select , or whatever it is you want to do. Or: both. Test it out yourself. At 07:51 AM 8/5/2004, Peter Darley wrote: >Mark, > It's not canonical by any means, but what I do is: > >update foo set thing='stuff' where name = 'xx' and thing<>'stuff'; >insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where >not exists (select 1 from foo where name='xx')); > > I believe if you put these on the same line it will be a single >transaction. It has the benefit of not updating the row if there aren't >real changes. It's plenty quick too, if name is indexed. > >Thanks, >Peter Darley
I use stored procedures : create function insertorupdate(....) UPDATE mytable WHERE ... SET ... IF NOT FOUND THEN INSERT INTO mytable ... END IF; You lose flecibility in your request though. I wish Postgresql had an INSERT OR UPDATE like MySQL does. So far it's the only thing that I regret from MySQL.
Mark Harrison wrote: > I've been doing something like > > delete from foo where name = 'xx'; > insert into foo values('xx',1,2,...); > > but I've been wondering if there's a more idiomatic or canonical > way to do this. The delete+insert isn't quite the same as an update since you might have foreign keys referring to foo with "ON DELETE CASCADE" - oops, just lost all your dependant rows. Other people have warned about race conditions with insert/test/update. An "update or insert" would be useful sometimes, but it's not always necessary. Indeed, if I find I don't know whether I'm adding or updating something I take a long hard look at my design - it ususally means I've not thought clearly about something. For a "running total" table it can make more sense to have an entry with a total of 0 created automatically via a trigger. Likewise with some other summary tables. Can you give an actual example of where you need this? -- Richard Huxton Archonet Ltd
Hi, I prefer to update and if the number of updated rows equals 0 do an insert. So in case of update I need only one roundtrip. If insert is far more common in this case it might be better try insert and catch the error. But I try to avoid running on an error intentionally. First delete and then insert works but needs 2 SQL-statements in every case. And the database need to update indexes at least once. There might be also problems with cascaded deletes. Tommi Am Donnerstag, 5. August 2004 01:25 schrieb Mark Harrison: > So I have some data that I want to put into a table. If the > row already exists (as defined by the primary key), I would > like to update the row. Otherwise, I would like to insert > the row. > > I've been doing something like > > delete from foo where name = 'xx'; > insert into foo values('xx',1,2,...); > > but I've been wondering if there's a more idiomatic or canonical > way to do this. > > Many TIA, > Mark
Lincoln, It works for me... I think what you said is wrong because it updates first (if there is a row to update), then inserts. If there is a row to update the insert won't insert anything. If there is no row to update the insert inserts a row. Either way, the insert is the last thing in the transaction. Plus, as shown in the code to follow, I have almost this exact thing in my application and I know that it does work for me. :) Code (Perl): $Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value}) . " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" . Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting, Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) . ", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" . Quote($Args{Setting}) . "));"); Thanks, Peter Darley -----Original Message----- From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my] Sent: Wednesday, August 04, 2004 6:49 PM To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org Subject: Re: [GENERAL] most idiomatic way to "update or insert"? I don't think that works - there's a race condition if you do not do any locking. Why: Before a transaction that inserts rows is committed, other transactions are not aware of the inserted rows, so the select returns no rows. So: You can either create a unique index and catch insert duplicate failures. Or: lock the relevant tables, then do the select ... update/insert or insert ... select , or whatever it is you want to do. Or: both. Test it out yourself. At 07:51 AM 8/5/2004, Peter Darley wrote: >Mark, > It's not canonical by any means, but what I do is: > >update foo set thing='stuff' where name = 'xx' and thing<>'stuff'; >insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where >not exists (select 1 from foo where name='xx')); > > I believe if you put these on the same line it will be a single >transaction. It has the benefit of not updating the row if there aren't >real changes. It's plenty quick too, if name is indexed. > >Thanks, >Peter Darley
Peter Darley wrote: > Lincoln, It works for me... I think what you said is wrong > because it updates first (if there is a row to update), then > inserts. If there is a row to update the insert won't insert > anything. If there is no row to update the insert inserts a row. > Either way, the insert is the last thing in the transaction. > Plus, as shown in the code to follow, I have almost this exact > thing in my application and I know that it does work for me. :) You're getting lucky. I suggested the same thing four years ago. The race condition is still there: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=2344.978158285%40sss.pgh.pa.us HTH, Mike Mascari
Peter, The "does not work" part is not refering to the method not working at all, but to the fact that it is not safe when you have multiple transactions operating on the same row at the same time. There are plenty of discussions about the insert-or-update race conditions on this list, and the final conclusion was always that it is not possible to solve the race condition without being prepared to catch exceptions and retry the whole thing until it succedes... The reason of the race condition: let's say 2 transactions A and B try to insert-or-update the same row which does not exist. They do the update statement at the same time, and BOTH OF THEM gets as a result that no rows were updated, since the row does not exist yet. Now both transactions try to insert the row, and obviously one of them will fail. So your code must be prepared that the insert can fail, and in that case it should retry with the update. People tried to devise a method to avoid the race condition and throwing exception, but it is just not possible. Now the one bad thing in postgres which people complained about in this context is that the transaction gets rolled back on any error, so actually instead of just retrying the update, you will have to redo your whole transaction. HTH, Csaba. On Thu, 2004-08-05 at 15:28, Peter Darley wrote: > Lincoln, > It works for me... > I think what you said is wrong because it updates first (if there is a row > to update), then inserts. If there is a row to update the insert won't > insert anything. If there is no row to update the insert inserts a row. > Either way, the insert is the last thing in the transaction. Plus, as shown > in the code to follow, I have almost this exact thing in my application and > I know that it does work for me. :) > > Code (Perl): > $Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value}) > . " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" . > Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting, > Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) . > ", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM > Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" > . Quote($Args{Setting}) . "));"); > > Thanks, > Peter Darley > > -----Original Message----- > From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my] > Sent: Wednesday, August 04, 2004 6:49 PM > To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org > Subject: Re: [GENERAL] most idiomatic way to "update or insert"? > > > I don't think that works - there's a race condition if you do not do any > locking. > > Why: > Before a transaction that inserts rows is committed, other transactions are > not aware of the inserted rows, so the select returns no rows. > > So: > You can either create a unique index and catch insert duplicate failures. > > Or: > lock the relevant tables, then do the select ... update/insert or insert > ... select , or whatever it is you want to do. > > Or: > both. > > Test it out yourself. > > At 07:51 AM 8/5/2004, Peter Darley wrote: > >Mark, > > It's not canonical by any means, but what I do is: > > > >update foo set thing='stuff' where name = 'xx' and thing<>'stuff'; > >insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where > >not exists (select 1 from foo where name='xx')); > > > > I believe if you put these on the same line it will be a single > >transaction. It has the benefit of not updating the row if there aren't > >real changes. It's plenty quick too, if name is indexed. > > > >Thanks, > >Peter Darley > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Mike, Ahha! I didn't understand what the objection was. I guess I am getting lucky. :) It seems to me that this is true with any concurrent inserts, isn't it? One will succeed and one will fail. Thanks, Peter Darley -----Original Message----- From: Mike Mascari [mailto:mascarm@mascari.com] Sent: Thursday, August 05, 2004 6:51 AM To: Peter Darley Cc: Lincoln Yeoh; Mark Harrison; pgsql-general@postgresql.org Subject: Re: [GENERAL] most idiomatic way to "update or insert"? Peter Darley wrote: > Lincoln, It works for me... I think what you said is wrong > because it updates first (if there is a row to update), then > inserts. If there is a row to update the insert won't insert > anything. If there is no row to update the insert inserts a row. > Either way, the insert is the last thing in the transaction. > Plus, as shown in the code to follow, I have almost this exact > thing in my application and I know that it does work for me. :) You're getting lucky. I suggested the same thing four years ago. The race condition is still there: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=2344.978158285%40sss .pgh.pa.us HTH, Mike Mascari
> An "update or insert" would be useful sometimes, but it's not always > necessary. Indeed, if I find I don't know whether I'm adding or updating > something I take a long hard look at my design - it ususally means I've > not thought clearly about something. ... > Can you give an actual example of where you need this? We have an environment where our data collection occurs by screen scraping (er, web scraping?). Unfortunately, it takes two passes, once across search results which provide partial data, then a second time over a detail page loaded for each item in the search results we were given. Since time is of the essence, we provide the partial data to our customers, which means dealing with the insert or update. Additionally, the process is multithreaded, so search results can be touching things concurrently with details being loaded, otherwise we can't keep up. I dealt with the problem by wrapping every touch of an item in a single transaction with a loop around it, as has been recommended here many times before. Any DB-exception (Python) inside the loop caused by concurrency type problems causes a restart. As it turns out, the insert/update race has yet to result in a retry. The real payoff in this design has proven to be dealing with FK locking... without putting way more effort into fixing it than the deadlocks are worth, we get around a dozen deadlocks a day that are automatically retried.
Richard Huxton wrote: > > An "update or insert" would be useful sometimes, but it's not always > necessary. Indeed, if I find I don't know whether I'm adding or > updating something I take a long hard look at my design - it ususally > means I've not thought clearly about something. > Can you give an actual example of where you need this? How about stocks for a simple example? Let's say you have a simple table with the stock symbol, stock exchange, high, low, open, close and volume. Every day you update the data for each stock. But there are always new stocks listed on an exchange, so when a new stock shows up you have to do an insert instead of an update. Ron
Ron St-Pierre wrote: > Richard Huxton wrote: > >> >> An "update or insert" would be useful sometimes, but it's not always >> necessary. Indeed, if I find I don't know whether I'm adding or >> updating something I take a long hard look at my design - it ususally >> means I've not thought clearly about something. > >> Can you give an actual example of where you need this? > > How about stocks for a simple example? Let's say you have a simple table > with the stock symbol, stock exchange, high, low, open, close and > volume. Every day you update the data for each stock. But there are > always new stocks listed on an exchange, so when a new stock shows up > you have to do an insert instead of an update. If it is just a simple table then delete all of them and insert from scratch. If you wanted to track changes over time (perhaps more likely), you'd have a separate table with the company name/address etc and a log table. At which point you'll want to know if it's a new company or not... -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Ron St-Pierre wrote: > >> Richard Huxton wrote: >> >>> >>> An "update or insert" would be useful sometimes, but it's not always >>> necessary. Indeed, if I find I don't know whether I'm adding or >>> updating something I take a long hard look at my design - it >>> ususally means I've not thought clearly about something. >> >> >>> Can you give an actual example of where you need this? >> >> >> How about stocks for a simple example? Let's say you have a simple >> table with the stock symbol, stock exchange, high, low, open, close >> and volume. Every day you update the data for each stock. But there >> are always new stocks listed on an exchange, so when a new stock >> shows up you have to do an insert instead of an update. > > > If it is just a simple table then delete all of them and insert from > scratch. If you wanted to track changes over time (perhaps more > likely), you'd have a separate table with the company name/address etc > and a log table. At which point you'll want to know if it's a new > company or not... > Okay, this simple example really exists, but the simple table also includes a date that the stock was last traded, so we have: stock symbol, stock exchange, high, low, open, close, volume, date, plus a few more fields But the data isn't always updated at one time, as we can update all stocks for one exhange and possibly only some of the stocks for a particular exchange in one go. Even if the data is received for only one exchange we could delete all stocks for that exchange and insert new ones, which would work fine. However some stocks are not traded every day, so we need to show the trading information for the last date that it was traded, so we can't delete them en masse even for the one exchange. BTW these updates do take longer than we'd like so I would appreciate more input on how this setup could be redesigned. Ron
Ron St-Pierre <rstpierre@syscor.com> writes: > BTW these updates do take longer than we'd like so I would appreciate more > input on how this setup could be redesigned. Where is the input coming from? One option is to batch changes. If you just insert into a log table whenever new data is available, and then do a batch update of many records you would have a few advantages. 1) You could have a single updater and therefore no worries with concurrency. 2) The optimizer could choose a merge join or at least a nested loop and avoid multiple round trips. Something like update current_stock_price set price = log.price, timestamp = log.timestamp from stock_price log where current_stock_price.stock = stock_price_log.stock and stock_price_log.timestamp between ? and ? You can either just use deterministic time ranges like midnight-midnight or keep careful track of the last time the job was run. You would first have to insert into current_stock_price any missing stocks, but if you're batching them then again you don't have to worry about someone else inserting them in the middle of your query. And it's more efficient to add lots of them in one shot than one at a time. -- greg
Ron St-Pierre wrote: > Okay, this simple example really exists, but the simple table also > includes a date that the stock was last traded, so we have: > stock symbol, stock exchange, high, low, open, close, volume, date, plus > a few more fields [snip more details] > BTW these updates do take longer than we'd like so I would appreciate > more input on how this setup could be redesigned. Well, I'd probably make the primary key (stock_id, trading_date) and just insert into a log table. From there I'd update into a summary table, or use a view. Of course, that might make things slower in your case. -- Richard Huxton Archonet Ltd
Hi, > I prefer to update and if the number of updated rows equals 0 do an > insert. So in case of update I need only one roundtrip. If insert is far > more common in this case it might be better try insert and catch the > error. But I try to avoid running on an error intentionally. When logging to a compact table that stores data in an aggregate form, I used something like that: BEGIN; UPDATE ... ; if (!affected_rows) { INSERT ... ; if (error) { ROLLBACK; UPDATE ... ; } } COMMIT; I added the error check with a second UPDATE try after INSERT to increase accuracy. In fact, INSERTs were sometimes failing because of concurrency, and this was the only viable solution I found to avoid losing data. Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/
Greg Stark wrote: >Ron St-Pierre <rstpierre@syscor.com> writes: > > > >>BTW these updates do take longer than we'd like so I would appreciate more >>input on how this setup could be redesigned. >> >> > >Where is the input coming from? > >One option is to batch changes. > <snip> > >Something like > >update current_stock_price > set price = log.price, > timestamp = log.timestamp > from stock_price log > where current_stock_price.stock = stock_price_log.stock > and stock_price_log.timestamp between ? and ? > > We check for new stocks and add them, and initially were using a procedure to do something similar to your code: CREATE OR REPLACE FUNCTION updateData() RETURNS SETOF datatype AS ' DECLARE rec RECORD; BEGIN FOR rec IN SELECT symbol, tradeDate, tickDate, high, low , open, close, volume FROM exchangedata LOOP RETURN NEXT rec; UPDATE stockdata SET high=rec.high, low=rec.low, open=rec.low, close=rec.close, volume=rec.volume, tradeDate=rec.tradeDate WHERE symbol=rec.symbol; END LOOP; RETURN; END; ' LANGUAGE 'plpgsql'; ... but it took too long. Off hand, do you know if your approach above would be quicker? Ron
Matteo Beccati <php@beccati.com> writes: > I added the error check with a second UPDATE try after INSERT to increase > accuracy. In fact, INSERTs were sometimes failing because of concurrency, and > this was the only viable solution I found to avoid losing data. in the general case you could have someone else delete the record again before you get the chance to update it again. to handle this case you would actually have to make it a loop. this is the main reason a built-in merge command could be nice, it would avoid the need to loop, since it can take the lock on the existing record if it's there or perform the insert if not. -- greg
gsstark@mit.edu (Greg Stark) writes: > This happens most often when I have a list of items and have a UI that allows > the user to edit the entire list and commit a whole new list in one action. > It's much easier to simply delete the old list and insert the entire new list > in a single query than to try to figure out which rows to delete and which to > insert. Right, but this is going to quit working for you as soon as foreign key references are involved. For that matter, other types of dependent objects will prevent this as well. In other words, when your relational database actually becomes relational :-) I too was in the habit of the delete/insert as a lazy mans update on simple schemad DBs. HTH -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/