Thread: most idiomatic way to "update or insert"?

most idiomatic way to "update or insert"?

From
Mark Harrison
Date:
So I have some data that I want to put into a table.  If the
row already exists (as defined by the primary key), I would
like to update the row.  Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

Re: most idiomatic way to "update or insert"?

From
"Peter Darley"
Date:
Mark,
    It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

    I believe if you put these on the same line it will be a single
transaction.  It has the benefit of not updating the row if there aren't
real changes.  It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mark Harrison
Sent: Wednesday, August 04, 2004 4:26 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?


So I have some data that I want to put into a table.  If the
row already exists (as defined by the primary key), I would
like to update the row.  Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


Re: most idiomatic way to "update or insert"?

From
Duane Lee - EGOVX
Date:

You could always do a SELECT and if the row exists then UPDATE otherwise INSERT.  Or INSERT and if you get an error then UPDATE.

Duane

-----Original Message-----
From: Mark Harrison [mailto:mh@pixar.com]
Sent: Wednesday, August 04, 2004 4:26 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?

So I have some data that I want to put into a table.  If the
row already exists (as defined by the primary key), I would
like to update the row.  Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Re: most idiomatic way to "update or insert"?

From
Greg Stark
Date:
I'll mention that often I do exactly what you're doing. I find deleting all
existing records and then inserting what I want to appear to be cleaner than
handling the various cases that can arise if you don't.

This happens most often when I have a list of items and have a UI that allows
the user to edit the entire list and commit a whole new list in one action.
It's much easier to simply delete the old list and insert the entire new list
in a single query than to try to figure out which rows to delete and which to
insert.

--
greg

Re: most idiomatic way to "update or insert"?

From
Lincoln Yeoh
Date:
I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert  or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:
>Mark,
>         It's not canonical by any means, but what I do is:
>
>update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
>insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
>not exists (select 1 from foo where name='xx'));
>
>         I believe if you put these on the same line it will be a single
>transaction.  It has the benefit of not updating the row if there aren't
>real changes.  It's plenty quick too, if name is indexed.
>
>Thanks,
>Peter Darley



Re: most idiomatic way to "update or insert"?

From
Pierre-Frédéric Caillaud
Date:
I use stored procedures :

create function insertorupdate(....)
    UPDATE mytable WHERE ... SET ...
    IF NOT FOUND THEN
        INSERT INTO mytable ...
    END IF;


    You lose flecibility in your request though.

    I wish Postgresql had an INSERT OR UPDATE like MySQL does. So far it's
the only thing that I regret from MySQL.

Re: most idiomatic way to "update or insert"?

From
Richard Huxton
Date:
Mark Harrison wrote:
> I've been doing something like
>
> delete from foo where name = 'xx';
> insert into foo values('xx',1,2,...);
>
> but I've been wondering if there's a more idiomatic or canonical
> way to do this.

The delete+insert isn't quite the same as an update since you might have
foreign keys referring to foo with "ON DELETE CASCADE" - oops, just lost
all your dependant rows. Other people have warned about race conditions
with insert/test/update.

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or updating
something I take a long hard look at my design - it ususally means I've
not thought clearly about something.

For a "running total" table it can make more sense to have an entry with
a total of 0 created automatically via a trigger. Likewise with some
other summary tables.

Can you give an actual example of where you need this?

--
   Richard Huxton
   Archonet Ltd

Re: most idiomatic way to "update or insert"?

From
Tommi Maekitalo
Date:
Hi,

I prefer to update and if the number of updated rows equals 0 do an insert. So
in case of update I need only one roundtrip. If insert is far more common in
this case it might be better try insert and catch the error. But I try to
avoid running on an error intentionally.

First delete and then insert works but needs 2 SQL-statements in every case.
And the database need to update indexes at least once. There might be also
problems with cascaded deletes.


Tommi

Am Donnerstag, 5. August 2004 01:25 schrieb Mark Harrison:
> So I have some data that I want to put into a table.  If the
> row already exists (as defined by the primary key), I would
> like to update the row.  Otherwise, I would like to insert
> the row.
>
> I've been doing something like
>
> delete from foo where name = 'xx';
> insert into foo values('xx',1,2,...);
>
> but I've been wondering if there's a more idiomatic or canonical
> way to do this.
>
> Many TIA,
> Mark

Re: most idiomatic way to "update or insert"?

From
"Peter Darley"
Date:
Lincoln,
    It works for me...
    I think what you said is wrong because it updates first (if there is a row
to update), then inserts.  If there is a row to update the insert won't
insert anything.  If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction.  Plus, as shown
in the code to follow, I have almost this exact thing in my application and
I know that it does work for me. :)

Code (Perl):
$Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
. " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
. Quote($Args{Setting}) . "));");

Thanks,
Peter Darley

-----Original Message-----
From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
Sent: Wednesday, August 04, 2004 6:49 PM
To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?


I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert  or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:
>Mark,
>         It's not canonical by any means, but what I do is:
>
>update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
>insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
>not exists (select 1 from foo where name='xx'));
>
>         I believe if you put these on the same line it will be a single
>transaction.  It has the benefit of not updating the row if there aren't
>real changes.  It's plenty quick too, if name is indexed.
>
>Thanks,
>Peter Darley




Re: most idiomatic way to "update or insert"?

From
Mike Mascari
Date:
Peter Darley wrote:
> Lincoln, It works for me... I think what you said is wrong
> because it updates first (if there is a row to update), then
> inserts.  If there is a row to update the insert won't insert
> anything.  If there is no row to update the insert inserts a row.
>  Either way, the insert is the last thing in the transaction.
> Plus, as shown in the code to follow, I have almost this exact
> thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=2344.978158285%40sss.pgh.pa.us

HTH,

Mike Mascari


Re: most idiomatic way to "update or insert"?

From
Csaba Nagy
Date:
Peter,

The "does not work" part is not refering to the method not working at
all, but to the fact that it is not safe when you have multiple
transactions operating on the same row at the same time.
There are plenty of discussions about the insert-or-update race
conditions on this list, and the final conclusion was always that it is
not possible to solve the race condition without being prepared to catch
exceptions and retry the whole thing until it succedes...
The reason of the race condition: let's say 2 transactions A and B try
to insert-or-update the same row which does not exist. They do the
update statement at the same time, and BOTH OF THEM gets as a result
that no rows were updated, since the row does not exist yet. Now both
transactions try to insert the row, and obviously one of them will fail.
So your code must be prepared that the insert can fail, and in that case
it should retry with the update.
People tried to devise a method to avoid the race condition and throwing
exception, but it is just not possible.
Now the one bad thing in postgres which people complained about in this
context is that the transaction gets rolled back on any error, so
actually instead of just retrying the update, you will have to redo your
whole transaction.

HTH,
Csaba.


On Thu, 2004-08-05 at 15:28, Peter Darley wrote:
> Lincoln,
>     It works for me...
>     I think what you said is wrong because it updates first (if there is a row
> to update), then inserts.  If there is a row to update the insert won't
> insert anything.  If there is no row to update the insert inserts a row.
> Either way, the insert is the last thing in the transaction.  Plus, as shown
> in the code to follow, I have almost this exact thing in my application and
> I know that it does work for me. :)
>
> Code (Perl):
> $Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
> . " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
> Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
> Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
> ", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
> Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
> . Quote($Args{Setting}) . "));");
>
> Thanks,
> Peter Darley
>
> -----Original Message-----
> From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
> Sent: Wednesday, August 04, 2004 6:49 PM
> To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] most idiomatic way to "update or insert"?
>
>
> I don't think that works - there's a race condition if you do not do any
> locking.
>
> Why:
> Before a transaction that inserts rows is committed, other transactions are
> not aware of the inserted rows, so the select returns no rows.
>
> So:
> You can either create a unique index and catch insert duplicate failures.
>
> Or:
> lock the relevant tables, then do the select ... update/insert  or insert
> ... select , or whatever it is you want to do.
>
> Or:
> both.
>
> Test it out yourself.
>
> At 07:51 AM 8/5/2004, Peter Darley wrote:
> >Mark,
> >         It's not canonical by any means, but what I do is:
> >
> >update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
> >insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
> >not exists (select 1 from foo where name='xx'));
> >
> >         I believe if you put these on the same line it will be a single
> >transaction.  It has the benefit of not updating the row if there aren't
> >real changes.  It's plenty quick too, if name is indexed.
> >
> >Thanks,
> >Peter Darley
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: most idiomatic way to "update or insert"?

From
"Peter Darley"
Date:
Mike,
    Ahha!  I didn't understand what the objection was.  I guess I am getting
lucky. :)
    It seems to me that this is true with any concurrent inserts, isn't it?
One will succeed and one will fail.
Thanks,
Peter Darley

-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Thursday, August 05, 2004 6:51 AM
To: Peter Darley
Cc: Lincoln Yeoh; Mark Harrison; pgsql-general@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?


Peter Darley wrote:
> Lincoln, It works for me... I think what you said is wrong
> because it updates first (if there is a row to update), then
> inserts.  If there is a row to update the insert won't insert
> anything.  If there is no row to update the insert inserts a row.
>  Either way, the insert is the last thing in the transaction.
> Plus, as shown in the code to follow, I have almost this exact
> thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=2344.978158285%40sss
.pgh.pa.us

HTH,

Mike Mascari



Re: most idiomatic way to "update or insert"?

From
award@dominionsciences.com
Date:
> An "update or insert" would be useful sometimes, but it's not always
> necessary. Indeed, if I find I don't know whether I'm adding or updating
> something I take a long hard look at my design - it ususally means I've
> not thought clearly about something.
...
> Can you give an actual example of where you need this?

We have an environment where our data collection occurs by screen scraping
(er, web scraping?). Unfortunately, it takes two passes, once across
search results which provide partial data, then a second time over a
detail page loaded for each item in the search results we were given.
Since time is of the essence, we provide the partial data to our
customers, which means dealing with the insert or update. Additionally,
the process is multithreaded, so search results can be touching things
concurrently with details being loaded, otherwise we can't keep up.

I dealt with the problem by wrapping every touch of an item in a single
transaction with a loop around it, as has been recommended here many times
before. Any DB-exception (Python) inside the loop caused by concurrency
type problems causes a restart. As it turns out, the insert/update race
has yet to result in a retry. The real payoff in this design has proven to
be dealing with FK locking... without putting way more effort into fixing
it than the deadlocks are worth, we get around a dozen deadlocks a day
that are automatically retried.

Re: most idiomatic way to "update or insert"?

From
Ron St-Pierre
Date:
Richard Huxton wrote:

>
> An "update or insert" would be useful sometimes, but it's not always
> necessary. Indeed, if I find I don't know whether I'm adding or
> updating something I take a long hard look at my design - it ususally
> means I've not thought clearly about something.


> Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple table
with the stock symbol, stock exchange, high, low, open, close and
volume. Every day you update the data for each stock. But there are
always new stocks listed on an exchange, so when a new stock shows up
you have to do an insert instead of an update.

Ron



Re: most idiomatic way to "update or insert"?

From
Richard Huxton
Date:
Ron St-Pierre wrote:
> Richard Huxton wrote:
>
>>
>> An "update or insert" would be useful sometimes, but it's not always
>> necessary. Indeed, if I find I don't know whether I'm adding or
>> updating something I take a long hard look at my design - it ususally
>> means I've not thought clearly about something.
>
>> Can you give an actual example of where you need this?
>
> How about stocks for a simple example? Let's say you have a simple table
> with the stock symbol, stock exchange, high, low, open, close and
> volume. Every day you update the data for each stock. But there are
> always new stocks listed on an exchange, so when a new stock shows up
> you have to do an insert instead of an update.

If it is just a simple table then delete all of them and insert from
scratch. If you wanted to track changes over time (perhaps more likely),
you'd have a separate table with the company name/address etc and a log
table. At which point you'll want to know if it's a new company or not...

--
   Richard Huxton
   Archonet Ltd

Re: most idiomatic way to "update or insert"?

From
Ron St-Pierre
Date:
Richard Huxton wrote:

> Ron St-Pierre wrote:
>
>> Richard Huxton wrote:
>>
>>>
>>> An "update or insert" would be useful sometimes, but it's not always
>>> necessary. Indeed, if I find I don't know whether I'm adding or
>>> updating something I take a long hard look at my design - it
>>> ususally means I've not thought clearly about something.
>>
>>
>>> Can you give an actual example of where you need this?
>>
>>
>> How about stocks for a simple example? Let's say you have a simple
>> table with the stock symbol, stock exchange, high, low, open, close
>> and volume. Every day you update the data for each stock. But there
>> are always new stocks listed on an exchange, so when a new stock
>> shows up you have to do an insert instead of an update.
>
>
> If it is just a simple table then delete all of them and insert from
> scratch. If you wanted to track changes over time (perhaps more
> likely), you'd have a separate table with the company name/address etc
> and a log table. At which point you'll want to know if it's a new
> company or not...
>
Okay, this simple example really exists, but the simple table also
includes a date that the stock was last traded, so we have:
stock symbol, stock exchange, high, low, open, close, volume, date, plus
a few more fields

But the data isn't always updated at one time, as we can update all
stocks for one exhange and possibly only some of the stocks for a
particular exchange in one go. Even if the data is received for only one
exchange we could delete all stocks for that exchange and insert new
ones, which would work fine. However some stocks are not traded every
day, so we need to show the trading information for the last date that
it was traded, so we can't delete them en masse even for the one exchange.

BTW these updates do take longer than we'd like so I would appreciate
more input on how this setup could be redesigned.

Ron




Re: most idiomatic way to "update or insert"?

From
Greg Stark
Date:
Ron St-Pierre <rstpierre@syscor.com> writes:

> BTW these updates do take longer than we'd like so I would appreciate more
> input on how this setup could be redesigned.

Where is the input coming from?

One option is to batch changes. If you just insert into a log table whenever
new data is available, and then do a batch update of many records you would
have a few advantages.

1) You could have a single updater and therefore no worries with concurrency.

2) The optimizer could choose a merge join or at least a nested loop and avoid
   multiple round trips.

Something like

update current_stock_price
   set price = log.price,
       timestamp = log.timestamp
  from stock_price log
 where current_stock_price.stock = stock_price_log.stock
   and stock_price_log.timestamp between ? and ?

You can either just use deterministic time ranges like midnight-midnight or
keep careful track of the last time the job was run.

You would first have to insert into current_stock_price any missing stocks,
but if you're batching them then again you don't have to worry about someone
else inserting them in the middle of your query. And it's more efficient to
add lots of them in one shot than one at a time.

--
greg

Re: most idiomatic way to "update or insert"?

From
Richard Huxton
Date:
Ron St-Pierre wrote:
> Okay, this simple example really exists, but the simple table also
> includes a date that the stock was last traded, so we have:
> stock symbol, stock exchange, high, low, open, close, volume, date, plus
> a few more fields
[snip more details]
> BTW these updates do take longer than we'd like so I would appreciate
> more input on how this setup could be redesigned.

Well, I'd probably make the primary key (stock_id, trading_date) and
just insert into a log table. From there I'd update into a summary
table, or use a view.

Of course, that might make things slower in your case.

--
   Richard Huxton
   Archonet Ltd

Re: most idiomatic way to "update or insert"?

From
Matteo Beccati
Date:
Hi,

> I prefer to update and if the number of updated rows equals 0 do an
> insert. So in case of update I need only one roundtrip. If insert is far
> more common in this case it might be better try insert and catch the
> error. But I try to avoid running on an error intentionally.


When logging to a compact table that stores data in an aggregate form, I
used something like that:

BEGIN;
UPDATE ... ;

if (!affected_rows)
{
     INSERT ... ;

     if (error)
     {
         ROLLBACK;
         UPDATE ... ;
     }
}

COMMIT;

I added the error check with a second UPDATE try after INSERT to
increase accuracy. In fact, INSERTs were sometimes failing because of
concurrency, and this was the only viable solution I found to avoid
losing data.


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Re: most idiomatic way to "update or insert"?

From
Ron St-Pierre
Date:
Greg Stark wrote:

>Ron St-Pierre <rstpierre@syscor.com> writes:
>
>
>
>>BTW these updates do take longer than we'd like so I would appreciate more
>>input on how this setup could be redesigned.
>>
>>
>
>Where is the input coming from?
>
>One option is to batch changes.
>
<snip>

>
>Something like
>
>update current_stock_price
>   set price = log.price,
>       timestamp = log.timestamp
>  from stock_price log
> where current_stock_price.stock = stock_price_log.stock
>   and stock_price_log.timestamp between ? and ?
>
>
We check for new stocks and add them, and initially were using a
procedure to do something similar to your code:

CREATE OR REPLACE FUNCTION updateData() RETURNS SETOF datatype AS '
    DECLARE
        rec     RECORD;
    BEGIN
        FOR rec IN SELECT symbol, tradeDate, tickDate, high, low , open,
close, volume FROM exchangedata LOOP
            RETURN NEXT rec;
            UPDATE stockdata SET high=rec.high, low=rec.low,
open=rec.low, close=rec.close, volume=rec.volume, tradeDate=rec.tradeDate
            WHERE symbol=rec.symbol;
        END LOOP;
        RETURN;
    END;
' LANGUAGE 'plpgsql';
... but it took too long. Off hand, do you know if your approach above
would be quicker?

Ron




Re: most idiomatic way to "update or insert"?

From
Greg Stark
Date:
Matteo Beccati <php@beccati.com> writes:

> I added the error check with a second UPDATE try after INSERT to increase
> accuracy. In fact, INSERTs were sometimes failing because of concurrency, and
> this was the only viable solution I found to avoid losing data.

in the general case you could have someone else delete the record again before
you get the chance to update it again. to handle this case you would actually
have to make it a loop.

this is the main reason a built-in merge command could be nice, it would avoid
the need to loop, since it can take the lock on the existing record if it's
there or perform the insert if not.

--
greg

Re: most idiomatic way to "update or insert"?

From
Jerry Sievers
Date:
gsstark@mit.edu (Greg Stark) writes:
> This happens most often when I have a list of items and have a UI that allows
> the user to edit the entire list and commit a whole new list in one action.
> It's much easier to simply delete the old list and insert the entire new list
> in a single query than to try to figure out which rows to delete and which to
> insert.

Right, but this is going to quit working for you as soon as foreign
key references are involved.  For that matter, other types of
dependent objects will prevent this as well.

In other words, when your relational database actually becomes
relational  :-)

I too was in the habit of the delete/insert as a lazy mans update on
simple schemad DBs.

HTH

--
-------------------------------------------------------------------------------
Jerry Sievers   305 854-3001 (home)     WWW ECommerce Consultant
                305 321-1144 (mobile    http://www.JerrySievers.com/