Re: UPSERT - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: UPSERT
Date
Msg-id 1172858409.3760.1618.camel@silverbirch.site
Whole thread Raw
In response to Re: UPSERT  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: UPSERT  (Bricklen Anderson <banderson@presinet.com>)
Re: UPSERT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 2007-03-02 at 15:41 +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew@dunslane.net> writes:
> >> My instinct would be to follow your first strategy, i.e. detect which 
> >> path is needed rather than try one and then if it fails do the other.
> > 
> > The very first thing you need to think about is how to solve the race
> > condition problem, ie, two backends concurrently trying to insert
> > identical data.  Until you have a plausible mechanism for that, the
> > whole thing is pie-in-the-sky.
> 
> How about:
> 
> 1. Insert new heap tuple
> 2. Try to insert the index tuple. If there's a duplicate tuple, lock the 
> existing tuple instead of throwing an error.
> 3. If there was no duplicate, we're done.
> 
> 4. Otherwise, kill the new tuple inserted in step 1, by setting it's 
> xmin to InvalidTransactionId.
> 5. Perform the UPDATE on the existing tuple.
> 
> This requires one change to the indexam api: a duplicate key violation 
> needs to lock the existing tuple instead of throwing an error.

So if the INSERT fails we will leave two dead copies of the tuples? Hmm.

Seems like we should try to locate a row first, then INSERT if we cannot
find one. That's slower on INSERT but more balanced overall - sometimes
the input will generate all UPDATEs, sometimes all INSERTs we'll never
know.


I'm a bit surprised the TODO didn't mention the MERGE statement, which
is the SQL:2003 syntax for specifying this as an atomic statement. There
are lots of other syntaxes, the most simple of which are the MySQL
REPLACE and Teradata's UPDATE ... ELSE INSERT. As seductive as they are,
I'd say that's all the more reason to go with the single approved
syntax. If MySQL are standards compliant, they will support that also,
so we get MySQL compatibility either way.

Another thought that really ought to be on the TODO is a MERGE FROM
(pick your syntax) that allows MERGE to act like a COPY, reading data
from an external data file. That would save effort, since the only way
of doing this currently is to do a COPY then an UPDATE and then an
INSERT. So the MERGE FROM would reduce three operations to just a single
command. 

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Patch license update to developer's FAQ
Next
From: Bricklen Anderson
Date:
Subject: Re: UPSERT