Re: MERGE Specification - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: MERGE Specification
Msg-id: 1208861984.4259.1125.camel@ebony.site
In response to: Re: MERGE Specification (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
On Tue, 2008-04-22 at 10:02 +0200, Martijn van Oosterhout wrote:
> On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
> > The way MERGE works we first test to see if it matches or not, then if
> > not matched we would activate the NOT MATCHED action, which standard
> > says must be an insert. The gap between the two actions allows a race
> > condition to exist. 
> > 
> > We could close the gap by taking a lock on the row when we perform the
> > is-matched test, but that would be expensive for bulk operations. ISTM
> > the lock should be optional. Not sure what the default should be. Input
> > welcome.
> 
> ISTM that if the original select does a SELECT FOR UPDATE then it
> should work fine for UPDATEs since any update will overwrite the xmax
> field anyway.

Yes, agreed, that's what I meant by the lock on the row.

Incidentally, this is essentially the same problem that occurs with
SERIALIZABLE updates.

It should be easy enough to put an optional "LOCK MATCHED ROW" clause
into the MERGE statement, as an extension. The Standard doesn't specify
the lock timing. 
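To make the gap concrete, here is a sketch in the standard SQL:2003 MERGE shape (table and column names are invented for illustration; PostgreSQL does not implement this statement yet):

```sql
-- Standard-shape MERGE: the match test and the chosen action are two
-- separate steps, and nothing holds the matched row between them.
MERGE INTO target t
USING source s ON (t.id = s.id)
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
-- Without a row lock taken at match time (e.g. via SELECT FOR UPDATE
-- semantics, or the optional clause suggested above), another backend
-- can insert or delete the row between the match test and the action.
```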

> What you can't do is prevent multiple inserts. Though if it's a unique
> index you should be able to do the same trick as normal inserts: create
> the row, try to insert into the index and if that fails fall back to
> doing an update.

The Standard doesn't really allow that. It's either matched or it's not.

MERGE is specifically
1. Match
2. Update or Insert as per step (1), following complex logic

rather than

1. Update
2. if not matched Insert

which is exactly what the MySQL and Teradata upsert statements do, but
only for single row operations, unlike MERGE.
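For comparison, the MySQL single-row upsert mentioned above takes the try-then-switch shape directly (table and column names invented):

```sql
-- MySQL: attempt the insert; on a unique-key collision, fall back to
-- updating the existing row. A single-row "update or insert", rather
-- than the two-phase match-then-act logic of standard MERGE.
INSERT INTO target (id, val) VALUES (1, 'x')
ON DUPLICATE KEY UPDATE val = 'x';
```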

For MERGE, there is no "let's try one of these and if not, I'll switch".
You decide which it is going to be and then do it. Which can fail...

I guess we could just spin through, re-testing the match each time and
re-initiating the action, but I see problems there too, not least of
which is that it violates the Standard. That may not be that clever, but
there may be reasons we can't see yet, or reasons that would affect
other implementors. Guidance, please, if anybody sees this more clearly?
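For a single row, that spin-through would look something like the well-known PL/pgSQL upsert loop (a sketch only; function and table names are invented):

```sql
-- Sketch: retry until either the UPDATE hits a row or the INSERT
-- succeeds. A concurrent insert raises unique_violation, and we loop
-- back to re-test the match. Note this per-row retry is exactly the
-- "switch between actions" behaviour the Standard's MERGE does not do.
CREATE FUNCTION merge_row(k integer, v text) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE target SET val = v WHERE id = k;
        IF found THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO target (id, val) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- another session inserted the key concurrently; retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```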

> What you really need for this though is a non-fatal _bt_check_unique so
> you can recover without having a savepoint for every row.

Oracle simply fails in the event of a uniqueness violation, even though
it logs other errors. DB2 fails unconditionally if there is even a
single error. The MySQL and Teradata syntaxes don't seem to offer any
protection from concurrent inserts either. Teradata and DB2 both use
locking, so they would take a lock on the value prior to the update
anyway; the update-then-insert race would not arise for them, at least.

-- 
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com


