Re: MERGE Specification - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: MERGE Specification
Msg-id: 1208861984.4259.1125.camel@ebony.site
In response to: Re: MERGE Specification (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
On Tue, 2008-04-22 at 10:02 +0200, Martijn van Oosterhout wrote:
> On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
> > The way MERGE works we first test to see if it matches or not, then if
> > not matched we would activate the NOT MATCHED action, which standard
> > says must be an insert. The gap between the two actions allows a race
> > condition to exist. 
> > 
> > We could close the gap by taking a lock on the row when we perform the
> > is-matched test, but that would be expensive for bulk operations. ISTM
> > the lock should be optional. Not sure what the default should be. Input
> > welcome.
> 
> ISTM that if the original select does a SELECT FOR UPDATE then it
> should work fine for UPDATEs since any update will overwrite the xmax
> field anyway.

Yes, agreed, that's what I meant by the lock on the row.

Incidentally, this is essentially the same problem that occurs with
SERIALIZABLE updates.

It should be easy enough to put an optional "LOCK MATCHED ROW" clause
into the MERGE statement, as an extension. The Standard doesn't specify
the lock timing. 
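To make the gap concrete, here is a sketch in the standard SQL:2003 MERGE shape (table and column names are invented for illustration; PostgreSQL does not implement this statement yet):

```sql
-- Standard-shape MERGE: the match test and the chosen action are two
-- separate steps, and nothing holds the matched row between them.
MERGE INTO target t
USING source s ON (t.id = s.id)
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
-- Without a row lock taken at match time (e.g. via SELECT FOR UPDATE
-- semantics, or the optional clause suggested above), another backend
-- can insert or delete the row between the match test and the action.
```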

> What you can't do is prevent multiple inserts. Though if it's a unique
> index you should be able to do the same trick as normal inserts: create
> the row, try to insert into the index and if that fails fall back to
> doing an update.

The Standard doesn't really allow that. It's either matched or it's not.

MERGE is specifically
1. Match
2. Update or Insert as per step (1), following complex logic

rather than

1. Update
2. if not matched Insert

which is exactly what the MySQL and Teradata upsert statements do, but
only for single row operations, unlike MERGE.
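For comparison, the MySQL single-row upsert mentioned above takes the try-then-switch shape directly (table and column names invented):

```sql
-- MySQL: attempt the insert; on a unique-key collision, fall back to
-- updating the existing row. A single-row "update or insert", rather
-- than the two-phase match-then-act logic of standard MERGE.
INSERT INTO target (id, val) VALUES (1, 'x')
ON DUPLICATE KEY UPDATE val = 'x';
```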

For MERGE, there is no "let's try one of these and if not, I'll switch".
You decide which it is going to be and then do it. Which can fail...

I guess we could just spin through, re-testing the match each time and
re-initiating the action, but I see problems there too, not least of
which is that it violates the Standard. That may not be that clever, but
there may be reasons we can't see yet, or reasons that would affect
other implementors. Guidance, please, if anybody sees this more clearly?
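For a single row, that spin-through would look something like the well-known PL/pgSQL upsert loop (a sketch only; function and table names are invented):

```sql
-- Sketch: retry until either the UPDATE hits a row or the INSERT
-- succeeds. A concurrent insert raises unique_violation, and we loop
-- back to re-test the match. Note this per-row retry is exactly the
-- "switch between actions" behaviour the Standard's MERGE does not do.
CREATE FUNCTION merge_row(k integer, v text) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE target SET val = v WHERE id = k;
        IF found THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO target (id, val) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- another session inserted the key concurrently; retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```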

> What you really need for this though is a non-fatal _bt_check_unique so
> you can recover without having a savepoint for every row.

Oracle simply fails in the event of a uniqueness violation, even though
it logs other errors. DB2 fails unconditionally if there is even a
single error. The MySQL and Teradata syntaxes don't seem to offer any
protection from concurrent inserts either. Teradata and DB2 both use
locking, so they would take a lock on the value prior to the update
anyway; the update-then-insert race would not arise for them, at least.

-- 
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com


