Re: MERGE Specification - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: MERGE Specification
Date
Msg-id 1209376143.4391.43.camel@ebony.site
Whole thread Raw
In response to Re: MERGE Specification  ("Marko Kreen" <markokr@gmail.com>)
List pgsql-hackers
On Mon, 2008-04-28 at 11:57 +0300, Marko Kreen wrote:
> On 4/25/08, Robert Treat <xzilla@users.sourceforge.net> wrote:
> > On Thursday 24 April 2008 23:40, Tom Lane wrote:
> >  > Robert Treat <xzilla@users.sourceforge.net> writes:
> >  > > Perhaps a better option would be to implement Merge per spec, and then
> >  > > implement a "replace into" command for the oltp scenario.  This way you
> >  > > keep the spec behavior for the spec syntax, and have a clearly non-spec
> >  > > command for non-spec behavior.
> >  >
> >  > In that case, it's a fair question to ask just who will use the "spec"
> >  > syntax.  As far as I can tell from years of watching the mailing lists,
> >  > there is plenty of demand for a concurrent-safe insert-or-update
> >  > behavior, and *exactly zero* demand for the other.  I challenge you to
> >  > find even one request for the "spec" behavior in the mailing list
> >  > archives.  (Simon doesn't count.)
> >  >
> >
> >
> > AIUI the current implementation is designed to avoid race conditions partially
> >  at the cost of being insert friendly and somewhat update unfriendly. My guess
> >  is that most of the people wanting this for OLTP use want an update friendly
> >  implementation (that's certainly been the majority of cases I've needed
> >  myself, and that I have seen others use).
> 
> This seems to hint that there should be 2 variants of merge/upsert
> - insert-friendly and update-friendly...  It seems unlikely one implementation
> can be both.  And especially bad would be implementation that is neither.

Not sure what an option that was "neither" would look like ...

I would summarise the two MERGE behaviour proposals as

1. Correctly protects against concurrent inserts. Uses one
sub-transaction per row and leaves 2 dead rows per update. Requires us
to perform tasks in different order than required by SQL spec, but the
end result seems identical to me (now).
Has been noted as suitable for OLTP, and poor for bulk data maintenance.
Has been described as "insert-friendly" and "non-spec".

2. Does not protect against concurrent inserts. Leaves 1 dead row per
update. Much more efficient for updates, not sure about any efficiency
gain for inserts.
Has been noted as being unsuitable for OLTP, though likely to offer more
acceptable performance for bulk operations.
Has been described as "update-friendly".

By consensus, I'm doing (1). 

It looks likely that doing (2) should be fairly small change and so can
be offered as an option. For example, we can have an additional
action-order clause with two options (the first of which is default)
[INSERT BEFORE UPDATE ACTION ORDER | DEFAULT ACTION ORDER]
So the default is to force inserts to occur before updates, as required
by (1). The other option "DEFAULT ACTION ORDER" tests the WHEN clauses
in the order specified in the statement, allowing the user to choose
whether they want to test for updates or inserts first.

Overall, the difference between these behaviours is small in comparison
with making MERGE work in the first place...

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: Proposed patch - psql wraps at window width
Next
From: "dv @ nabble"
Date:
Subject: SRF in SFRM_ValuePerCall mode