Re: MERGE SQL Statement - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: MERGE SQL Statement
Date
Msg-id 1208441721.4259.304.camel@ebony.site
Whole thread Raw
In response to Re: MERGE SQL Statement  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Thu, 2008-04-17 at 07:31 +0100, Simon Riggs wrote:

> > > * MERGE will perform a left outer join between source on left and target
> > > on right. There must be no more than 1 row from table-ref for each row
> > > in the table. Each row in the table can only be updated once during each
> > > MERGE statement. Each non-matching row in the table-ref will result in
> > > one INSERT into table.
> > >
> > > * WHEN clauses are tested in the order specified. If the AND condition
> > > returns false then we skip onto the next WHEN clause. We stop once a
> > > WHEN clause activates, so only *one* action is ever activated for each
> > > row.
> > >
> > > * AND clauses need not form a complete set, i.e. it is possible that no
> > > action will result. It is also possible that some WHEN clauses will
> > > never activate because of the execution order; we would not try to
> > > prevent this, just document it as a possible user error.
> > >
> >
> > Just curious if any of these behaviors come from the spec?  or maybe from
> > other databases?  they don't seem unreasonable in general though.
>
> First two come from spec following clarifications from other
> implementations. The last point about the AND clauses not necessarily
> covering 100% of cases follows from the other points.

No, it looks like I missed one line in the standard.

I've spent all day analysing this and writing up test cases, so this
next bit isn't really a reply to you Robert, just lots of additional
detail on this particular area.


In summary, the standard requires us to

1. If MATCHED to more than one source row we are supposed to throw an
error "cardinality violation". However, if there is no target row, yet
would have generated multiple rows had there been one, we do *not* throw
an error.

2. If the source row doesn't match any target row AND there is no WHEN
NOT MATCHING clause that applies, we are then supposed to implicitly
perform an INSERT with DEFAULT VALUES.

Neither of these seem particularly useful behaviours in *all* cases.

For 1, it implies we would need to sort and de-duplicate the output so
that we can throw an error in *all* cases. We would need to do this
because its fairly hard to determine that the set formed from the union
of all WHEN clauses does not cover the universal set. (i.e. its possible
to write a statement that doesn't have a WHEN clause that applies).

If we try to update a row that just got updated we need to throw an
error, otherwise we may allow the Update Halloween problem. So we will
effectively detect this error in most cases anyway.

So for 1, I suggest we don't throw the explicit error as mentioned in
the standard and allow the secondary update error to provide 99% of
required errors. In the case where there was a matching row but no WHEN
clause tat applies, we simply ignore that matching row.

For 2, DB2 allows an additional statement ELSE IGNORE. I suggest we
support the standard for (2), yet also provide an additional option
    WHEN [NOT] MATCHING THEN
        DO NOTHING

which would simply drop the matching or non-matching row and continue
with the next tuple.

My first attempt at a MERGE test case, with simulated output (no, I
haven't written it yet) is attached.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Lessons from commit fest
Next
From: Simon Riggs
Date:
Subject: Re: get rid of psql welcome message