Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] MERGE SQL Statement for PG11
Date
Msg-id CABOikdNnJcrEbbZ6+9PGP=czFOi7Wz46T0mK8Cr0KHngoWJT6w@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers


On Thu, Mar 1, 2018 at 3:50 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Feb 9, 2018 at 6:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Here's my $0.02: I think that new concurrency errors thrown by the
> merge code itself deserve strict scrutiny and can survive only if they
> have a compelling justification, but if the merge code happens to
> choose an action that leads to a concurrency error of its own, I think
> that deserves only mild scrutiny.
>
> On that basis, of the options I listed in
> http://postgr.es/m/CA+TgmoZDL-caukHkWet7sr7sqr0-e2T91+DEvhqeN5sfqsMjqw@mail.gmail.com
> I like #1 least.
>
> I also dislike #4 from that list for the reasons stated there.  For
> example, if you say WHEN MATCHED AND x.some_boolean and then WHEN
> MATCHED, you expect that every tuple that hits the latter clause will
> have that Boolean as false or null, but #4 makes that not true.
>
> I think the best options are #2 and #5 -- #2 because it's simple, and
> #5 because it's (maybe) more intuitive, albeit at the risk of
> livelock.  But I'm willing to convinced of some other option; like
> you, I'm willing to be open-minded about this.  But, as you say, we
> need a coherent, well-considered justification for the selected
> option, not just "well, this is what we implemented so you should like
> it".  The specification should define the implementation, not the
> reverse.

At first I hated option #2. I'm warming to #2 a lot now, though,
because I've come to understand the patch's approach a little better.
(Pavan and Simon should still verify that I got things right in my
mail from earlier today, though.)

It now seems like the patch throws a RC serialization error more or
less only due to concurrent deletions (rarely, it will actually be a
concurrent update that changed the join qual values of our MERGE).
We're already not throwing the error (we just move on to the next
input row from the join) when we happen to not have a WHEN NOT MATCHED
case. But why even make that distinction? Why not just go ahead and
WHEN NOT MATCHED ... INSERT when the MERGE happens to have such a
clause? The INSERT will succeed, barring any concurrent conflicting
insertion by a third session -- a hazard that has nothing to do with
RC conflict handling in particular.

Just inserting without throwing a RC serialization error is almost
equivalent to a new MVCC snapshot being acquired due to a RC conflict,
so all of this now looks okay to me. Especially because every other
MERGE implementation seems to have serious issues with doing anything
too fancy with the MERGE target table's quals within the main ON join.
I think that SQL Server actually assumes that you're using the
target's PK in a simple equi-join. All the examples look like that,
and this assumption is heavily suggested by the "Do not attempt to
improve query performance by filtering out rows in the target table in
the ON clause" weasel words from their docs, that I referenced in my
mail from earlier today.


I think you've fairly accurately described what the patch does today. I take your point that we can very well just execute the WHEN NOT MATCHED action if the join condition fails for the updated tuple. There is one case we ought to think about though and that might explain why executing the WHEN NOT MATCHED action may not be best choice. Or for that matter even skipping the target when no NOT MATCHED action exists, as the patch does today.

What if the updated tuple fails the join qual with respect to the current tuple from the source relation but it now matches some other tuple from the source relation? I described this case in one of the earlier emails too. In this case, we might end up doing an INSERT (if we decide to execute WHEN NOT MATCHED action), even though a MATCH exists. If there is no WHEN NOT MATCHED action, the current patch will just skip the updated tuple even though a match exists, albeit it's not the current source tuple.

Oracle behaves differently and it actually finds a new matching tuple from the source relation and executes the WHEN MATCHED action, using that source tuple. But I am seriously doubtful that we want to go down that path and whether it's even feasible. Our regular UPDATE .. FROM does not do that either. Given that, it seems better to just throw an error (even when no NOT MATCHED action exists) and explain to the users that MERGE will work as long as concurrent updates don't modify the columns used in the join condition. Concurrent deletes should be fine and we may actually even invoke WHEN NOT MATCHED action in that case.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] [PATCH] Incremental sort
Next
From: Etsuro Fujita
Date:
Subject: Re: inserts into partitioned table may cause crash