Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] MERGE SQL Statement for PG11
Date
Msg-id 20171101182032.GA7104@marmot
Whole thread Raw
In response to Re: [HACKERS] MERGE SQL Statement for PG11  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] MERGE SQL Statement for PG11
List pgsql-hackers
On Wed, Nov 1, 2017 at 10:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The problem here is: Iff the first statement uses ON CONFLICT
>> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
>> different semantics for the remaining updates and deletes in the
>> second version of the query?
>
> Not according to the SQL Standard, no. I have no plans for such
> differences to exist.
>
> Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR.

Your documentation said that the MERGE was driven by a speculative
insertion (BTW, I don't think that this internal implementation detail
should be referenced in user-facing docs). I inferred that that could
not always be true, since there won't always be an INSERT/WHEN NOT
MATCHED case, assuming that you allow that at all (I now gather that you
will).

>> You've removed what seems like a neat
>> adjunct to the MERGE, but it actually changes everything else too when
>> using READ COMMITTED. Isn't that pretty surprising?
>
> I think you're presuming things I haven't said and don't mean, so
> we're both surprised.

You're right -- I'm surmising what I think might be true, because I
don't have the information available to know one way or the other. As
far as this issue with using speculative insertions in one context but
not in another goes, I still don't really know where you stand. I can
still only surmise that you must want both implementations, and will use
one or the other as circumstances dictate (to avoid dup violations in
the style of ON CONFLICT where that's possible).

This seems true because you now say that it will be possible to omit
WHEN NOT MATCHED, and yet there is no such thing as a speculative
insertion without the insertion. You haven't said that that conclusion
is true yourself, but it's the only conclusion that I can draw based on
what you have said.

> I think we need some way of expressing the problems clearly.

It's certainly hard to talk about these problems. I know this from
experience.

> "a general purpose solution is one that more or
> less works like an UPDATE FROM, with an outer join, whose ModifyTable
> node is capable of insert, update, or delete (and accepts quals for
> MATCHED and NOT matched cases, etc). You could still get duplicate
> violations due to concurrent activity in READ COMMITTED mode".
>
> Surely the whole point of this is to avoid duplicate violations due to
> concurrent activity?

Now we're getting somewhere.

I *don't* think that that's the whole point of MERGE. No other MERGE
implementation does that, or claims to do that. The SQL standard says
nothing about this. Heikki found this to be acceptable when working on
the GSoC MERGE implementation that went nowhere.

My position is that we ought to let MERGE be MERGE, and let ON CONFLICT
be ON CONFLICT.

In Postgres, you can avoid duplicate violations with MERGE by using a
higher isolation level (these days, those are turned into a
serialization error at higher isolation levels when no duplicate is
visible to the xact's snapshot).  MERGE isn't and shouldn't be special
when it comes to concurrency.

> I'm not seeing how either design sketch rigorously avoids live locks,
> but those are fairly unlikely and easy to detect and abort.

My MERGE semantics (which really are not mine at all) avoid live
lock/lock starvation by simply never retrying anything without making
forward progress. MERGE doesn't take any special interest in
concurrency, just like any other DML statement that isn't INSERT with ON
CONFLICT.

ON CONFLICT would have live locks if it didn't always have the choice of
inserting [1]. In many ways, the syntax of INSERT ON CONFLICT DO UPDATE
is restricted in exactly the way it needs to be in order to function
correctly. It wasn't an accident that it didn't end up being UPDATE ...
ON NOUPDATE DO INSERT, or something like that, which Robert proposed at
one point.

ON CONFLICT plays by its own rules to a certain extent, because that's
what you need in order to get the desired guarantees in READ COMMITTED
mode [2]. This is the main reason why it was as painful a project as it
was. Further generalizing that seems fraught with difficulties. It seems
logically impossible to generalize it in a way where you don't end up
with two behaviors masquerading as one.

[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards
[2] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation
--
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: [HACKERS] proposal: schema variables
Next
From: David Christensen
Date:
Subject: Re: [HACKERS] [PATCH] Add two-arg for of current_setting(NAME,FALLBACK)