Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] MERGE SQL Statement for PG11 |
Date | |
Msg-id | 20171101182032.GA7104@marmot Whole thread Raw |
In response to | Re: [HACKERS] MERGE SQL Statement for PG11 (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: [HACKERS] MERGE SQL Statement for PG11
|
List | pgsql-hackers |
On Wed, Nov 1, 2017 at 10:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? > > Not according to the SQL Standard, no. I have no plans for such > differences to exist. > > Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR. Your documentation said that the MERGE was driven by a speculative insertion (BTW, I don't think that this internal implementation detail should be referenced in user-facing docs). I inferred that that could not always be true, since there won't always be an INSERT/WHEN NOT MATCHED case, assuming that you allow that at all (I now gather that you will). >> You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. Isn't that pretty surprising? > > I think you're presuming things I haven't said and don't mean, so > we're both surprised. You're right -- I'm surmising what I think might be true, because I don't have the information available to know one way or the other. As far as this issue with using speculative insertions in one context but not in another goes, I still don't really know where you stand. I can still only surmise that you must want both implementations, and will use one or the other as circumstances dictate (to avoid dup violations in the style of ON CONFLICT where that's possible). This seems true because you now say that it will be possible to omit WHEN NOT MATCHED, and yet there is no such thing as a speculative insertion without the insertion. You haven't said that that conclusion is true yourself, but it's the only conclusion that I can draw based on what you have said. > I think we need some way of expressing the problems clearly. It's certainly hard to talk about these problems. I know this from experience. > "a general purpose solution is one that more or > less works like an UPDATE FROM, with an outer join, whose ModifyTable > node is capable of insert, update, or delete (and accepts quals for > MATCHED and NOT matched cases, etc). You could still get duplicate > violations due to concurrent activity in READ COMMITTED mode". > > Surely the whole point of this is to avoid duplicate violations due to > concurrent activity? Now we're getting somewhere. I *don't* think that that's the whole point of MERGE. No other MERGE implementation does that, or claims to do that. The SQL standard says nothing about this. Heikki found this to be acceptable when working on the GSoC MERGE implementation that went nowhere. My position is that we ought to let MERGE be MERGE, and let ON CONFLICT be ON CONFLICT. In Postgres, you can avoid duplicate violations with MERGE by using a higher isolation level (these days, those are turned into a serialization error at higher isolation levels when no duplicate is visible to the xact's snapshot). MERGE isn't and shouldn't be special when it comes to concurrency. > I'm not seeing how either design sketch rigorously avoids live locks, > but those are fairly unlikely and easy to detect and abort. My MERGE semantics (which really are not mine at all) avoid live lock/lock starvation by simply never retrying anything without making forward progress. MERGE doesn't take any special interest in concurrency, just like any other DML statement that isn't INSERT with ON CONFLICT. ON CONFLICT would have live locks if it didn't always have the choice of inserting [1]. In many ways, the syntax of INSERT ON CONFLICT DO UPDATE is restricted in exactly the way it needs to be in order to function correctly. It wasn't an accident that it didn't end up being UPDATE ... ON NOUPDATE DO INSERT, or something like that, which Robert proposed at one point. ON CONFLICT plays by its own rules to a certain extent, because that's what you need in order to get the desired guarantees in READ COMMITTED mode [2]. This is the main reason why it was as painful a project as it was. Further generalizing that seems fraught with difficulties. It seems logically impossible to generalize it in a way where you don't end up with two behaviors masquerading as one. [1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards [2] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: