Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] MERGE SQL Statement for PG11 |
Date | |
Msg-id | 20171106215003.GA30357@marmot Whole thread Raw |
In response to | Re: [HACKERS] MERGE SQL Statement for PG11 (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: [HACKERS] MERGE SQL Statement for PG11
|
List | pgsql-hackers |
Simon Riggs <simon@2ndquadrant.com> wrote: >In step 3 we discover that an entry exists in the index for a committed row. > >Since we have a unique index we use it to locate the row we know >exists and UPDATE that. > >We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already >violating MVCC for UPDATEs, so why does it matter if we do it for >INSERTs also? Before I go on to say why I think that this approach is problematic, I want to point out a few things that I think we actually agree on: * EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE conflict handling. It has more to do with how VACUUM worksthan about some platonic ideal that everyone agrees on. * We can imagine other alternatives, such as the behavior in Oracle (statement level rollback + optimistic retry). * Those alternatives are probably better in some ways but worse in other ways. * EPQ violates snapshot consistency, even though that's not inherently necessary to avoid "READ COMMITTED serialization errors". * ON CONFLICT also violates snapshot consistency, in rather a different way. (Whether or not this is necessary is more debatable.) I actually think that other MVCC systems don't actually copy Oracle here, either, and for similar pragmatic reasons. It's a mixed bag. >Where hides the problem? The problem is violating MVCC is something that can be done in different ways, and by meaningful degrees: * EPQ semantics are believed to be fine because we don't get complaints about it. I think that that's because it's specializedto UPDATEs and UPDATE-like operations, where we walk an UPDATE chain specifically, and only use a dirty snapshotfor the chain's newer tuples. * ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no distinction between a concurrent UPDATE, and a concurrentDELETE + fresh INSERT. It's specialized to CONFLICTs. This might seem abstract, but it has real, practical implications. Certain contradictions exist when you start with MVCC semantics, then fall back to EPQ semantics, then finally fall back to ON CONFLICT semantics. Questions about mixing these two things: * What do we do if someone concurrently UPDATEs in a way that makes the qual not pass during EPQ traversal? Should we INSERTwhen that happens? * If so, what about the case when the MERGE join qual/unique index values didn't change (just some other attributes thatdo not pass the additional WHEN MATCHED qual)? * What about when there was a concurrent DELETE -- should we INSERT then? ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's hugely different to doing it the other way around: starting from your own MVCC snapshot qual, and going to a CONFLICT. This is because evaluating the DO UPDATE's WHERE clause is just one little extra step after the one and only latest row for that value has been locked. You could theoretically go this way with 2PL, I think, because that's a bit like locking every row that the predicate touches, but of course that isn't at all practical. I should stop trying to make a watertight case against this, even though I still think that's possible. For now, instead, I'll just say that this is *extremely* complicated, and still has unresolved questions about semantics. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: