Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [HACKERS] MERGE SQL Statement for PG11
Msg-id CANP8+jJS0G8BOz0AdLXyXdvgYeYZ0Ht=ukfMrgs2vJRrDf+QzA@mail.gmail.com
In response to Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

On 6 November 2017 at 16:50, Peter Geoghegan <pg@bowt.ie> wrote:

>> Where hides the problem?
>
>
> The problem is that violating MVCC is something that can be done in
> different ways, and by meaningful degrees:
>
> * EPQ semantics are believed to be fine because we don't get complaints
>  about it. I think that that's because it's specialized to UPDATEs and
>  UPDATE-like operations, where we walk an UPDATE chain specifically,
>  and only use a dirty snapshot for the chain's newer tuples.
>
> * ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no
>  distinction between a concurrent UPDATE, and a concurrent DELETE + fresh
>  INSERT. It's specialized to CONFLICTs.
>
> This might seem abstract, but it has real, practical implications.
> Certain contradictions exist when you start with MVCC semantics, then
> fall back to EPQ semantics, then finally fall back to ON CONFLICT
> semantics.
>
> Questions about mixing these two things:
>
> * What do we do if someone concurrently UPDATEs in a way that makes the
>  qual not pass during EPQ traversal? Should we INSERT when that
>  happens?
>
> * If so, what about the case when the MERGE join qual/unique index
>  values didn't change (just some other attributes that do not pass the
>  additional WHEN MATCHED qual)?
>
> * What about when there was a concurrent DELETE -- should we INSERT then?
>
> ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's
> hugely different to doing it the other way around: starting from your
> own MVCC snapshot qual, and going to a CONFLICT. This is because
> evaluating the DO UPDATE's WHERE clause is just one little extra step
> after the one and only latest row for that value has been locked.  You
> could theoretically go this way with 2PL, I think, because that's a bit
> like locking every row that the predicate touches, but of course that
> isn't at all practical.
>
> I should stop trying to make a watertight case against this, even though
> I still think that's possible. For now, instead, I'll just say that this
> is *extremely* complicated, and still has unresolved questions about
> semantics.

That's a good place to leave this for now - we're OK to make progress
with the main feature, and we have some questions to be addressed once
we have a cake to decorate.

Thanks for your input.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
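
A rough illustration of the ordering argument in the quoted text, written as a toy Python model. It is not PostgreSQL code: the function names, the dict-based "tables", and the "UNDECIDED" outcomes are all assumptions made for this sketch. It only contrasts starting from the latest row and then applying a qual (the ON CONFLICT order) with starting from a snapshot qual and then re-checking the latest row, where the quoted questions about a concurrent UPDATE or DELETE come up.

```python
# Toy model only: hypothetical names, not PostgreSQL internals.
# Each table state maps a key to a row dict; a missing key means "no row".

def conflict_first(latest, key, qual):
    """ON CONFLICT-style order: take the latest row for the key, then apply the qual."""
    row = latest.get(key)
    if row is None:
        return "INSERT"
    return "UPDATE" if qual(row) else "NO ACTION"


def snapshot_first(snapshot, latest, key, qual):
    """Snapshot-first order: decide from the snapshot, then re-check the latest row."""
    seen = snapshot.get(key)
    if seen is None:
        return "INSERT"  # simplification: ignores a possible concurrent insert
    if not qual(seen):
        return "NO ACTION"
    now = latest.get(key)
    if now is None:
        # The row matched in the snapshot but has since been deleted.
        return "UNDECIDED: concurrent DELETE"
    if not qual(now):
        # The row matched in the snapshot but the newer version fails the qual.
        return "UNDECIDED: qual fails on newer version"
    return "UPDATE"


snapshot = {1: {"qty": 5}}       # what the statement's snapshot sees
after_update = {1: {"qty": 0}}   # concurrent UPDATE changed a non-key attribute
after_delete = {}                # concurrent DELETE removed the row


def in_stock(row):
    return row["qty"] > 0


for name, latest in [("update", after_update), ("delete", after_delete)]:
    print(name, conflict_first(latest, 1, in_stock),
          "|", snapshot_first(snapshot, latest, 1, in_stock))
```

In this model the conflict-first path always has a single answer because it only ever looks at one row version, while the snapshot-first path reaches states where the model has to pick a policy. Which policy is right is exactly what the quoted questions leave open.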

