Re: Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making joins involving ctid work for the benefit of UPSERT
Msg-id CAM3SWZQXBWR6UaxcR7fNkUP-bss0zRO2dp7UWofxh1tWO8SoeA@mail.gmail.com
In response to Re: Making joins involving ctid work for the benefit of UPSERT  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> This all seems completely to one side of Andres's point.  I think what
> he's saying is: don't implement an SQL syntax of the form INSERT ON
> CONFLICT and let people use that to implement upsert.  Instead,
> directly implement an SQL command called UPSERT or similar.  That's
> long been my intuition as well.  I think generality here is not your
> friend.

Sure, I was just making one fairly narrow point in relation to Andres'
concern about how the executor side of UPSERT is structured.

> I'd suggest something like:
>
> UPSERT table SET col = value [, ...] WHERE predicate;

Without dismissing the overall concern shared by you and Andres, that
particular update-driven syntax doesn't strike me as something that
should be pursued. I would like to be able to insert or update more
than a single row at a time, for one thing. For another, what exactly
appears in the predicate? Is it more or less the same as an UPDATE's
predicate?

> with semantics like this:
>
> 1. Search the table, using any type of scan you like, for a row
> matching the given predicate.

Perhaps I've misunderstood, but this is fundamentally different from
what I'd always thought would need to happen. I always understood that
the UPSERT should be insert-driven, and so not really have an initial
scan at all, in the same sense that every INSERT lacks one. Moreover,
I've always thought that everyone agreed on that. We go to insert, and
then, in the course of inserting index tuples, do something with dirty
snapshots. That's how we get information on conflicts.

For one thing, we cannot use any kind of scan unless there is a new
mechanism for seeing not-yet-visible tuples, like some kind of
non-MVCC snapshot that those scan nodes can selectively use. Now, I
guess you intend that in this scenario we go straight to step 5, and
the insert then finds the not-yet-visible conflict. But it's not
clear that when step 5 fails and we pick up from step 1, we then get
a new MVCC snapshot or something else that gives us a hope of
succeeding this time around. And it seems quite wasteful not to use
knowledge of conflicts from the insert at all - that's one thing I'm
trying to address here: removing unnecessary index scans when we
actually know what our conflict TIDs are.
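To make the insert-driven flow concrete, here is a toy Python sketch of
the idea. Every name and data structure here is illustrative - nothing
is actual PostgreSQL internals - but it shows how the unique-index
insertion itself can report the conflicting TID, so no prior scan of
the table is needed:

```python
# Toy model of an insert-driven upsert: attempting the unique-index
# insertion itself reports the conflicting TID (what a dirty-snapshot
# check during insertion would see), so no separate scan is required.
# All names here are illustrative, not PostgreSQL internals.

class UniqueIndex:
    def __init__(self):
        self.entries = {}  # key -> tid

    def try_insert(self, key, tid):
        """Attempt an index insertion; on conflict, return the
        existing tuple's TID instead of inserting."""
        existing = self.entries.get(key)
        if existing is not None:
            return existing        # conflict: hand back the TID
        self.entries[key] = tid
        return None                # success

heap = {}            # tid -> row
index = UniqueIndex()

def upsert(key, value, next_tid):
    conflict_tid = index.try_insert(key, next_tid)
    if conflict_tid is None:
        heap[next_tid] = {"key": key, "value": value}   # plain insert
        return ("inserted", next_tid)
    # We already know exactly which tuple conflicts: update it
    # directly instead of re-scanning the table for the key.
    heap[conflict_tid]["value"] = value
    return ("updated", conflict_tid)
```

So a second upsert of the same key goes straight from the failed index
insertion to updating the known conflict TID, with no intervening scan.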

> 2. If you find more than one tuple that is visible to your scan, error.

This point seems to concern making the UPSERT's UPDATE affect at most
one row. Is that right? If so, why do you think that's a useful
constraint to impose?

> 3. If you find exactly one tuple that is visible to your scan, update it.  Stop.
> 4. If you find no tuple that is visible to your scan, but you do find
> at least one tuple whose xmin has committed and whose xmax has not
> committed, then (4a) if the isolation level is REPEATABLE READ or
> higher, error; (4b) if there is more than one such tuple, error; else
> (4b) update it, in violation of normal MVCC visibility rules.

Point 4b ("if there is more than one such tuple...") seems like it
needs some practical or theoretical justification - do you just not
want to follow an update chain? If so, why? What would the error
message actually be? And, to repeat myself: Why should an MVCC
snapshot see more than one such tuple in the ordinary course of
scanning a relation, since there is by definition a unique constraint
that prevents this? Or maybe I'm wrong; maybe you don't even require
that. That isn't clear.

As you know, I've always opposed any type of READ COMMITTED
serialization failure. If you're worried about lock starvation hazards
- although I don't think strong concerns here are justified - we can
always put in a fail-safe number of retries before throwing an error.
I'm comfortable with that because, based on experimentation, I think
it's going to be virtually impossible to hit. Apparently, other
snapshot isolation databases acquire a new snapshot and redo the
command, rather than using an EvalPlanQual()-like mechanism and
thereby violating snapshot isolation. Those other systems have just
such a hard limit on retries before erroring, and it seems to work out
okay for them.
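The fail-safe could look something like the following sketch. The
retry cap, exception name, and callback shape are all assumptions of
mine for illustration, not anything proposed concretely in this
thread:

```python
# Illustrative bounded-retry loop for a READ COMMITTED upsert: retry
# the command (conceptually with a fresh snapshot each time) when a
# concurrent conflict forces a restart, and only give up after a
# fixed number of attempts. The limit is an arbitrary fail-safe.

MAX_RETRIES = 100  # assumed value, not from the thread

class RetryLimitExceeded(Exception):
    pass

def upsert_with_retries(attempt_upsert):
    """attempt_upsert() returns True on success, False when a
    concurrent conflict forces a restart."""
    for _ in range(MAX_RETRIES):
        if attempt_upsert():
            return
    raise RetryLimitExceeded(
        "upsert gave up after %d attempts" % MAX_RETRIES)
```

Under this scheme an individual command can livelock only if it loses
the race MAX_RETRIES times in a row, which is the "virtually
impossible to hit" case; the error is purely a backstop.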

> 5. Construct a new tuple using the col/value pairs from the SET clause
> and try to insert it.  If this would fail with a unique index
> violation, back out and go back to step 1.

Again, I can't see why you'd do this step last and not first, since
this is naturally the place to initially "see into the future" with a
dirty snapshot.

> Having said all that, I believe the INSERT ON CONFLICT syntax is more
> easily comprehensible than previous proposals.  But I still tend to
> agree with Andres that an explicit UPSERT syntax or something like it,
> that captures all of the MVCC games inside itself, is likely
> preferable from a user standpoint, whatever the implementation ends up
> looking like.

Okay then. If you both feel that way, I will come up with something
closer to what you sketch. But for now I still strongly feel it ought
to be driven by an insert. Perhaps I've misunderstood you entirely,
though.

-- 
Peter Geoghegan


