Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Date
Msg-id CAM3SWZRB6s6ydBfHQafNhnq2nxnBh2a5fCvLvUzmFdSz9GYjNg@mail.gmail.com
Whole thread Raw
In response to INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0  (Peter Geoghegan <pg@heroku.com>)
Responses Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0  (Bruce Momjian <bruce@momjian.us>)
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> A first (not actually that quick :() look through the patches to see
> what actually happened in the last months. I didn't keep up with the
> thread.

So, let me get this out of the way: This is the first in-depth
technical review that this work has had in a long time. Thank you for
your help here.

> Generally the split into the individual commits doesn't seem to make
> much sense to me. The commits individually (except the first) aren't
> indivdiually commitable and aren't even meaningful. Splitting off the
> internal docs, tests and such actually just seems to make reviewing
> harder because you miss context. Splitting it so that individual piece
> are committable and reviewable makes sense, but... I have no problem
> doing the user docs later. If you split of RLS support, you need to
> throw an error before it's implemented.

I mostly agree. Basically, I did not intend for all of the patches to
be individually committed. The mechanism by which EXCLUDED.*
expressions are added is somewhat novel, and deserves to be
independently *considered*. I'm trying to show how the parts fit
together more so than breaking things down in to smaller commits (as
you picked up on, 0001 is the exception - that is genuinely intended
to be committed early). Also, those commit messages give me the
opportunity to put those parts in their appropriate context vis-a-vis
our discussions. They refer to the Wiki, for example, or reasons why
pg_stat_statements shouldn't care about ExcludedExpr. Obviously the
final commit messages won't look that way.

> 0001:
> * References INSERT with ON CONFLICT UPDATE, can thus not be committed
>   independently. I don't think those references really are needed.
> * I'm not a fan of the increased code duplication in
>   ExecCheckRTEPerms(). Can you look into cleaning that up?
> * Also the comments inside the ACL_INSERT case still reference UPDATE.
>
> Other than that I think we can just go ahead and commit this ahead of
> time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
> message only.

Cool. Attached revision makes those changes.

> 0007:
> * "AMs" alone isn't particularly unique.
> * Without the context of the discussion "unprincipled deadlocks" aren't
>   well defined.
> * Too many "" words.
> * Waiting "too long" isn't defined. Neither is why that'd imply
>   unprincipled deadlocks. Somewhat understandable with the context of
>   the discussion, but surely not a couple years down the road.
> * As is I don't find the README entry super helpful. It should state
>   what the reason for doing this is cleary, start at the higher level,
>   and then move to the details.
> * Misses details about the speculative heavyweight locking of tuples.

Fair points. I'll work through that feedback.

Actually, I think we should memorialize that "unprincipled deadlocks"
should be avoided in some more general way, since they are after all a
general problem that we've seen elsewhere. I'm not sure about how to
go about doing that, though.

> 0002:
> * Tentatively I'd say that killspeculative should be done via a separate
>   function instead of heap_delete()

Really? I guess if that were to happen, it would entail refactoring
heap_delete() to call a static function, which was also called by a
new kill_speculative() function that does this. Otherwise, you'd have
far too much duplication.

> * I think we should, as you ponder in a comment, do the OCU specific
>   stuff lazily and/or in a separate function from BuildIndexInfo(). That
>   function is already quite visible in profiles, and the additional work
>   isn't entirely trivial.

Okay.

> * I doubt logical decoding works with the patch as it stands.

I thought so. Perhaps you could suggest a better use of the available
XLOG_HEAP_* bits. I knew I needed to consider that more carefully
(hence the XXX comment), but didn't get around to it.

> * The added ereport (i.e. user facing) error message in
>   ExecInsertIndexTuples won't help a user at all.

So, this:

>>    /* Skip this index-update if the predicate isn't satisfied */
>>    if (!ExecQual(predicate, econtext, false))
>> + {
>> +     if (arbiterIdx == indexRelation->rd_index->indexrelid)
>> +         ereport(ERROR,
>> +                      (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
>> +                       errmsg("partial arbiter unique index has predicate that does not cover tuple proposed for
insertion"),
>> +                       errdetail("ON CONFLICT inference clause implies that the tuple proposed for insertion
actuallybe covered by partial predicate for index \"%s\".", 
>> +                                     RelationGetRelationName(indexRelation)),
>> +                       errhint("ON CONFLICT inference clause must infer a unique index that covers the final tuple,
afterBEFORE ROW INSERT triggers fire."), 
>> +                       errtableconstraint(heapRelation,
>> +                                                    RelationGetRelationName(indexRelation))));
>>        continue;
>> + }

Yeah, that isn't a great error message. This happens here because you
are using a partial unique index (and so you must have had an
inference specification with a "WHERE" to get here). However, what you
actually went to insert would not be covered by this partial unique
index, and so couldn't ever take the alternative path, and so is
likely not thought out. Maybe it would be better to silently always
let the INSERT succeed as an INSERT. *That* actually wasn't really
discussed - this is all my idea.

> * Personally I don't care one iota for comments like "Get information
>   from the result relation info structure.". Yes, one of these already
>   exists, but ...

Okay.

> * If a arbiter index is passed to ExecCheckIndexConstraints(), can't we
>   abort the loop after checking it? Also, do we really have to iterate
>   over indexes for that case? How about moving the loop contents to a
>   separate function and using that separately for the arbiter cases?

Well, the failure to do that implies very few extra cycles, but sure.
I'll add a new reason to break at the end, when
check_exclusion_or_unique_constraint() is called in respect of a
particular (inferred) arbiter unique index.

> * Don't like the comment above check_exclusion_or_unique_constraint()'s
>   much. Makes too much of a special case of OCU

I guess I should just refer to speculative insertion.

> * ItemPointerIsValid

What about it?

> * ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed "
>   sounds like you're talking with a child or so ;)

Fair point. I should say "It would not be consistent with the
guarantees of the higher isolation levels..."

> * ExecCheckHeapTupleVisible()'s errhint() sounds like an
>   argument/excuse (actually like a code comment). That's not going to
>   help a user at all.

Really? I thought it might be less than intuitive that higher
isolation levels cannot decide to do nothing on the basis of something
not in their MVCC snapshot. But come to think of it, yeah, that
errhint() isn't adding much over the main error message.

> * I find the modified control flow in ExecInsert() pretty darn ugly. I
>   think this needs to be cleaned up. The speculative case should imo be
>   a separate function or something.

Hmm. I'm not quite sold on that. Basically, if we did that, we'd still
have a function that was more or less a strict superset of
ExecInsert(). What have we saved?

What I do agree with is that ExecInsert() should be refactored to make
the common case (a vanilla insert) look like the common case, whereas
the uncommon case (an upsert) should have that dealt with specially.
There is room for improvement. Is that a fair compromise?

> * /*
>    * This may occur when an instantaneously invisible tuple is blamed
>    * as a conflict because multiple rows are inserted with the same
>    * constrained values.
>    How can this happen? We don't insert multiple rows with the same
>    command id?

This is a cardinality violation [1]. It can definitely happen - just
try the examples you see on the Wiki. This is possible because I
modified heap_lock_tuple() to return HeapTupleInvisible (and not just
complain directly when HeapTupleSatisfiesUpdate() returns
"HeapTupleInvisible"). It's also possible because we're using a
DirtySnapshot at various points. This is sort of like how ExecUpdate()
handles a return value of "HeapTupleSelfUpdated" from heap_update().
Not quite though, because 1. ) I prefer to throw an error (rather than
silently not UPDATE that slot), and 2. ) we're not dealing with MVCC
semantics, so the return values are different in both cases. The
*nature* of the situation handled is similar between UPSERTs (in
ExecLockUpdatedTuple()) and vanilla UPDATEs (in ExecUpdate()), though.
Does that make sense?

> * ExecLockUpdatedTuple() has (too?) many comments, but little overview
>   of what/why it is doing what it does on a higher level.

Fair point. Seems like material for a better worked out executor README.

> * plan_speculative_use_index: "Use the planner to decide speculative
>   insertion arbiter index" - Huh? " rel is the target to undergo ON
>   CONFLICT UPDATE/IGNORE." - Which rel?

Sorry, that's an obsolete comment (the function signature changed). It
should refer to the target of the Query being planned.

> * formulations as "fundamental nexus" are hard to understand imo.

I'm trying to suggest that INSERT ... ON CONFLICT UPDATE is not quite
two separate top-level commands, and yet is also not a new, distinct
type of top-level command. This is mostly a high level design decision
that maximizes code reuse.

> * Perhaps it has previously been discussed but I'm not convinced by the
>   reasoning for not looking at opclasses in infer_unique_index(). This
>   seems like it'd prohibit ever having e.g. case insensitive opclasses -
>   something surely worthwile.

I don't think anyone gave that idea the thumbs-up. However, I really
don't see the problem. Sure, we could have case insensitive opclasses
in the future, and you may want to make a unique index using one. But
then it becomes a matter of whatever unique indexes are available. The
limitation is only that you cannot explicitly indicate that you want a
certain opclass. It comes down to whatever unique indexes happen to be
available, since of course taking the alternative path is arbitrated
by a would-be unique violation. It's a bit odd that we're leaving it
up to the available indexes to decide on semantics like that, but the
problem is so narrow and the solution so involved that I'd argue it's
acceptable.

> * Doesn't infer_unique_index() have to look for indisvalid? This isn't
>   going to work well with a invalid (not to speak for a !ready) index.

It does (check IndexIsValid()). I think the mistake I made here was
not checking IndexIsReady(), since that is an additional concern above
what the similar get_relation_info() function must consider.

> * Is ->relation in the UpdateStmt generated in transformInsertStmt ever
>   used for anything? If so, it'd possibly generate some possible
>   nastyness due to repeated name lookups. Looks like it'll be used in
>   transformUpdateStmt

What, you mean security issues, for example? I have a hard time seeing
how that could work in practice, given that the one and only target
RTE is marked with the appropriate updatedCols originating from
transformUpdateStmt(). Still, it is a concern generally - better safe
than sorry. I was thinking of plugging it by ensuring that the
Relations matched, but that might not be good enough. Maybe it would
be better to bite the bullet and have transformUpdateStmt() use the
same Relation directly, which is something I hoped to avoid (better to
have transformUpdateStmt() know as little as possible about this, I'd
say).

> * What's the reason for the !pstate->p_parent? Also why the parens?
> pstate->p_is_speculative = (pstate->parentParseState &&
> (!pstate->p_parent_cte &&
> pstate->parentParseState->p_is_insert &&
> pstate->parentParseState->p_is_speculative));

You mean the "!pstate->p_parent_cte"? That's there because you can get
queries to segfault if this logic doesn't consider that a
data-modifying CTE can have an UPDATE that appears within a CTE
referenced from an INSERT.  :-)

> * Why did you need to make %nonassoc   DISTINCT and ON nonassoc in the grammar?

To prevent a shift/reduce conflict, I changed the associativity.
Without this, here are the details of State 700, which has the
conflict (from gram.output):

"""""
State 700

  1465 opt_distinct: DISTINCT .
  1466             | DISTINCT . ON '(' expr_list ')'

    ON  shift, and go to state 1094

    ON        [reduce using rule 1465 (opt_distinct)]
    $default  reduce using rule 1465 (opt_distinct)

"""""

> * The whole speculative insert logic isn't really well documented. Why,
>   for example, do we actually need the token? And why are there no
>   issues with overflow? And where is it documented that a 0 means
>   there's no token? ...

Fair enough. Presumably it's okay that overflow theoretically could
occur, because a race is all but impossible. The token represents a
particular attempt by some backend at inserting a tuple, that needs to
be waited on specifically only if it is their active attempt (and the
xact is still running). Otherwise, you get unprincipled deadlocks.
Even if by some incredibly set of circumstances it wraps around, worst
case scenario you get an unprinciped deadlock, which is hardly the end
of the world given the immense number of insertions required, and the
immense unlikelihood that things would work out that way - it'd be
basically impossible.

I'll document the "0" thing.

> * Isn't "SpecType" a awfully generic (and nondescriptive) name?

OK. That'll be changed.

> * /* XXX:  Make sure that re-use of bits is safe here */ - no, not
>   unless you change existing checks.

I think that this is a restatement of your remarks on logical decoding. No?

> * /*
> * Immediately VACUUM "super-deleted" tuples
> */
> if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
> InvalidTransactionId))
> return HEAPTUPLE_DEAD;
>   Is that branch really needed? Shouldn't it just be happening as a
>   consequence of the already existing code? Same in SatisfiesMVCC. If
>   you actually needed that block, it'd need to be done in SatisfiesSelf
>   as well, no? You have a comment about a possible loop - but that seems
>   wrong to me, implying that HEAP_XMIN_COMMITTED was set invalidly.

Indeed, this code is kind of odd. While I think the omission within
SatisfiesSelf() may be problematic too, if you really want to know why
this code is needed, uncomment it and run Jeff's stress-test. It will
reliably break.

This code:

"""""
if (HeapTupleHeaderXminInvalid(tuple))
  return HEAPTUPLE_DEAD;

"""""

and this code:

"""""
/*
 * Immediately VACUUM "super-deleted" tuples
 */
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
    InvalidTransactionId))
    return HEAPTUPLE_DEAD;

"""""

are not equivalent (nor is the latter a superset of the former). Maybe
they should be, but they're not. What's more, heap tuple header raw
xmin has never been able to change, and I don't think there is any
reason for it to be InvalidTransactionId. See my new comments within
EvalPlanQualFetch() remarking on how it's now possible for that to
change (before, the comment claimed that it wasn't possible).

> Ok, I can't focus at all any further at this point. But there's enough
> comments here that some even might make sense ;)

Most do.  :-)

Thanks.

[1] https://wiki.postgresql.org/wiki/UPSERT#.22Cardinality_violation.22_errors_in_detail
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: Redesigning checkpoint_segments
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Proposal : REINDEX xxx VERBOSE