Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date
Msg-id 20180405014439.fbezvbjrmcw64vjc@alap3.anarazel.de
In response to Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
List pgsql-hackers
Hi,

On 2018-04-02 11:26:38 -0400, Robert Haas wrote:
> On Wed, Mar 28, 2018 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:
> > How will it break it? They'll see an invalid ctid and conclude that the
> > tuple is dead? Without any changes that's already something that can
> > happen if a later tuple in the chain has been pruned away.  Sure, that
> > code won't realize it should error out because the tuple is now in a
> different partition, but neither would an infomask bit.
> >
> > I think my big problem is that I just don't see what the worst that can
> > happen is. We'd potentially see a broken ctid chain, something that very
> > commonly happens, and consider the tuple to be invisible.  That seems
> > pretty sane behaviour for unadapted code, and not any worse than other
> > potential solutions.
>
> This is more or less my feeling as well.  I think it's better to
> conserve our limited supply of infomask bits as much as we can, and I
> do think that we should try to reclaim HEAP_MOVED_IN and
> HEAP_MOVED_OFF in the future instead of defining the combination of
> the two of them to mean something now.

Yep.

It'd also make locking more complicated, or require keeping more
information around in HeapUpdateFailureData. In a number of places we
currently release the buffer pin before switching on the result of
heap_lock_tuple etc., and in some cases there's not even a way to get at
the infomask (e.g. when heap_update fails).


> Modulo implementation quality, I think the risk
> level of this patch is somewhat but not vastly higher than
> 37484ad2aacef5ec794f4dd3d5cf814475180a78, which similarly defined a
> previously-unused bit pattern in the tuple header.

Personally I think that change was vastly riskier, because it affected
freezing and wraparounds. Which is something we've repeatedly gotten
wrong.


> The reason I think this one might be somewhat riskier is because
> AFAICS it's not so easy to make sure we've found all the code, even in
> core, that might care, as it was in that case; and also because
> updates happen more than freezing.

But the consequences of not catching a changed piece of code are fairly
harmless. And I'd say things that happen more often are actually easier
to validate than something that with default settings requires hours of
testing...

I've attached a noticeably editorialized patch:

- I'm uncomfortable with the "moved" information not being crash-safe /
  replicated. Thus I added a new flag to preserve it, and removed the
  masking of the moved bit in the ctid from heap_mask().

- renamed the macros so they don't mention valid / invalid block numbers,
  but rather
  HeapTupleHeaderSetMovedPartitions / HeapTupleHeaderIndicatesMovedPartitions
  and
  ItemPointerSetMovedPartitions / ItemPointerIndicatesMovedPartitions

  I'm not wedded to these names, but I'll be adamant that they not talk
  about invalid block numbers; that makes the code harder to understand,
  imo. (A rough sketch of the helpers follows this list.)

- removed new assertion from heap_get_latest_tid(), it's wrong for the
  case where all row versions are invisible.

- editorialized comments a bit

- added a few more assertions
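
For concreteness, roughly what those helpers end up looking like
(reproduced from memory, not the patch text; the specific block/offset
combination used as the marker is an implementation detail and may differ
in the attached patch):

/* storage/itemptr.h, roughly */
#define MovedPartitionsBlockNumber   InvalidBlockNumber
#define MovedPartitionsOffsetNumber  ((OffsetNumber) 0xfffd)

#define ItemPointerSetMovedPartitions(pointer) \
    ItemPointerSet((pointer), MovedPartitionsBlockNumber, MovedPartitionsOffsetNumber)

#define ItemPointerIndicatesMovedPartitions(pointer) \
    (ItemPointerGetOffsetNumber(pointer) == MovedPartitionsOffsetNumber && \
     ItemPointerGetBlockNumberNoCheck(pointer) == MovedPartitionsBlockNumber)

/* access/htup_details.h, roughly */
#define HeapTupleHeaderSetMovedPartitions(tup) \
    ItemPointerSetMovedPartitions(&(tup)->t_ctid)

#define HeapTupleHeaderIndicatesMovedPartitions(tup) \
    ItemPointerIndicatesMovedPartitions(&(tup)->t_ctid)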


I went through the existing code
a) to make sure no checks were missed,
b) to evaluate the consequences of chasing such a chain unchecked, and
c) to evaluate the consequences of missing an error check.

WRT b), without the new checks it's usually just superfluous extra work.
I went through all callers accessing xmax (via GetRawXmax and
GetUpdateXid):

b)
- heap rewrites will keep a tuple in hashtable till end of run, then
  reset the ctid to self. No real corruption, but we'd not detect
  further errors when attempting to follow chain.
- EvalPlanQualFetch would fail to abort the loop and attempt to fetch the
  tuple. This'll extend the relation by a single page, because P_NEW ==
  InvalidBlockNumber (see the chain-following sketch after this list).
- heap_prune_chain - no changes needed (delete isn't recursed through)
- heap_get_root_tuples - same
- heap_hot_search_buffer - only continues over hot updates
- heap_lock_tuple (and subsidiary routines) - same as EvalPlanQualFetch,
  would then return HeapTupleUpdated.
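
To make the EvalPlanQualFetch point concrete, the chain chasing looks
roughly like this (heavily simplified sketch, not the actual function;
PG11-era heap_fetch() signature assumed):

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "storage/bufmgr.h"

static void
follow_chain_sketch(Relation rel, Snapshot snapshot, ItemPointerData tid)
{
    HeapTupleData tuple;
    Buffer      buffer;

    for (;;)
    {
        /*
         * Without this check a moved-away tuple's ctid would be handed to
         * heap_fetch(); its InvalidBlockNumber is then taken as P_NEW by
         * ReadBuffer(), extending the relation by one page.
         */
        if (ItemPointerIndicatesMovedPartitions(&tid))
            ereport(ERROR,
                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                     errmsg("tuple to be locked was already moved to another partition due to concurrent update")));

        tuple.t_self = tid;
        if (!heap_fetch(rel, snapshot, &tuple, &buffer, false, NULL))
            return;             /* version invisible or already pruned */

        /* ... visibility / locking / qual checks elided ... */

        if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
        {
            ReleaseBuffer(buffer);
            return;             /* end of the update chain */
        }

        tid = tuple.t_data->t_ctid;     /* on to the next version */
        ReleaseBuffer(buffer);
    }
}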

c)
- GetTupleForTrigger - the proper error wouldn't be raised, instead a
  NULL tuple would be passed to the trigger
- EvalPlanQualFetch - a NULL tuple would be returned after the
  consequences above
- RelationFindReplTupleBy* - wrong error message
- ExecLockRows - no error would be raised, continue normally
- ExecDelete() - tuple ignored without error
- ExecUpdate() - same (see the triage sketch after this list)
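
The triage these callers do on HeapTupleUpdated then roughly becomes the
following (sketch only; without the new branch a moved row falls through
into the "concurrently deleted" case and is silently ignored, which is
exactly the c) behaviour above):

#include "postgres.h"

#include "access/heapam.h"
#include "storage/itemptr.h"

/*
 * Sketch of the HeapTupleUpdated handling in a caller like ExecDelete():
 * "hufd" is what heap_delete()/heap_update()/heap_lock_tuple() filled in,
 * "tupleid" is the version we tried to operate on.
 */
static void
handle_updated_sketch(HeapUpdateFailureData *hufd, ItemPointer tupleid)
{
    if (ItemPointerIndicatesMovedPartitions(&hufd->ctid))
    {
        /* new branch: the row now lives in another partition */
        ereport(ERROR,
                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
    }
    else if (!ItemPointerEquals(tupleid, &hufd->ctid))
    {
        /* plain concurrent update: re-fetch hufd->ctid, EvalPlanQual, retry */
    }
    else
    {
        /*
         * Concurrent delete: the row is gone and the statement ignores it.
         * Without the branch above, a moved row would end up here too.
         */
    }
}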


Questions:

- I'm not perfectly happy with
  "tuple to be locked was already moved to another partition due to concurrent update"
  as the error message. If somebody has a better suggestion, please speak up.
- should heap_get_latest_tid() error out when the chain ends in a moved
  tuple?  I personally think it doesn't matter much, the functionality
  is so bonkers and underspecified that it doesn't matter anyway ;)
- I'm not that happy with the number of added spec test files with
  number postfixes. Can't we combine them into a single file?
- as remarked elsewhere on this thread, I think the errcode used should
  be a serialization failure

Greetings,

Andres Freund

Attachment
