Re: wal_consistency_checking reports an inconsistency on masterbranch - Mailing list pgsql-hackers

From Andres Freund
Subject Re: wal_consistency_checking reports an inconsistency on masterbranch
Date
Msg-id 20180501050846.kyt2jq7pbqa4agbl@alap3.anarazel.de
Whole thread Raw
In response to Re: wal_consistency_checking reports an inconsistency on masterbranch  (Andres Freund <andres@anarazel.de>)
Responses Re: wal_consistency_checking reports an inconsistency on masterbranch  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2018-04-23 07:58:30 -0700, Andres Freund wrote:
> On 2018-04-23 13:22:21 +0300, Heikki Linnakangas wrote:
> > On 13/04/18 13:08, Michael Paquier wrote:
> > > On Fri, Apr 13, 2018 at 02:15:35PM +0530, amul sul wrote:
> > > > I have looked into this and found that the issue is in heap_xlog_delete -- we
> > > > have missed to set the correct offset number from the target_tid when
> > > > XLH_DELETE_IS_PARTITION_MOVE flag is set.
> > > 
> > > Oh, this looks good to me.  So when a row was moved across partitions
> > > this could have caused incorrect tuple references on a standby, which
> > > could have caused corruptions.
> > 
> > Hmm. So, the problem was that HeapTupleHeaderSetMovedPartitions() only sets
> > the block number to InvalidBlockNumber, and leaves the offset number
> > unchanged. WAL replay didn't preserve the offset number, so the master and
> > the standby had a different offset number in the ctid.
> 
> Right.
> 
> > Why does HeapTupleHeaderSetMovedPartitions() leave the offset number
> > unchanged? The old offset number is meaningless without the block number.
> > Also, bits and magic values in the tuple header are scarce. We're
> > squandering a whole range of values in the ctid, everything with
> > ip_blkid==InvalidBlockNumber, to mean "moved to different partition", when a
> > single value would suffice.
> 
> Yes, I agree on that.
> 
> 
> > I kept using InvalidBlockNumber there, so ItemPointerIsValid() still
> > considers those item pointers as invalid. But my gut feeling is actually
> > that it would be better to use e.g. 0 as the block number, so that these
> > item pointers would appear valid. Again, to follow the precedent of
> > speculative insertion tokens. But I'm not sure if there was some
> > well-thought-out reason to make them appear invalid. A comment on that would
> > be nice, at least.
> 
> That seems risky to me. We want something that stops EPQ style chasing
> without running into asserts for invalid offsets...

Heikki, would you rather apply this yourself or have me do it?

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Next
From: Thomas Munro
Date:
Subject: A few warnings on Windows