Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date
Msg-id 20200525070033.GA1591335@rfd.leadboat.com
Whole thread Raw
In response to Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
> On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> > >> So I think what we're actually trying to accomplish here is to
> > >> ensure that instead of deleting up to half of the SLRU space
> > >> before the cutoff, we delete up to half-less-one-segment.
> > >> Maybe it should be half-less-two-segments, just to provide some
> > >> cushion against edge cases.  Reading the first comment in
> > >> SetTransactionIdLimit makes one not want to trust too much in
> > >> arguments based on the exact value of xidWrapLimit, while for
> > >> the other SLRUs it was already unclear whether the edge cases
> > >> were exactly right.
> > 
> > > That could be interesting insurance.  While it would be sad for us to miss an
> > > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > > in single-user mode.  More plausible is FreezeMultiXactId() overrunning the
> > > limit by tens of segments.  Hence, if we do buy this insurance, let's skip far
> > > more segments.  For example, instead of unlinking segments representing up to
> > > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > > lower half.  The lower half will stay in place; eventually, XID consumption
> > > will overwrite it.  Truncation behavior won't change until the region of CLOG
> > > for pre-oldestXact XIDs exceeds 256 MiB.  Beyond that threshold,
> > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest.  CLOG
> > > maximum would rise from 512 MiB to 768 MiB.  Would that be worthwhile?

> > Temporarily wasting some disk
> > space is a lot more palatable than corrupting data, and these code
> > paths are necessarily not terribly well tested.  So +1 for more
> > insurance.
> 
> Okay, I'll give that a try.  I expect this will replace the PagePrecedes
> callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
> PagePrecedes(a, b).  PageDiff callbacks shall distribute return values
> uniformly in [INT_MIN,INT_MAX].  SimpleLruTruncate() will unlink segments
> where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.

While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
wrong, as you feared.  In particular, if the newest XID reached xidStopLimit
and was in the first page of a segment, TruncateCLOG() would delete its
segment.  Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
added unit tests covering that and other scenarios.  Reaching the bug via XIDs
was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
single-user mode.  I expect the bug was easier to reach via pg_multixact.

The insurance patch stacks on top of the bug fix patch.  It does have a
negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
to skip truncation in corrupted clusters.  SlruScanDirCbFindEarliest() gives
nonsense answers if "future" segments exist.  That can happen today, but the
patch creates new ways to make it happen.  The symptom is wasting yet more
space in pg_multixact.  I am okay with this, since it arises only after one
fills pg_multixact 50% full.  There are alternatives.  We could weaken the
corruption defense in TruncateMultiXact() or look for another implementation
of equivalent defense.  We could unlink, say, 75% or 95% of the "past" instead
of 50% (this patch) or >99.99% (today's behavior).

Thanks,
nm

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: PostgresSQL 13.0 Beta 1 on Phoronix
Next
From: Fujii Masao
Date:
Subject: Re: pg13 docs: minor fix for "System views" list