Thread: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
While testing an xidStopLimit corner case, I got this:

3656710 2019-01-05 00:05:13.910 GMT LOG:  automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG:  could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG:  transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING:  database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.

I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():

	cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
	/*
	 * While we are holding the lock, make an important safety check: the
	 * planned cutoff point must be <= the current endpoint page.  Otherwise we
	 * have already wrapped around, and proceeding with the truncation would
	 * risk removing the current segment.
	 */
	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
	{
		LWLockRelease(shared->ControlLock);
		ereport(LOG,
				(errmsg("could not truncate directory \"%s\": apparent wraparound",
						ctl->Dir)));

We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page.  CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space.  Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return
true for the segment furthest in the future relative to the unrounded
cutoffPage (if it exists).  That's bad.  Such a segment rarely exists,
because xidStopLimit protects 1000000 XIDs, and the rounding moves truncation
by no more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs.  Thus, I expect to see this problem at 4.9% of xidStopLimit
values.  I expect this is easier to see with multiStopLimit, which protects
only 100 mxid.

The main consequence is the false alarm.  A prudent DBA will want to react
to true wraparound, but no such wraparound has occurred.  Also, we
temporarily waste disk space in pg_xact.  This feels like a recipe for
future bugs.  The fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage).  I'm inclined not to
back-patch this; does anyone favor back-patching?

Thanks,
nm
Attachment
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.  A prudent DBA will want to react to
> true wraparound, but no such wraparound has occurred.  Also, we temporarily
> waste disk space in pg_xact.  This feels like a recipe for future bugs.  The
> fix I have in mind, attached, is to change instances of
> ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
> ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage).  I'm inclined not to
> back-patch this; does anyone favor back-patching?

To avoid wasting more of anyone's time: that patch is bad; I'll update this
thread when I have something better.
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.

That conclusion was incorrect.  On further study, I was able to reproduce
data loss via either of two weaknesses in the "apparent wraparound" test:

1. The result of the test is valid only until we release the SLRU ControlLock,
   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
   segments for deletion.  Once we release that lock, latest_page_number can
   advance.  This creates a TOCTOU race condition, allowing excess deletion:

   [local] test=# table trunc_clog_concurrency ;
   ERROR:  could not access status of transaction 2149484247
   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.

2. By the time the "apparent wraparound" test fires, we've already WAL-logged
   the truncation.  clog_redo() suppresses the "apparent wraparound" test,
   then deletes too much.  Startup then fails:

   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1

Fixes are available:

a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
   wrong; I will correct it in a separate message.)

b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other
   than AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
   checkpoint, or in the startup process.  Hence, also arrange for only one
   backend to call SimpleLruTruncate(AsyncCtl) at a time.

c. Test "apparent wraparound" before writing WAL, and don't WAL-log
   truncations that "apparent wraparound" forces us to skip.

d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This
   removes the TOCTOU race condition, but TransactionIdDidCommit() and other
   key operations would be waiting behind filesystem I/O.

e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
   the low cutoff is the page furthest in the past relative to cutoffPage
   (the "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
   truncation range, it would acquire an LWLock and increment the low cutoff.
   Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
   same LWLock and recheck the segment against the latest low cutoff.

With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only.  I don't want (d), which threatens
performance too much.  I would rather not have (e), because I expect it's
more complex than (b) and fixes strictly less than (b) fixes.

Can you see a way to improve on that plan?  Can you see other bugs of this
nature that this plan does not fix?

Thanks,
nm
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> On further study, I was able to reproduce data loss

> Fixes are available:
>
> a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
>    wrong; I will correct it in a separate message.)

Here's a corrected version.  I now delete a segment only if both its first
page and its last page are considered to precede the cutoff; see the new
comment at SlruMayDeleteSegment().
Attachment
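The attachment itself is not reproduced in this archive, but the shape of
the both-ends test described above can be sketched as follows.  The names
and constants here are simplified stand-ins modeled on slru.c, not the
literal patch:

```c
#include <stdint.h>
#include <stdbool.h>

#define SLRU_PAGES_PER_SEGMENT 32

/* Circular page comparison in the style of the SLRU PagePrecedes callbacks. */
static bool
PagePrecedes(uint32_t page1, uint32_t page2)
{
	return (int32_t) (page1 - page2) < 0;
}

/*
 * A segment may be unlinked only if BOTH its first and its last page
 * precede the unrounded cutoff.  The segment containing the cutoff fails
 * the last-page test, and the segment diametrically opposite the cutoff
 * fails the first-page test, so both survive truncation.
 */
static bool
SlruMayDeleteSegment(uint32_t segpage, uint32_t cutoffPage)
{
	uint32_t	seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;

	return PagePrecedes(segpage, cutoffPage) &&
		PagePrecedes(seg_last_page, cutoffPage);
}
```

A genuinely old segment passes both tests and is removed; the cutoff's own
segment and the antipodal segment each fail exactly one test and are kept,
which replaces the old rounding trick without its wraparound hazard.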
Sorry in advance for link-breaking message forced by gmail..

https://www.postgresql.org/message-id/flat/20190202083822.GC32531@gust.leadboat.com

> 1. The result of the test is valid only until we release the SLRU ControlLock,
>    which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
>    segments for deletion.  Once we release that lock, latest_page_number can
>    advance.  This creates a TOCTOU race condition, allowing excess deletion:
>
>    [local] test=# table trunc_clog_concurrency ;
>    ERROR:  could not access status of transaction 2149484247
>    DETAIL:  Could not open file "pg_xact/0801": No such file or directory.

It seems like some other vacuum process saw a larger cutoff page?  If I'm not
missing something, the missing page is no longer the "recently-populated"
page at the time (as I understand it, the last page that holds valid data).
Couldn't we just ignore ENOENT there?

> 2. By the time the "apparent wraparound" test fires, we've already WAL-logged
>    the truncation.  clog_redo() suppresses the "apparent wraparound" test,
>    then deletes too much.  Startup then fails:

I agree that if truncation is skipped after the WAL record is issued, it will
lead to data loss at the next recovery.  But the following log..:

> 881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
> 881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
> 881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1

If it came from the same cause as 1, the error is simply ignorable, so
recovery stopping at it is unacceptable; the ENOENT is just ignorable for
the same reason.

As a result, I agree with (a) (fix rounding) and (c) (test wraparound before
writing WAL), but I'm not sure about the others.  And an additional fix for
the ignorable ENOENT is needed.  What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jul 24, 2019 at 05:27:18PM +0900, Kyotaro Horiguchi wrote:
> Sorry in advance for link-breaking message forced by gmail..

Using the archives page "Resend email" link avoids that.

> https://www.postgresql.org/message-id/flat/20190202083822.GC32531@gust.leadboat.com
>
> > 1. The result of the test is valid only until we release the SLRU ControlLock,
> >    which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
> >    segments for deletion.  Once we release that lock, latest_page_number can
> >    advance.  This creates a TOCTOU race condition, allowing excess deletion:
> >
> >    [local] test=# table trunc_clog_concurrency ;
> >    ERROR:  could not access status of transaction 2149484247
> >    DETAIL:  Could not open file "pg_xact/0801": No such file or directory.
>
> It seems like some other vacuum process saw a larger cutoff page?

No, just one VACUUM suffices.

> If I'm not missing something, the missing page is no longer the
> "recently-populated" page at the time (as I understand it, the last
> page that holds valid data).  Couldn't we just ignore ENOENT there?

The server reported this error while attempting to read CLOG to determine
whether a tuple's xmin committed or aborted.  That ENOENT means the relevant
CLOG page is not available.  To ignore that ENOENT, the server would need to
guess whether to consider the xmin committed or consider it aborted.  So, no,
we can't just ignore the ENOENT.
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
>On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
>> The main consequence is the false alarm.
>
>That conclusion was incorrect.  On further study, I was able to reproduce data
>loss via either of two weaknesses in the "apparent wraparound" test:
>
>1. The result of the test is valid only until we release the SLRU ControlLock,
>   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
>   segments for deletion.  Once we release that lock, latest_page_number can
>   advance.  This creates a TOCTOU race condition, allowing excess deletion:
>
>   [local] test=# table trunc_clog_concurrency ;
>   ERROR:  could not access status of transaction 2149484247
>   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.
>
>2. By the time the "apparent wraparound" test fires, we've already WAL-logged
>   the truncation.  clog_redo() suppresses the "apparent wraparound" test,
>   then deletes too much.  Startup then fails:
>
>   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
>   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
>   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1
>
>Fixes are available:
>
>a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
>   wrong; I will correct it in a separate message.)
>
>b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
>   AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
>   checkpoint, or in the startup process.  Hence, also arrange for only one
>   backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
>c. Test "apparent wraparound" before writing WAL, and don't WAL-log
>   truncations that "apparent wraparound" forces us to skip.
>
>d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
>   the TOCTOU race condition, but TransactionIdDidCommit() and other key
>   operations would be waiting behind filesystem I/O.
>
>e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
>   the low cutoff is the page furthest in the past relative to cutoffPage (the
>   "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
>   truncation range, it would acquire an LWLock and increment the low cutoff.
>   Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
>   same LWLock and recheck the segment against the latest low cutoff.
>
>With both (a) and (b), the only way I'd know to reach the "apparent
>wraparound" message is to restart in single-user mode and burn XIDs to the
>point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
>and I propose (c) for HEAD only.  I don't want (d), which threatens
>performance too much.  I would rather not have (e), because I expect it's more
>complex than (b) and fixes strictly less than (b) fixes.
>
>Can you see a way to improve on that plan?  Can you see other bugs of this
>nature that this plan does not fix?
>

Seems reasonable, although I wonder how much more expensive just doing (d)
would be.  It seems by far the least complex solution, and it moves "just"
the SlruScanDirectory() call before the lock.  It's true it adds I/O
requests, OTOH it's just unlink() without fsync() and I'd expect the number
of files to be relatively low.  Plus we already do SimpleLruWaitIO and
SlruInternalWritePage in the loop.

BTW isn't it an issue that SlruInternalDeleteSegment does not do any fsync
calls after unlinking the segments?  If the system crashes/reboots before
this becomes persistent (i.e. some of the segments reappear), won't that
cause a problem?

It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
> On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> >a. Fix the rounding in SimpleLruTruncate().  (The patch I posted upthread is
> >   wrong; I will correct it in a separate message.)
> >
> >b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
> >   AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >   checkpoint, or in the startup process.  Hence, also arrange for only one
> >   backend to call SimpleLruTruncate(AsyncCtl) at a time.
> >
> >c. Test "apparent wraparound" before writing WAL, and don't WAL-log
> >   truncations that "apparent wraparound" forces us to skip.
> >
> >d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
> >   the TOCTOU race condition, but TransactionIdDidCommit() and other key
> >   operations would be waiting behind filesystem I/O.
> >
> >With both (a) and (b), the only way I'd know to reach the "apparent
> >wraparound" message is to restart in single-user mode and burn XIDs to the
> >point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
> >and I propose (c) for HEAD only.  I don't want (d), which threatens
> >performance too much.  I would rather not have (e), because I expect it's more
> >complex than (b) and fixes strictly less than (b) fixes.
>
> Seems reasonable, although I wonder how much more expensive would just
> doing (d) be.  It seems by far the least complex solution, and it moves
> "just" the SlruScanDirectory() call before the lock.  It's true it adds
> I/O requests, OTOH it's just unlink() without fsync() and I'd expect the
> number of files to be relatively low.  Plus we already do SimpleLruWaitIO
> and SlruInternalWritePage in the loop.

Trivial read-only transactions often need CLogControlLock to check tuple
visibility.  If an unlink() takes 1s, stalling read-only transactions for
that 1s is a big problem.  SimpleLruWaitIO() and SlruInternalWritePage()
release the control lock during I/O, then re-acquire it.  (Moreover,
SimpleLruTruncate() rarely calls them.  Calling them implies a page old
enough to truncate was modified after the most recent checkpoint.)

> BTW isn't that an issue that SlruInternalDeleteSegment does not do any
> fsync calls after unlinking the segments?  If the system crashes/reboots
> before this becomes persistent (i.e. some of the segments reappear, won't
> that cause a problem)?

I think not; we could turn SlruInternalDeleteSegment() into a no-op, and the
only SQL-visible consequence would be extra disk usage.  CheckPoint fields
tell the server what region of slru data is meaningful, and segments outside
that range merely waste disk space.  (If that's not true and can't be made
true, we'd also need to stop ignoring the unlink() return value.)

> It's a bit unfortunate that a patch for a data corruption / loss issue
> (even if a low-probability one) fell through multiple commitfests.

Thanks for investing in steps to fix that.
Noah Misch <noah@leadboat.com> writes:
> On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
>> It's a bit unfortunate that a patch for a data corruption / loss issue
>> (even if a low-probability one) fell through multiple commitfests.

> Thanks for investing in steps to fix that.

Yeah, this patch has been waiting in the queue for way too long :-(.
I spent some time studying this, and I have a few comments/questions:

1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound".  Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message.  So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.
However, you're also discussing whether or not an SLRU segment file that
is close to the wraparound boundary should get removed or not.  As far
as I can see that's 100% independent of issuance of the log message, no?
This might not affect the code substance of the patch at all; but it
seems like we need to be clear about it in our discussion, and maybe
the comments need to change too.

2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of
data that the higher-level code thinks is of interest.  Maybe we
should restructure things so that we keep the original cutoff XID
(or equivalent) all the way through, and compare the start/end
positions of a segment file directly to that.

3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem.  I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.

To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files.  Let's say that nextXID is somewhere in the bottommost
octagon segment.  The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted daylight-
till-wraparound is less than the SLRU segment size (which it is).
Then, if we round the cutoff XID "back" to a segment boundary, most
of the current (bottommost) segment is now less than halfway around
from that cutoff point, and in particular the current segment's
starting page is exactly halfway around.  Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31 which is negative).  Boom, data loss, because we'll decide the
current segment is removable.

I think that your proposed patch does fix this, but I'm not quite
sold that the corner cases (where the cutoff XID is itself exactly
at a page boundary) are right.  In any case, I think it'd be more
robust to be comparing explicitly against a notion of the latest
in-use page number, instead of backing into it from an assumption
that the cutoff XID itself is less than halfway around.  I wonder
if we ought to dodge the problem by having a higher minimum value
of the required daylight-before-wraparound, so that the cutoff point
couldn't be in the diametrically-opposite-current segment but would
have to be at least one segment before that.

In the end, I believe that all of this logic was written under an
assumption that we should never get into a situation where we are
so close to the wraparound threshold that considerations like these
would manifest.  Maybe we can get it right, but I don't have huge
faith in it.

It also bothers me that some of the callers of SimpleLruTruncate
have explicit one-count backoffs of the cutoff point and others
do not.  There's no obvious reason for the difference, so I wonder
if that isn't something we should have across-the-board, or else
adjust SimpleLruTruncate to do the equivalent thing internally.

I haven't thought much yet about your second point about race
conditions arising from nextXID possibly moving before we finish
the deletion scan.  Maybe we could integrate a fix for that issue,
along the lines of (1) see an SLRU segment file, (2) determine that
it appears to precede the cutoff XID, if so (3) acquire the control
lock and fetch latest_page_number, compare against that to verify
that the segment file is old and not new, then (4) unlink if that
still holds.

			regards, tom lane
On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
> Yeah, this patch has been waiting in the queue for way too long :-(.

Thanks for reviewing.

> I spent some time studying this, and I have a few comments/questions:
>
> 1. It seems like this discussion is conflating two different issues.
> The original complaint was about a seemingly-bogus log message "could
> not truncate directory "pg_xact": apparent wraparound".  Your theory
> about that, IIUC, is that SimpleLruTruncate's initial round-back of
> cutoffPage to a segment boundary moved us from a state where
> shared->latest_page_number doesn't logically precede the cutoffPage to
> one where it does, triggering the message.  So removing the initial
> round-back, and then doing whatever's needful to compensate for that in
> the later processing, is a reasonable fix to prevent the bogus warning.
> However, you're also discussing whether or not an SLRU segment file that
> is close to the wraparound boundary should get removed or not.

When the newest XID and the oldest XID fall in "opposite" segments in the
XID space, we must not unlink the segment containing the newest XID.  That
is the chief goal at present.

> As far as I can see that's 100% independent of issuance of the log
> message, no?

Perhaps confusing is that the first message of the thread and the subject
line contain wrong claims, which I corrected in the 2019-02-13 message[1].
Due to point (2) in [1], it's essential to make the "apparent wraparound"
message a can't-happen event.  Hence, I'm not looking to improve the message
or its firing conditions.  I want to fix the excess segment deletion, at
which point the message will become unreachable except under single-user
mode.

> 2. I wonder whether we have an issue even with rounding back to the
> SLRU page boundary, as is done by each caller before we ever get to
> SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
> protections are all precise to the XID level, so that there's some
> daylight between what SimpleLruTruncate is told and the range of
> data that the higher-level code thinks is of interest.  Maybe we
> should restructure things so that we keep the original cutoff XID
> (or equivalent) all the way through, and compare the start/end
> positions of a segment file directly to that.

Currently, slru.c knows nothing about the division of pages into records.
Hmm.  To keep oldestXact all the way through, I suppose the PagePrecedes
callback could become one or more record-oriented (e.g. XID-oriented)
callbacks.  The current scheme just relies on TransactionIdToPage() in
TruncateCLOG().  If TransactionIdToPage() had a bug, all sorts of CLOG
lookups would do wrong things.  Hence, I think today's scheme is tougher to
get wrong.  Do you see it differently?

> 3. It feels like the proposed test of cutoff position against both
> ends of a segment is a roundabout way of fixing the problem.  I
> wonder whether we ought not pass *both* the cutoff and the current
> endpoint (latest_page_number) down to the truncation logic, and
> have it compare against both of those values.

Since latest_page_number can keep changing throughout SlruScanDirectory()
execution, that would give a false impression of control.  Better to
demonstrate that the xidWrapLimit machinery keeps latest_page_number within
acceptable constraints than to ascribe significance to a comparison with a
stale latest_page_number.

> To try to clarify this in my head, I thought about an image of
> the modular XID space as an octagon, where each side would correspond to
> a segment file if we chose numbers such that there were only 8 possible
> segment files.  [...]  Because of the way that
> TransactionIdPrecedes works, the current segment will be considered to
> precede that cutoff point (the int32 difference comes out as exactly
> 2^31 which is negative).  Boom, data loss, because we'll decide the
> current segment is removable.

Exactly.
https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
uses your octagon to show the behaviors before and after this patch.

> I think that your proposed patch does fix this, but I'm not quite
> sold that the corner cases (where the cutoff XID is itself exactly
> at a page boundary) are right.

That's a good thing to worry about.  More specifically, I think the edge
case to check is when oldestXact is the last XID of a _segment_.  That case
maximizes the XIDs we can delete.  At that time, xidWrapLimit should
likewise fall near the end of some opposing segment that we refuse to
unlink.

> It also bothers me that some of the callers of SimpleLruTruncate
> have explicit one-count backoffs of the cutoff point and others
> do not.  There's no obvious reason for the difference, so I wonder
> if that isn't something we should have across-the-board, or else
> adjust SimpleLruTruncate to do the equivalent thing internally.

Consider the case of PerformOffsetsTruncation().  If newOldestMulti is the
first of a page, then SimpleLruTruncate() gets the previous page.  If that
page and the newOldestMulti page fall in different segments, could we unlink
the segment that contained newOldestMulti, or does some other off-by-one
compensate?  I'm not sure.  I do know that to lose mxact data this way, one
must reach multiStopLimit, restart in single-user mode, and consume all the
way to the edge of multiWrapLimit.  Considering the rarity of those
preconditions, I am not inclined to bundle a fix with $SUBJECT.  (Even OID
reuse race conditions may present more risk than this does.)  If someone
does pursue a fix here, I recommend looking at other fixes before making the
other callers subtract one.  The subtraction in TruncateSUBTRANS() and
PerformOffsetsTruncation() is a hack.

[1] https://postgr.es/m/20190214072623.GA1139206@rfd.leadboat.com
Noah Misch <noah@leadboat.com> writes:
> On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
>> 1. It seems like this discussion is conflating two different issues.

> When the newest XID and the oldest XID fall in "opposite" segments in the XID
> space, we must not unlink the segment containing the newest XID.  That is the
> chief goal at present.

Got it.  Thanks for clarifying the scope of the patch.

>> 3. It feels like the proposed test of cutoff position against both
>> ends of a segment is a roundabout way of fixing the problem.  I
>> wonder whether we ought not pass *both* the cutoff and the current
>> endpoint (latest_page_number) down to the truncation logic, and
>> have it compare against both of those values.

> Since latest_page_number can keep changing throughout SlruScanDirectory()
> execution, that would give a false impression of control.  Better to
> demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> acceptable constraints than to ascribe significance to a comparison with a
> stale latest_page_number.

Perhaps.  I'm prepared to accept that line of argument so far as the clog
SLRU goes, but I'm not convinced that the other SLRUs have equally robust
defenses against advancing too far.  So on the whole I'd rather that the
SLRU logic handled this issue strictly on the basis of what it knows,
without assumptions about what calling code may be doing.  Still, maybe
we only really care about the risk for the clog SLRU?

>> To try to clarify this in my head, I thought about an image of
>> the modular XID space as an octagon, where each side would correspond to
>> a segment file if we chose numbers such that there were only 8 possible
>> segment files.

> Exactly.
> https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> uses your octagon to show the behaviors before and after this patch.

Cool, thanks for drafting that up.  (My original sketch was not of
publishable quality ;-).)  To clarify, the upper annotations probably
ought to read "nextXid <= xidWrapLimit"?  And "cutoffPage" ought to be
affixed to the orange dot at lower right of the center image?

I agree that this diagram depicts why we have a problem right now,
and the right-hand image shows what we want to have happen.
What's a little less clear is whether the proposed patch achieves
that effect.

In particular, after studying this awhile, it seems like removal
of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
adjustment isn't really affecting anything.  It's already the case
that just by allowing oldestXact to get rounded back to an SLRU page
boundary, we've created some daylight between oldestXact and the
cutoff point.  Rounding back further within the same SLRU segment
changes nothing.  (For example, suppose that oldestXact is already
within the oldest page of its SLRU segment.  Then either rounding
rule has the same effect.  But there's still a little bit of room
for xidWrapLimit to be in the opposite SLRU segment, allowing
trouble.)

So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases.  Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.

In any case, it feels like the specific solution you have here,
of testing both ends of the segment, is a roundabout way of
providing that one-segment slop; and it doesn't help if we decide
we need two-segment slop.  Can we write the test in a way that
explicitly provides N segments of slop?

			regards, tom lane
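One hypothetical shape for an explicit "N segments of slop" test, purely as
a sketch: the nslop parameter, the function name, and this formulation are
not from any posted patch.  The idea is to keep a segment whenever shifting
its first page nslop segments further into the past would carry it across
the point diametrically opposite the cutoff:

```c
#include <stdint.h>
#include <stdbool.h>

#define SLRU_PAGES_PER_SEGMENT 32

/* Circular page comparison in the style of the SLRU PagePrecedes callbacks. */
static bool
PagePrecedes(uint32_t page1, uint32_t page2)
{
	return (int32_t) (page1 - page2) < 0;
}

/*
 * With nslop == 0 this reduces to the both-ends test; each increment of
 * nslop additionally protects one more segment adjacent to the antipode
 * of cutoffPage, i.e. the oldest segments that would otherwise be the
 * first candidates for deletion.
 */
static bool
SlruMayDeleteSegmentSlop(uint32_t segpage, uint32_t cutoffPage, int nslop)
{
	uint32_t	seg_last_page = segpage + SLRU_PAGES_PER_SEGMENT - 1;
	uint32_t	shifted_first = segpage - (uint32_t) nslop * SLRU_PAGES_PER_SEGMENT;

	return PagePrecedes(shifted_first, cutoffPage) &&
		PagePrecedes(seg_last_page, cutoffPage);
}
```

For example, the oldest deletable segment (the one just past the antipode)
passes with nslop == 0 but is kept with nslop == 1, while clearly-old
segments remain deletable either way.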
On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
> >> 3. It feels like the proposed test of cutoff position against both
> >> ends of a segment is a roundabout way of fixing the problem. I
> >> wonder whether we ought not pass *both* the cutoff and the current
> >> endpoint (latest_page_number) down to the truncation logic, and
> >> have it compare against both of those values.
>
> > Since latest_page_number can keep changing throughout SlruScanDirectory()
> > execution, that would give a false impression of control. Better to
> > demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> > acceptable constraints than to ascribe significance to a comparison with a
> > stale latest_page_number.
>
> Perhaps. I'm prepared to accept that line of argument so far as the clog
> SLRU goes, but I'm not convinced that the other SLRUs have equally robust
> defenses against advancing too far. So on the whole I'd rather that the
> SLRU logic handled this issue strictly on the basis of what it knows,
> without assumptions about what calling code may be doing. Still, maybe
> we only really care about the risk for the clog SLRU?

PerformOffsetsTruncation() is the most at-risk, since a single VACUUM could burn millions of multixacts via FreezeMultiXactId() calls. (To make that happen in single-user mode, I suspect one could use prepared transactions as active lockers and/or in-progress updaters.) I'm not concerned about other SLRUs. TruncateCommitTs() moves in lockstep with TruncateCLOG(). The other SimpleLruTruncate() callers handle data that becomes obsolete at every postmaster restart.

> > Exactly.
> > https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> > uses your octagon to show the behaviors before and after this patch.
>
> Cool, thanks for drafting that up. (My original sketch was not of
> publishable quality ;-).)
> To clarify, the upper annotations probably
> ought to read "nextXid <= xidWrapLimit"?

It diagrams the scenario of nextXid reaching xidWrapLimit, so the green dot represents both values.

> And "cutoffPage" ought
> to be affixed to the orange dot at lower right of the center image?

No; oldestXact and cutoffPage have the same position in that diagram, because the patch causes the cutoffPage variable to denote the page that contains oldestXact. I've now added an orange dot to show that.

> I agree that this diagram depicts why we have a problem right now,
> and the right-hand image shows what we want to have happen.
> What's a little less clear is whether the proposed patch achieves
> that effect.
>
> In particular, after studying this awhile, it seems like removal
> of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
> adjustment isn't really affecting anything.

True. The set of unlink() calls needs to be the same for oldestXact in the first page of a segment, in the last page, or in some interior page. Removing the rounding neither helps nor hurts correctness.

> So I think what we're actually trying to accomplish here is to
> ensure that instead of deleting up to half of the SLRU space
> before the cutoff, we delete up to half-less-one-segment.
> Maybe it should be half-less-two-segments, just to provide some
> cushion against edge cases. Reading the first comment in
> SetTransactionIdLimit makes one not want to trust too much in
> arguments based on the exact value of xidWrapLimit, while for
> the other SLRUs it was already unclear whether the edge cases
> were exactly right.

That could be interesting insurance. While it would be sad for us to miss an edge case and print "must be vacuumed within 2 transactions" when wrap has already happened, reaching that message implies the DBA burned ~1M XIDs, all in single-user mode. More plausible is FreezeMultiXactId() overrunning the limit by tens of segments.
Hence, if we do buy this insurance, let's skip far more segments. For example, instead of unlinking segments representing up to 2^31 past XIDs, we could divide that into an upper half that we unlink and a lower half. The lower half will stay in place; eventually, XID consumption will overwrite it. Truncation behavior won't change until the region of CLOG for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold, vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
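[Editorial note: the sizes quoted in the proposal can be sanity-checked with the stock CLOG density of four transactions per byte (two status bits each). A back-of-envelope sketch, with invented helper names:]

```c
#include <assert.h>
#include <stdint.h>

/* Stock CLOG density: two status bits per transaction, so four
 * transactions per byte.  clog_bytes() is an invented helper. */
#define CLOG_XACTS_PER_BYTE 4
#define MiB ((uint64_t) 1024 * 1024)

static uint64_t
clog_bytes(uint64_t nxacts)
{
	return nxacts / CLOG_XACTS_PER_BYTE;
}
```

The 2^31 past XIDs that are unlink-eligible today occupy 512 MiB of CLOG; keeping the lower 256 MiB of that in place while continuing to retain up to 512 MiB of live data yields the 768 MiB maximum mentioned above.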
Noah Misch <noah@leadboat.com> writes:
> On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
>> So I think what we're actually trying to accomplish here is to
>> ensure that instead of deleting up to half of the SLRU space
>> before the cutoff, we delete up to half-less-one-segment.
>> Maybe it should be half-less-two-segments, just to provide some
>> cushion against edge cases. Reading the first comment in
>> SetTransactionIdLimit makes one not want to trust too much in
>> arguments based on the exact value of xidWrapLimit, while for
>> the other SLRUs it was already unclear whether the edge cases
>> were exactly right.

> That could be interesting insurance. While it would be sad for us to miss an
> edge case and print "must be vacuumed within 2 transactions" when wrap has
> already happened, reaching that message implies the DBA burned ~1M XIDs, all
> in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> more segments. For example, instead of unlinking segments representing up to
> 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> lower half. The lower half will stay in place; eventually, XID consumption
> will overwrite it. Truncation behavior won't change until the region of CLOG
> for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?

Hmm. I'm not particularly concerned about the disk-space-consumption angle, but I do wonder about whether we'd be sacrificing the ability to recover cleanly from a situation that the code does let you get into. However, as long as we're sure that the system will ultimately reuse/recycle a not-deleted old segment file without complaint, it's hard to find much fault with your proposal.
Temporarily wasting some disk space is a lot more palatable than corrupting data, and these code paths are necessarily not terribly well tested. So +1 for more insurance.

			regards, tom lane
On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> >> So I think what we're actually trying to accomplish here is to
> >> ensure that instead of deleting up to half of the SLRU space
> >> before the cutoff, we delete up to half-less-one-segment.
> >> Maybe it should be half-less-two-segments, just to provide some
> >> cushion against edge cases. Reading the first comment in
> >> SetTransactionIdLimit makes one not want to trust too much in
> >> arguments based on the exact value of xidWrapLimit, while for
> >> the other SLRUs it was already unclear whether the edge cases
> >> were exactly right.
>
> > That could be interesting insurance. While it would be sad for us to miss an
> > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > more segments. For example, instead of unlinking segments representing up to
> > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > lower half. The lower half will stay in place; eventually, XID consumption
> > will overwrite it. Truncation behavior won't change until the region of CLOG
> > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
>
> Hmm. I'm not particularly concerned about the disk-space-consumption
> angle, but I do wonder about whether we'd be sacrificing the ability to
> recover cleanly from a situation that the code does let you get into.
> However, as long as we're sure that the system will ultimately reuse/
> recycle a not-deleted old segment file without complaint, it's hard to
> find much fault with your proposal.

That is the trade-off. By distancing ourselves from the wraparound edge cases, we'll get more segment recycling (heretofore an edge case). Fortunately, recycling doesn't change behavior as you approach some limit; it works or it doesn't.

> Temporarily wasting some disk
> space is a lot more palatable than corrupting data, and these code
> paths are necessarily not terribly well tested. So +1 for more
> insurance.

Okay, I'll give that a try. I expect this will replace the PagePrecedes callback with a PageDiff callback such that PageDiff(a, b) < 0 iff PagePrecedes(a, b). PageDiff callbacks shall distribute return values uniformly in [INT_MIN, INT_MAX]. SimpleLruTruncate() will unlink segments where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
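[Editorial note: a sketch of what such a PageDiff might look like for a CLOG-sized page space (2^17 pages: 2^32 XIDs at 32768 XIDs per page with BLCKSZ = 8192). The names page_diff and seg_unlink_ok are placeholders, not the eventual patch's API.]

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* CLOG at BLCKSZ = 8192: 2^32 XIDs / 32768 XIDs-per-page = 2^17 pages.
 * SCALE = 2^32 / NPAGES spreads results over [INT_MIN, INT_MAX]. */
#define NPAGES (1 << 17)
#define SCALE  (1 << 15)

/* Signed modular page distance, scaled per the message above. */
static int
page_diff(int a, int b)
{
	uint32_t	d = ((uint32_t) (a - b)) % NPAGES;	/* 0 .. NPAGES-1 */
	int32_t		ds = (d >= NPAGES / 2) ? (int32_t) d - NPAGES : (int32_t) d;

	return ds * SCALE;
}

/* Unlink-eligibility rule from the message: strictly in the past
 * (diff < 0), but within a quarter-circle of the cutoff
 * (diff > INT_MIN / 2); anything older is kept as insurance. */
static int
seg_unlink_ok(int segpage, int cutoffpage)
{
	int		d = page_diff(segpage, cutoffpage);

	return d < 0 && d > INT_MIN / 2;
}
```

Here PagePrecedes(a, b) corresponds to page_diff(a, b) < 0, and the extra d > INT_MIN / 2 clause is the insurance: the furthest-past region is left in place for XID consumption to overwrite.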
On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
> On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> > >> So I think what we're actually trying to accomplish here is to
> > >> ensure that instead of deleting up to half of the SLRU space
> > >> before the cutoff, we delete up to half-less-one-segment.
> > >> Maybe it should be half-less-two-segments, just to provide some
> > >> cushion against edge cases. Reading the first comment in
> > >> SetTransactionIdLimit makes one not want to trust too much in
> > >> arguments based on the exact value of xidWrapLimit, while for
> > >> the other SLRUs it was already unclear whether the edge cases
> > >> were exactly right.
>
> > > That could be interesting insurance. While it would be sad for us to miss an
> > > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > > more segments. For example, instead of unlinking segments representing up to
> > > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > > lower half. The lower half will stay in place; eventually, XID consumption
> > > will overwrite it. Truncation behavior won't change until the region of CLOG
> > > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
>
> > Temporarily wasting some disk
> > space is a lot more palatable than corrupting data, and these code
> > paths are necessarily not terribly well tested. So +1 for more
> > insurance.
>
> Okay, I'll give that a try.
> I expect this will replace the PagePrecedes
> callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
> PagePrecedes(a, b). PageDiff callbacks shall distribute return values
> uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
> where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.

While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases wrong, as you feared. In particular, if the newest XID reached xidStopLimit and was in the first page of a segment, TruncateCLOG() would delete its segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I added unit tests covering that and other scenarios. Reaching the bug via XIDs was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in single-user mode. I expect the bug was easier to reach via pg_multixact.

The insurance patch stacks on top of the bug fix patch. It does have a negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives nonsense answers if "future" segments exist. That can happen today, but the patch creates new ways to make it happen. The symptom is wasting yet more space in pg_multixact. I am okay with this, since it arises only after one fills pg_multixact 50% full. There are alternatives. We could weaken the corruption defense in TruncateMultiXact() or look for another implementation of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead of 50% (this patch) or >99.99% (today's behavior).

Thanks,
nm
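[Editorial note: the both-ends rule that the fixed patch implements can again be shown on the shrunken octagon model. Names and constants are illustrative only, not the patch's code.]

```c
#include <assert.h>

/* Shrunken circular page space: 64 pages, 8-page segments.  Invented
 * names for illustration. */
#define TOY_TOTAL_PAGES 64
#define TOY_PAGES_PER_SEG 8

/* Circular "precedes": true iff a is in the half-circle before b. */
static int
toy_page_precedes(int a, int b)
{
	int		d = (a - b) & (TOY_TOTAL_PAGES - 1);

	if (d >= TOY_TOTAL_PAGES / 2)
		d -= TOY_TOTAL_PAGES;
	return d < 0;
}

/* Both-ends rule: a segment may be unlinked only if BOTH its first and
 * last pages precede the (unrounded) cutoff page. */
static int
toy_seg_deletable(int segfirst, int cutoff)
{
	return toy_page_precedes(segfirst, cutoff) &&
		toy_page_precedes(segfirst + TOY_PAGES_PER_SEG - 1, cutoff);
}
```

With cutoff page 5, the genuinely old segment starting at page 56 passes both checks and may be unlinked, while the segment starting at page 32, which holds the newest pages, fails the first-page check and survives even though its last page (39) does "precede" the cutoff on the circle.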
Hi Noah!

I've found this thread in the CF looking for something to review.

> On 9 Nov 2020, at 09:53, Noah Misch <noah@leadboat.com> wrote:
>
> Rebased both patches, necessitated by commit c732c3f (a repair of commit
> dee663f). As I mentioned on another branch of the thread, I'd be content if
> someone reviews the slru-truncate-modulo patch and disclaims knowledge of the
> slru-truncate-insurance patch; I would then abandon the latter patch.
> <slru-truncate-modulo-v5.patch><slru-truncate-t-insurance-v4.patch>

Commit c732c3f adds some SYNC_FORGET_REQUESTs. slru-truncate-modulo-v5.patch fixes an off-by-one error in functions like *PagePrecedes(int page1, int page2). slru-truncate-t-insurance-v4.patch ensures that off-by-one errors do not inflict data loss. While I agree that fixing the error is better than hiding it, I could not figure out how c732c3f is connected to these patches. Can you please give me a few pointers on how to understand this connection?

Best regards, Andrey Borodin.
> On 2 Jan 2021, at 01:35, Noah Misch <noah@leadboat.com> wrote:
>
> There's no
> other connection to this thread, and one can review patches on this thread
> without studying commit c732c3f.

OK, thanks! Do I understand correctly that this is a bugfix that needs to be back-patched? Thus we should not refactor the four identical *PagePrecedes(int page1, int page2) functions into one generic function?

Since the functions are not symmetric anymore, maybe we should have better names for the arguments than "page1" and "page2"? At least in the dev branch.

Is it common practice to embed tests into assert checking like in SlruPagePrecedesUnitTests()?

SLRU seems nowhere near simple, BTW. The only simple place is the naive caching algorithm. I remember there was a thread about making relations from SLRUs.

Best regards, Andrey Borodin.
> On 1 Jan 2021, at 23:05, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> I've found this thread in CF looking for something to review.

We discussed the patches with Noah offlist. I'm resending a summary to the list.

There are two patches:
1. slru-truncate-modulo-v5.patch
2. slru-truncate-t-insurance-v4.patch

It would be a bit easier to review if the patches were versioned together (v5 both), because the 2nd patch applies on top of the 1st. Also, the 2nd patch has a problem if applied with git (in async.c).

The first patch fixes a bug where SLRU truncation around the wrapping point can happen too early.

The basic idea of the patch is: "If we want to delete a range we must be eligible to delete its beginning and ending." So to test whether a page is deletable, it tests that the first and last xids (or other SLRU's units) are of no interest. To test whether a segment is deletable, it tests whether its first and last pages can be deleted.

The patch adds tests in an unusual manner: they are implemented as assert functions. The tests are fast; they only check basic and edge cases. But the tests will not be run if Postgres is built without asserts.

I'm a little suspicious of the implementation of the *PagePrecedes(int page1, int page2) functions. Consider the following example from the patch:

	static bool
	CommitTsPagePrecedes(int page1, int page2)
	{
		TransactionId xid1;
		TransactionId xid2;

		xid1 = ((TransactionId) page1) * COMMIT_TS_XACTS_PER_PAGE;
		xid1 += FirstNormalTransactionId + 1;
		xid2 = ((TransactionId) page2) * COMMIT_TS_XACTS_PER_PAGE;
		xid2 += FirstNormalTransactionId + 1;

		return (TransactionIdPrecedes(xid1, xid2) &&
				TransactionIdPrecedes(xid1, xid2 + COMMIT_TS_XACTS_PER_PAGE - 1));
	}

We are adding FirstNormalTransactionId to the xids to avoid them being special xids. We add COMMIT_TS_XACTS_PER_PAGE - 1 to xid2 to shift it to the end of the page. But due to += FirstNormalTransactionId we shift slightly past the page boundaries and risk that xid2 + COMMIT_TS_XACTS_PER_PAGE - 1 can become FrozenTransactionId (FirstNormalTransactionId - 1).
Thus we add +1 to all values in scope. While the logic is correct, the coding is difficult. Maybe we could just use

	page1_first_normal_xid = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
	page2_first_normal_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + FirstNormalTransactionId;
	page2_last_xid = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE + CLOG_XACTS_PER_PAGE - 1;

But I'm not insisting on this.

The following comment is not correct for 1 KB and 4 KB pages:

	+ * At every supported BLCKSZ, (1 << 31) % COMMIT_TS_XACTS_PER_PAGE == 128.

All the notes above are not blocking the fix; I just wanted to let the committer know about them. I think that it's very important to have this bug fixed.

The second patch converts the binary *PagePrecedes() functions to *PageDiff()s and adds logic to avoid deleting pages in suspicious cases. This logic depends on the scale of the returned diff value: it expects that overflow happens between INT_MIN and INT_MAX. Thus it prevents page deletion if page_diff <= INT_MIN / 2 (too far from the current cleaning point; and in normal cases, of course).

It must be a comparison here, not a bare truth-value test:

	-		ctl->PagePrecedes(segpage, trunc->earliestExistingPage))
	+		ctl->PageDiff(segpage, trunc->earliestExistingPage))

This

	int			diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,

seems to be the functional equivalent of

	int			diff_max = ((QUEUE_MAX_PAGE - 1) / 2),

What I like about the patch is that it removes all the trickery described above around "+ FirstNormalTransactionId + 1".

AFAICS the overall purpose of the 2nd patch is to help clusters corrupted by other bugs avoid deleting SLRU segments. I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, this patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not stack into a continuous range?

Thanks!

Best regards, Andrey Borodin.
> On 9 Jan 2021, at 15:17, Noah Misch <noah@leadboat.com> wrote:
>
>> This
>> 	int			diff_max = ((QUEUE_MAX_PAGE + 1) / 2) - 1,
>> seems to be the functional equivalent of
>> 	int			diff_max = ((QUEUE_MAX_PAGE - 1) / 2),
>
> Do you think one conveys the concept better than the other?

I see now that the subsequent comments mention "(QUEUE_MAX_PAGE+1)/2", so I think there is no need to change anything in the patch here.

>> I'm a little bit afraid that this kind of patch can hide bugs (while potentially saving some users' data). That said, this patch seems like a useful precaution. Maybe we could emit scary warnings if SLRU segments do not stack into a continuous range?
>
> Scary warnings are good for an observation that implies a bug, but the
> slru-truncate-t-insurance patch causes such an outcome in non-bug cases where
> it doesn't happen today. In other words, discontinuous ranges of SLRU
> segments would be even more common after that patch. For example, it would
> happen anytime oldestXID advances by more than ~1B at a time.

Uhm, I thought that if there are going to be more than ~1B xids, we are going to keep all segments forever and the range will still be continuous. Or am I missing something?

Thanks!

Best regards, Andrey Borodin.
> On 10 Jan 2021, at 03:15, Noah Misch <noah@leadboat.com> wrote:
>
> No; it deletes the most recent ~1B and leaves the older segments. An
> exception is multixact, as described in the commit message and the patch's
> change to a comment in TruncateMultiXact().

Thanks for the clarification.

One more thing: the retention point at 3/4 of the overall space (half of wraparound) seems more or less arbitrary to me. Why not 5/8 or 9/16?

Can you please send revised patches with the fixes?

Thanks!

Best regards, Andrey Borodin.
> On 10 Jan 2021, at 14:43, Noah Misch <noah@leadboat.com> wrote:
>
>> Can you please send revised patches with the fixes?
>
> Attached.
> <slru-truncate-modulo-v6.patch>
> <slru-truncate-t-insurance-v5.patch>

I'm marking the patch as ready for committer.

I can't tell whether we should back-patch the insurance patch or not: it potentially fixes unknown bugs, and potentially contains unknown bugs. I can't reason because of such uncertainty. I've tried to look for any potential problem, and so far I see none. Chances are <slru-truncate-t-insurance-v5.patch> makes the code less error-prone.

The fix <slru-truncate-modulo-v6.patch> is certainly worth back-patching.

Thanks!

Best regards, Andrey Borodin.
> On 12 Jan 2021, at 13:49, Noah Misch <noah@leadboat.com> wrote:
>
> What do you think of abandoning slru-truncate-t-insurance entirely? As of
> https://postgr.es/m/20200330052809.GB2324620@rfd.leadboat.com I liked the idea
> behind it, despite its complicating the system for hackers and DBAs. The
> TruncateMultiXact() interaction rendered it less appealing. In v14+, commit
> cd5e822 mitigates the kind of bugs that slru-truncate-t-insurance mitigates,
> further reducing the latter's value. slru-truncate-t-insurance does mitigate
> larger trespasses into unlink-eligible space, though.

It seems to me that not committing the insurance patch is not a mistake. Let's abandon slru-truncate-t-insurance for now.

>> The fix <slru-truncate-modulo-v6.patch> is certainly worth back-patching.
>
> I'll push it on Saturday, probably.

Thanks!

Best regards, Andrey Borodin.