Re: Testing autovacuum wraparound (including failsafe) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Testing autovacuum wraparound (including failsafe)
Date
Msg-id CAD21AoAyYBZOiB1UPCPZJHTLk0-arrq5zqNGj+PrsbpdUy=g-g@mail.gmail.com
Whole thread Raw
In response to Re: Testing autovacuum wraparound (including failsafe)  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Testing autovacuum wraparound (including failsafe)  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Wed, Mar 8, 2023 at 1:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 3, 2023 at 8:34 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > On 16/11/2022 06:38, Ian Lawrence Barwick wrote:
> > > Thanks for the patch. While reviewing the patch backlog, we have determined that
> > > the latest version of this patch was submitted before meson support was
> > > implemented, so it should have a "meson.build" file added for consideration for
> > > inclusion in PostgreSQL 16.
> >
> > I wanted to do some XID wraparound testing again, to test the 64-bit
> > SLRUs patches [1], and revived this.
>
> Thank you for reviving this thread!
>
> >
> > I took a different approach to consuming the XIDs. Instead of setting
> > nextXID directly, bypassing GetNewTransactionId(), this patch introduces
> > a helper function to call GetNewTransactionId() repeatedly. But because
> > that's slow, it does include a shortcut to skip over "uninteresting"
> > XIDs. Whenever nextXid is close to an SLRU page boundary or XID
> > wraparound, it calls GetNewTransactionId(), and otherwise it bumps up
> > nextXid close to the next "interesting" value. That's still a lot slower
> > than just setting nextXid, but exercises the code more realistically.
> >
> > I've written some variant of this helper function many times over the
> > years, for ad hoc testing. I'd love to have it permanently in the git tree.
>
> These functions seem to be better than mine.
>
> > In addition to Masahiko's test for emergency vacuum, this includes two
> > other tests. 002_limits.pl tests the "warn limit" and "stop limit" in
> > GetNewTransactionId(), and 003_wraparound.pl burns through 10 billion
> > transactions in total, exercising XID wraparound in general.
> > Unfortunately these tests are pretty slow; the tests run for about 4
> > minutes on my laptop in total, and use about 20 GB of disk space. So
> > perhaps these need to be put in a special test suite that's not run as
> > part of "check-world". Or perhaps leave out the 003_wraparounds.pl test,
> > that's the slowest of the tests. But I'd love to have these in the git
> > tree in some form.
>
> cbfot reports some failures. The main reason seems that meson.build in
> xid_wraparound directory adds the regression tests but the .sql and
> .out files are missing in the patch. Perhaps the patch wants to add
> only tap tests as Makefile doesn't define REGRESS?
>
> Even after fixing this issue, CI tests (Cirrus CI) are not happy and
> report failures due to a disk full. The size of xid_wraparound test
> directory is 105MB out of 262MB:
>
> % du -sh testrun
> 262M    testrun
> % du -sh testrun/xid_wraparound/
> 105M    testrun/xid_wraparound/
> % du -sh testrun/xid_wraparound/*
> 460K    testrun/xid_wraparound/001_emergency_vacuum
> 93M     testrun/xid_wraparound/002_limits
> 12M     testrun/xid_wraparound/003_wraparounds
> % ls -lh testrun/xid_wraparound/002_limits/log*
> total 93M
> -rw-------. 1 masahiko masahiko 93M Mar  7 17:34 002_limits_wraparound.log
> -rw-rw-r--. 1 masahiko masahiko 20K Mar  7 17:34 regress_log_002_limits
>
> The biggest file is the server logs since an autovacuum worker writes
> autovacuum logs for every table for every second (autovacuum_naptime
> is 1s). Maybe we can set log_autovacuum_min_duration reloption for the
> test tables instead of globally enabling it

I think it could be acceptable since 002 and 003 tests are executed
only when required. And 001 test seems to be able to pass on cfbot but
it takes more than 30 sec. In the attached patch, I made these tests
optional and these are enabled if envar ENABLE_XID_WRAPAROUND_TESTS is
defined (supporting only autoconf).

>
> The 001 test uses the 2PC transaction that holds locks on tables but
> since we can consume xids while the server running, we don't need
> that. Instead I think we can keep a transaction open in the background
> like 002 test does.

Updated in the new patch. Also, I added a check if the failsafe mode
is triggered.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Next
From: Amit Kapila
Date:
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher