Re: pgsql: Add contrib/pg_walinspect. - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pgsql: Add contrib/pg_walinspect.
Msg-id CA+hUKG+H_VEBdtK4CVb7uRLaAKbufNOMy-djUsptcqhLxONMmA@mail.gmail.com
In response to Re: pgsql: Add contrib/pg_walinspect.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: Add contrib/pg_walinspect.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pgsql: Add contrib/pg_walinspect.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Apr 27, 2022 at 12:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I think it's a bug in pg_walinspect, so I'll move the discussion back
> > here.  Here's one rather simple way to fix it, that has survived
> > running the test a thousand times (using a recipe that failed for me
> > quite soon, after 20-100 attempts or so; I never figured out how to
> > get the 50% failure rate reported by Tom).
>
> Not sure what we're doing differently, but plain "make check" in
> contrib/pg_walinspect fails pretty consistently for me on gcc23.
> I tried it again just now and got five failures in five attempts.

I tried on the /home filesystem (a slow NFS mount) and then inside a
directory on /tmp to get ext4 (I saw that Noah had somehow got onto a
local filesystem, based on the presence of "ext4" in the pathname, and I
was trying everything I could think of).  I used what I thought might
be some relevant starter configure options copied from the buildfarm animal:

./configure --prefix=$HOME/install --enable-cassert --enable-debug \
  --enable-tap-tests CC="ccache gcc -mips32r2" \
  CFLAGS="-O2 -funwind-tables" LDFLAGS="-rdynamic"

For me, make check always succeeds in contrib/pg_walinspect, but make
installcheck fails if I run it enough times in a loop, somewhere
around the 20th iteration, which I imagine has to do with WAL page
boundaries moving around.

for i in $(seq 1 1000) ; do
  make -s installcheck || exit 1
done
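The hunch that failures track WAL page boundaries can be illustrated with a little arithmetic: each installcheck run generates more WAL, so the position at which the test's records land shifts from run to run, and only some positions put a record across a page boundary. The helper below is hypothetical (not PostgreSQL code), ignores page headers, and assumes the default 8192-byte WAL page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the default WAL page size (XLOG_BLCKSZ); it can be changed at build time. */
#define SKETCH_WAL_PAGE_SIZE 8192u

/*
 * Hypothetical helper: does a record of len bytes starting at byte
 * position lsn continue onto a later WAL page?  Page headers are
 * ignored, so this only illustrates the arithmetic.
 */
static bool
sketch_record_crosses_page(uint64_t lsn, uint32_t len)
{
    assert(len > 0);

    uint64_t first_page = lsn / SKETCH_WAL_PAGE_SIZE;
    uint64_t last_page = (lsn + len - 1) / SKETCH_WAL_PAGE_SIZE;

    return first_page != last_page;
}
```

For example, a 100-byte record starting at position 8100 spills onto the next page, while the same record starting at 8092 just fits, so whether a given run hits the boundary case depends on how much WAL came before it.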

> I then installed your patch and got the same failure, three times
> out of three, so I don't think we're there yet.

Hrmph...  Are you sure you rebuilt the contrib module?  Assuming so,
maybe it's failing in different ways for you and for me.  For me, it
always fails after this break is reached in xlogutil.c:

            /* If asked, let's not wait for future WAL. */
            if (!wait_for_wal)
                break;

If you add a log message there, do you see it?  For me, the patch
fixes it, because it teaches pg_walinspect to treat a read failure
that carries no error message as end-of-data, which is what the code
above (introduced by the pg_walinspect commit) produces.
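The approach described above can be sketched as a caller loop that distinguishes two kinds of "no record" result: one with an error message (a genuine error) and one without (the data simply isn't there yet, as on the no-wait path in xlogutil.c). This is a self-contained model under stated assumptions, not the actual patch or the real xlogreader API: SketchReader, sketch_read_record() (standing in for XLogReadRecord()) and sketch_count_records() are hypothetical, records are laid out back to back, and flushed_upto plays the role of the WAL available without waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the WAL reader state. */
typedef struct SketchReader
{
    const int  *rec_lens;       /* record lengths, laid out back to back */
    size_t      nrecords;
    size_t      next;           /* index of the next record */
    uint64_t    next_lsn;       /* start position of the next record */
    uint64_t    flushed_upto;   /* WAL available without waiting */
} SketchReader;

/*
 * Hypothetical stand-in for XLogReadRecord(): returns the next record,
 * or NULL.  On NULL, *errormsg is set for genuine errors and left NULL
 * when the requested data is simply not available.
 */
static const int *
sketch_read_record(SketchReader *r, const char **errormsg)
{
    assert(r != NULL && errormsg != NULL);
    *errormsg = NULL;

    if (r->next >= r->nrecords)
        return NULL;                        /* nothing more: no message */

    int len = r->rec_lens[r->next];
    if (len <= 0)
    {
        *errormsg = "invalid record length";
        return NULL;                        /* genuine error: message set */
    }
    if (r->next_lsn + (uint64_t) len > r->flushed_upto)
        return NULL;                        /* like the no-wait path: fail without a message */

    r->next_lsn += (uint64_t) len;
    return &r->rec_lens[r->next++];
}

/*
 * Caller loop in the style of the fix: count records until end-of-data.
 * Returns the number of records read, or -1 on a genuine error.
 */
static int
sketch_count_records(SketchReader *r)
{
    int count = 0;

    for (;;)
    {
        const char *errormsg;
        const int  *record = sketch_read_record(r, &errormsg);

        if (record == NULL)
        {
            if (errormsg == NULL)
                return count;               /* messageless failure: end of data */
            return -1;                      /* report a real error */
        }
        count++;
    }
}
```

With record lengths {24, 56, 80} and only 100 bytes flushed, the loop reads two records and then stops quietly instead of raising an error; a record with an invalid length still produces -1.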


