Re: A failure in 031_recovery_conflict.pl on Debian/s390x - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: A failure in 031_recovery_conflict.pl on Debian/s390x
Msg-id CA+hUKGJs8mskHt=38dFQYkucv0H44xTy=EDF0=D0sGuJms3DBw@mail.gmail.com
In response to Re: A failure in 031_recovery_conflict.pl on Debian/s390x  (Christoph Berg <myon@debian.org>)
Responses Re: A failure in 031_recovery_conflict.pl on Debian/s390x
List pgsql-hackers
On Thu, Aug 10, 2023 at 9:15 PM Christoph Berg <myon@debian.org> wrote:
> No XXX lines this time either, but I've seen them in logfiles that
> went through successfully.

Hmm.  Well, I think this looks like a different kind of bug then.
That patch of mine is about fixing some unsafe coding on the receiving
side of a signal.  In this case it's apparently not being sent.  So
either the Heap2/PRUNE record was able to proceed (indicating that
that CURSOR was not holding a pin as expected), or VACUUM decided not
to actually do anything to that block (conditional cleanup lock vs
transient pin changing behaviour?), or there's a bug somewhere in/near
LockBufferForCleanup(), which should have emitted that XXX message
before even calling ResolveRecoveryConflictWithBufferPin().

Do you still have the data directories around from that run, so we can
see if the expected Heap2/PRUNE was actually logged?  For example
(using meson layout here, in the build directory) that'd be something
like:

$ ./tmp_install/home/tmunro/install/bin/pg_waldump \
    testrun/recovery/031_recovery_conflict/data/t_031_recovery_conflict_standby_data/pgdata/pg_wal/000000010000000000000003

In there I see this:

rmgr: Heap2       len (rec/tot):     57/    57, tx:          0, lsn: 0/0344BB90, prev 0/0344BB68, desc: PRUNE snapshotConflictHorizon: 0, nredirected: 0, ndead: 1, nunused: 0, redirected: [], dead: [21], unused: [], blkref #0: rel 1663/16385/16386 blk 0

That's the WAL record that's supposed to be causing
031_recovery_conflict_standby.log to talk about a conflict, starting
with this:

2023-08-10 22:47:04.564 NZST [57145] LOG:  recovery still waiting after 10.035 ms: recovery conflict on buffer pin
2023-08-10 22:47:04.564 NZST [57145] CONTEXT:  WAL redo at 0/344BB90 for Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 1, nunused: 0, redirected: [], dead: [21], unused: []; blkref #0: rel 1663/16385/16386, blk 0
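
If it helps, the cross-check between the two files could be scripted
roughly like this.  This is only a sketch in Python, not something from
the tree: the function names are made up for illustration, and it
assumes pg_waldump prints one record per line and that the standby log
keeps each CONTEXT message on a single line (i.e. not re-wrapped the
way a mail client wraps them).  It only compares LSNs, normalizing them
because pg_waldump zero-pads the low half (0/0344BB90) while the log
does not (0/344BB90).

```python
import re

# "lsn: X/Y" field as printed by pg_waldump for each record.
LSN_RE = re.compile(r"lsn:\s*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")
# CONTEXT line the standby logs while replay of a Heap2/PRUNE record waits.
REDO_RE = re.compile(
    r"WAL redo at\s+([0-9A-Fa-f]+)/([0-9A-Fa-f]+)\s+for\s+Heap2/PRUNE"
)


def _norm(hi, lo):
    """Turn the two hex halves of an LSN into a comparable tuple of ints."""
    return (int(hi, 16), int(lo, 16))


def prune_lsns(waldump_text):
    """LSNs of Heap2 PRUNE records in pg_waldump output (one record per line)."""
    found = []
    for line in waldump_text.splitlines():
        if line.startswith("rmgr: Heap2") and "desc: PRUNE" in line:
            m = LSN_RE.search(line)
            if m:
                found.append(_norm(*m.groups()))
    return found


def conflict_lsns(log_text):
    """LSNs named in 'WAL redo at ... for Heap2/PRUNE' lines of the standby log."""
    return [_norm(*m.groups()) for m in REDO_RE.finditer(log_text)]


def unreported_prunes(waldump_text, log_text):
    """PRUNE records present in the WAL with no matching conflict CONTEXT line."""
    reported = set(conflict_lsns(log_text))
    return [lsn for lsn in prune_lsns(waldump_text) if lsn not in reported]
```

Feeding it the pg_waldump output and the contents of
031_recovery_conflict_standby.log, an empty result from
unreported_prunes() means every PRUNE record in the segment shows up in
a conflict message.  A non-empty result is not proof of a bug by
itself, since a PRUNE record that never met a pinned buffer would not
be expected to log a conflict; it only narrows down which records to
look at.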


