Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id ZRDNNf8Etlvuo48a@paquier.xyz
In response to Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #17928: Standby fails to decode WAL on termination of primary
List pgsql-bugs
On Mon, Sep 25, 2023 at 09:02:35AM +1300, Thomas Munro wrote:
> I see there was a failure on 16 on the very slow AIX box, and I have
> access so looking into that...

Lucky you, if I may say ;)

A bunch of architectures that are not Intel are failing.  Here is a
summary based on the buildfarm reports:
topminnow, mips64el with gcc 4.9.2
mereswine, ARMv7 with gcc 10.2.1
sungazer, ppc64 with gcc 8.3.0
frogfish, mips64el with gcc 4.6.3
mamba, macppc with gcc 10.4.0
gull, ARMv7 with clang 13.0.0
grison, ARMv7 with gcc 4.6.3
copperhead, riscv64 with gcc 10.X

The closest thing I have on hand is tanager on ARMv7 (it has not
reported to the buildfarm for a few weeks as it overheated because
of the summer here, but I've put it back online now).  However, it
passed a few hundred cycles with both gcc and clang yesterday, on top
of having a clean buildfarm run.

With sungazer now failing on REL_16_STABLE, it feels to me that we are
actually looking at two bugs?  One on HEAD, and one in stable
branches?  For HEAD and the 2PC failure, the records up to PREPARE
TRANSACTION should be replayed by the standby getting promoted, but
I'd rather dig into that with a host that's able to report the
failure.

copperhead seems to be one of the failing hosts that's able to compile
things quickly.  Tom, Noah, or copperhead's owner, could it be
possible to get access to one of the hosts that are failing for more
investigation?  I would not do more than compile the code and check
the state of the 2PC test for this promotion failure.
--
Michael
