On Mon, Feb 09, 2026 at 07:31:13AM +0000, PG Bug reporting form wrote:
> Primary db was not impacted, however standby node and DR site replication
> broken, I tried to reinit with latest backup + archive loading from
> pgbackrest backup but it fails with same error once the corrupt wal/archive
> file applying the changes. I had to reinit with pgbasebackup with 40TB
> database which took about 45 hrs of time.
>
> Looks like some RACE condition happend to WAL file that generate the issue.
> looks like potential bug of it.

Perhaps so. However, it is basically impossible to determine if this
is actually an issue without more information. Hence, one would need
more input about the workloads involved (concurrency included), the
pages touched, and the WAL patterns at least. The best thing possible
would be a reproducible self-contained test case, of course, which
could be used to evaluate the versions impacted and the potential
solutions. Race conditions like that with predefined WAL patterns
should be easy to reproduce with some injection points to force a
strict ordering of WAL records, particularly if this is a problem that
can be reproduced after a startup, where we just need to make sure
that a node is able to recover.

One thing that may matter, off the top of my head: does your backup setup
rely on the in-core incremental backups with some combined backups?
That could be a contributing factor, or not.
--
Michael