Re: Corruption during WAL replay - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Corruption during WAL replay
Date	March 25, 2022 05:23:00
Msg-id	3192026.1648185780@sss.pgh.pa.us
In response to	Re: Corruption during WAL replay (Andres Freund <andres@anarazel.de>)
Responses	Re: Corruption during WAL replay
List	pgsql-hackers

Andres Freund <andres@anarazel.de> writes:
> I do see that the LSN that ends up on the page is the same across a few runs
> of the test on serinus. Which presumably differs between different
> animals. Surprised that it's this predictable - but I guess the run is short
> enough that there's no variation due to autovacuum, checkpoints etc.

Uh-huh.  I'm not surprised that it's repeatable on a given animal.
What remains to be explained:

1. Why'd it start failing now?  I'm guessing that ce95c5437 *was* the
culprit after all, by slightly changing the amount of catalog data
written during initdb, and thus moving the initial LSN.

2. Why just these two animals?  If initial LSN is the critical thing,
then the results of "locale -a" would affect it, so platform
dependence is hardly surprising ... but I'd have thought that all
the animals on that host would use the same initial set of
collations.  OTOH, I see petalura and pogona just fell over too.
Do you have some of those animals --with-icu and others not?

> 16bit checksums for the win.

Yay :-(

As for a fix, would damaging more of the page help?  I guess
it'd just move around the one-in-64K chance of failure.
Maybe we have to intentionally corrupt (e.g. invert) the
checksum field specifically.

            regards, tom lane
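
Editorial illustration (not part of the original message): a minimal sketch of why damaging arbitrary page bytes can still pass a 16-bit checksum about once in 65536 attempts, while inverting the stored checksum field always fails verification. The checksum function here is a toy stand-in built on CRC-32 and is not PostgreSQL's actual page checksum algorithm; all function names are hypothetical. The only layout detail assumed from PostgreSQL is that the 16-bit pd_checksum field follows the 8-byte pd_lsn at the start of the page header.

```python
import random
import zlib

PAGE_SIZE = 8192       # default PostgreSQL block size
CHECKSUM_OFFSET = 8    # pd_checksum follows the 8-byte pd_lsn


def toy_checksum16(page):
    """Toy 16-bit checksum (assumption: NOT PostgreSQL's real algorithm)."""
    # Compute over the page with the checksum field itself treated as zero.
    body = (bytes(page[:CHECKSUM_OFFSET]) + b"\x00\x00"
            + bytes(page[CHECKSUM_OFFSET + 2:]))
    crc = zlib.crc32(body)
    return (crc ^ (crc >> 16)) & 0xFFFF


def stored_checksum(page):
    return int.from_bytes(page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2], "little")


def set_checksum(page):
    value = toy_checksum16(page)
    page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = value.to_bytes(2, "little")


def verify(page):
    return stored_checksum(page) == toy_checksum16(page)


def make_page():
    """Build a page with arbitrary fixed contents and a valid checksum."""
    page = bytearray(bytes(range(256)) * (PAGE_SIZE // 256))
    set_checksum(page)
    return page


def invert_checksum(page):
    """Flip every bit of the stored checksum; the page data is untouched."""
    flipped = stored_checksum(page) ^ 0xFFFF
    page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = flipped.to_bytes(2, "little")


def count_undetected(trials, seed=0):
    """Overwrite 4 data bytes at random; count damage the checksum misses."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        page = make_page()
        offset = rng.randrange(24, PAGE_SIZE - 4)   # stay past the header
        new = rng.getrandbits(32).to_bytes(4, "little")
        if new == bytes(page[offset:offset + 4]):
            continue                                # not actually damaged
        page[offset:offset + 4] = new
        if verify(page):
            missed += 1
    return missed


if __name__ == "__main__":
    page = make_page()
    invert_checksum(page)
    print("inverted checksum verifies:", verify(page))
    print("undetected damage in 200000 trials:", count_undetected(200000))
```

Inverting the stored field changes the stored value while leaving the computed value unchanged, so the two can never match. Damaging the data changes the computed value instead, and it can land back on the stored value by chance, which is the one-in-64K case discussed above.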
