Re: Corruption during WAL replay - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Corruption during WAL replay
Date
Msg-id 20220325060737.iayq5cs36jktqlag@alap3.anarazel.de
In response to Re: Corruption during WAL replay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Corruption during WAL replay  (Robert Haas <robertmhaas@gmail.com>)
Re: Corruption during WAL replay  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Hi,

On 2022-03-25 01:38:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Not sure what to do here... I guess we can just change the value we overwrite
> > the page with and hope to not hit this again? But that feels deeply deeply
> > unsatisfying.
> 
> AFAICS, this strategy of whacking a predetermined chunk of the page with
> a predetermined value is going to fail 1-out-of-64K times.

Yea. I suspect that, given the way the modifications and checksumming are done,
the chance is actually higher than 1/64k. But even if it actually is 1/64k, it's not
great to wait for (#animals * #catalog-changes) to approach a decent
percentage of 64k.
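
To put rough numbers on that, here is a minimal sketch of how quickly the chance of at least one spurious pass grows with the number of test runs. It assumes each run is an independent trial with a per-run probability of exactly 1/65536, which is the optimistic case; the function name is made up for illustration.

```python
def p_at_least_one_false_pass(runs: int, p_single: float = 1 / 65536) -> float:
    """Probability that at least one of `runs` independent trials collides.

    Assumes every run has the same, independent chance `p_single` that the
    corrupted page still happens to carry a matching 16-bit checksum.
    """
    return 1.0 - (1.0 - p_single) ** runs


if __name__ == "__main__":
    # Hypothetical run counts, standing in for #animals * #catalog-changes.
    for runs in (100, 1_000, 10_000, 65_536):
        print(f"{runs:>6} runs: {p_at_least_one_false_pass(runs):.4f}")
```

With as many runs as there are checksum values (65536), the chance of having seen at least one false pass is already about 63%, i.e. roughly 1 - 1/e.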


I was curious whether there have been similar issues in the past. Querying
the buildfarm logs suggests not, at least not in the pg_checksums test.


> We have to change the test so that it's guaranteed to produce an invalid
> checksum.  Inverting just the checksum field, without doing anything else,
> would do that ... but that feels pretty unsatisfying too.
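
As an illustration of why inverting only the checksum field cannot produce a false pass, here is a rough sketch. It assumes the checksum is computed with the 2-byte checksum field treated as zero (as pg_checksum_page does) and that the field sits at byte offset 8, right after pd_lsn. toy_checksum is a made-up stand-in, not PostgreSQL's actual algorithm, and the little-endian layout is likewise an assumption for the demo.

```python
import os

PAGE_SIZE = 8192
CHECKSUM_OFFSET = 8  # assumed: pd_checksum follows the 8-byte pd_lsn


def toy_checksum(page: bytes) -> int:
    """Made-up 16-bit checksum, computed with the checksum field zeroed."""
    buf = bytearray(page)
    buf[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = b"\x00\x00"
    acc = 0
    for i in range(0, len(buf), 2):
        acc = (acc * 31 + int.from_bytes(buf[i:i + 2], "little")) & 0xFFFF
    return acc


def stored_checksum(page: bytes) -> int:
    """Read the 16-bit value currently stored in the checksum field."""
    return int.from_bytes(page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2], "little")


def set_checksum(page: bytearray) -> None:
    """Store the correct checksum in the page header."""
    page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = toy_checksum(page).to_bytes(2, "little")


def verify(page: bytes) -> bool:
    """True if the stored checksum matches the recomputed one."""
    return stored_checksum(page) == toy_checksum(page)


def invert_checksum_field(page: bytearray) -> None:
    """Flip all 16 bits of the stored checksum and touch nothing else."""
    inverted = stored_checksum(page) ^ 0xFFFF
    page[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 2] = inverted.to_bytes(2, "little")


if __name__ == "__main__":
    page = bytearray(os.urandom(PAGE_SIZE))
    set_checksum(page)
    print("valid before:", verify(page))
    invert_checksum_field(page)
    print("valid after inverting the field:", verify(page))
```

Because the field is excluded from the computation, the recomputed value c is unchanged while the stored value becomes c XOR 0xFFFF, which can never equal c. Overwriting page contents instead changes c itself, and the new c can coincide with the old stored value.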

We really ought to find a way to get to wider checksums :/

Greetings,

Andres Freund


