Re: Corruption during WAL replay - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Corruption during WAL replay
Date
Msg-id 20220325045438.enwakjqhrafzq5f2@alap3.anarazel.de
In response to Re: Corruption during WAL replay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Corruption during WAL replay  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Corruption during WAL replay  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2022-03-25 00:08:20 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The only thing I can really conclude here is that we apparently end up with
> > the same checksum for exactly the modifications we are doing? Just on those
> > two damn instances? Reliably?
> 
> IIRC, the table's OID or relfilenode enters into the checksum.
> Could it be that assigning a specific OID to the table allows
> this to happen, and these two animals are somehow assigning
> that OID while others are using some slightly different OID?

Only the block number goes into it, not the OID or relfilenode.

I do see that the LSN that ends up on the page is the same across a few runs
of the test on serinus, and it presumably differs between animals. I'm
surprised that it's this predictable, but I guess the run is short enough
that there's no variation due to autovacuum, checkpoints, etc.

If I add a 'SELECT txid_current()' before the CREATE TABLE in
check_relation_corruption(), the test doesn't fail anymore, because the
additional WAL record changes the LSN that ends up on the page.

16-bit checksums for the win.
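
To make the collision argument concrete, here is a toy sketch (not
PostgreSQL's actual checksum code). It assumes a miniature "page" made of
an 8-byte LSN followed by a payload, and uses BLAKE2 as a stand-in hash.
The final steps (XOR in the block number, reduce to a nonzero 16-bit
value) are only loosely modeled on the real thing and should be treated
as an assumption. All function names, payloads, and LSN values are
hypothetical.

```python
import hashlib


def toy_page_checksum(page: bytes, blkno: int) -> int:
    """Toy 16-bit page checksum; a stand-in, NOT PostgreSQL's algorithm."""
    # Stand-in 32-bit hash of the page contents (assumption: any
    # well-mixing hash is good enough to show the effect).
    h = int.from_bytes(hashlib.blake2b(page, digest_size=4).digest(), "little")
    h ^= blkno              # only the block number is mixed in
    return (h % 65535) + 1  # reduce to a nonzero 16-bit value


def make_page(lsn: int, payload: bytes) -> bytes:
    """Hypothetical miniature page: 8-byte LSN header plus payload."""
    return lsn.to_bytes(8, "little") + payload


def first_masked_lsn(blkno, good, bad, start_lsn, step, limit):
    """Return the first LSN at which the damaged payload yields the same
    checksum as the intact one, or None if none is found within limit."""
    for i in range(limit):
        lsn = start_lsn + i * step
        intact = toy_page_checksum(make_page(lsn, good), blkno)
        damaged = toy_page_checksum(make_page(lsn, bad), blkno)
        if intact == damaged:
            return lsn
    return None


# Hypothetical payloads standing in for the page before and after the
# test damages it.
GOOD = b"intact tuple data".ljust(56, b"\x00")
BAD = b"damaged tuple data".ljust(56, b"\x00")

if __name__ == "__main__":
    lsn = first_masked_lsn(0, GOOD, BAD, start_lsn=0x1000000, step=8,
                           limit=2_000_000)
    print("damage goes undetected at LSN", hex(lsn))
```

In this toy model roughly one LSN in 65535 hides the damage. If a test
run is deterministic enough to always land on such an LSN, the failure
is reproducible, and a single extra WAL record that shifts the LSN
almost certainly breaks the coincidence.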

Greetings,

Andres Freund


