Re: XLogReadRecord() error in XlogReadTwoPhaseData() - Mailing list pgsql-hackers

From Noah Misch
Subject Re: XLogReadRecord() error in XlogReadTwoPhaseData()
Date
Msg-id 20220122185241.GA1095169@rfd.leadboat.com
In response to Re: XLogReadRecord() error in XlogReadTwoPhaseData()  (Noah Misch <noah@leadboat.com>)
Responses Re: XLogReadRecord() error in XlogReadTwoPhaseData()
List pgsql-hackers
On Sun, Jan 16, 2022 at 01:02:41PM -0800, Noah Misch wrote:
> My next steps:
> 
> - Report a Debian bug for the sparc64+ext4 zeros problem.

(Not done yet.)

> - Try to falsify the idea that "write only the not-already-written portion of
>   a WAL block" is an effective workaround.  Specifically, modify the test
>   program to have the writer process mutate offsets [N-k,N-1] and [N+1,N+k]
>   while the reader process reads offset N.  If the reader sees a zero, that
>   workaround is ineffective.

The reader did not see a zero, so I did not falsify the workaround.  Bytes
outside the write were immune to the zeros bug, and so were the first and last
forty bytes of each write.
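
For readers who want to try the experiment, here is a minimal sketch of that kind of test, not the actual test program from this thread.  The function name `count_zero_reads`, its parameters, and the iteration count are hypothetical; it assumes a Unix system where `os.fork`, `os.pread`, and `os.pwrite` are available.  The file is pre-filled with non-zero bytes, so any zero the reader sees at offset N would be an anomaly.

```python
import os
import tempfile


def count_zero_reads(n, k, iterations=2000, path=None):
    """Return how many times the reader saw a zero byte at offset n.

    Hypothetical helper: a writer process mutates [n-k, n-1] and
    [n+1, n+k] while this process repeatedly reads offset n.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    own_file = path is None
    if own_file:
        fd, path = tempfile.mkstemp()
        os.close(fd)
    # Pre-fill with non-zero bytes so a zero read is never legitimate.
    with open(path, "wb") as f:
        f.write(b"\x01" * (n + k + 1))

    pid = os.fork()
    if pid == 0:
        # Writer process: touch both neighbor ranges, never offset n itself.
        try:
            wfd = os.open(path, os.O_WRONLY)
            for i in range(iterations):
                fill = bytes([1 + i % 255]) * k  # always non-zero
                os.pwrite(wfd, fill, n - k)
                os.pwrite(wfd, fill, n + 1)
        finally:
            os._exit(0)

    # Reader process: poll offset n until the writer has exited.
    rfd = os.open(path, os.O_RDONLY)
    zeros = 0
    try:
        while True:
            if os.pread(rfd, 1, n) == b"\x00":
                zeros += 1
            done, _ = os.waitpid(pid, os.WNOHANG)
            if done:
                break
    finally:
        os.close(rfd)
        if own_file:
            os.remove(path)
    return zeros
```

Pass `path` to place the file on the filesystem under test; a nonzero return value would mean the reader saw a zero at an offset the writer never touched.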

> - Implement the workaround, if I didn't falsify its effectiveness.  If it
>   doesn't hurt performance on x86_64, we can use it unconditionally.
>   Otherwise, limit its use to sparc64 Linux.

Attached.  With this patch, kittiwake has survived 8.5 hours of 003_cic_2pc.pl.
Without the patch, it failed many times, always within 1.3 hours.  For easier
review, this
patch uses the new behavior on all platforms.  Before commit and back-patch, I
plan to limit use of the new behavior to sparc Linux.  Future work can
benchmark the new behavior and, if it performs well, make it unconditional in
v15+.  I would expect performance to be unchanged or slightly better, because
the new behavior requests less futile work from the OS.
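
To illustrate the idea of "write only the not-already-written portion" of a block, here is a rough sketch.  It is not the attached patch; `write_block_tail`, `demo`, and their parameters are hypothetical names, and the 100-byte "block" is just a stand-in for a WAL block.  The point is that each call skips bytes an earlier call already wrote, so previously written bytes are never rewritten.

```python
import os
import tempfile


def write_block_tail(fd, block_start, block, already_written, valid_len):
    """Write only bytes [already_written, valid_len) of the block.

    Hypothetical helper: block_start is the block's file offset, and
    already_written counts the leading bytes a prior call put on disk.
    Returns the number of bytes handed to pwrite (0 if nothing is new).
    """
    if valid_len <= already_written:
        return 0
    tail = block[already_written:valid_len]
    return os.pwrite(fd, tail, block_start + already_written)


def demo():
    """Fill a block in two steps, then try a third write with nothing new."""
    block = bytes(range(1, 101))  # stand-in block contents, all non-zero
    fd, path = tempfile.mkstemp()
    try:
        counts = [
            write_block_tail(fd, 0, block, 0, 40),     # first 40 bytes
            write_block_tail(fd, 0, block, 40, 100),   # only the new 60 bytes
            write_block_tail(fd, 0, block, 100, 100),  # nothing left to write
        ]
        return counts, os.pread(fd, 100, 0)
    finally:
        os.close(fd)
        os.remove(path)
```

The sketch ignores short writes and error handling, which real code would need to handle.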

Attachment
