Re: Should walsernder check correctness of WAL records? - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Should walsernder check correctness of WAL records?
Date
Msg-id 14e95078-5fde-c784-20c7-dda4bc399d37@postgrespro.ru
In response to Re: Should walsernder check correctness of WAL records?  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

On 02.10.2020 3:28, Michael Paquier wrote:
> On Fri, Oct 02, 2020 at 12:16:25AM +0000, tsunakawa.takay@fujitsu.com wrote:
>> IIUC, walsender tries hard to send WAL as fast as possible to reduce
>> replication lag and transaction response time, so it doesn't try to
>> peek each WAL record.  I think it's good.
> CRC calculation would unlikely be the bottleneck here, no?  I would
> assume that the extra lseek() calls needed to look after the record
> data to be more harmful.
When would we need to perform any lseeks?
wal-sender and wal-receiver deal only with raw sequences of bytes.
They do not try to split the input stream into WAL records.
If we have to process the input data with the wal-reader, then I am afraid
that it will itself add quite noticeable overhead.
Using the standard WAL reader seems very inefficient in this case,
because it unpacks WAL records.
We do not need that: the only required step is to extract the WAL record
length from the header and calculate the CRC.
The main difficulty is that a WAL record can span several pages, so we
need to accumulate the checksum somewhere
and seek backward to the beginning of the record once we find a CRC
mismatch.
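
To make the idea concrete, here is a rough sketch of such a check over a raw
byte stream that arrives in arbitrary chunks. It is not PostgreSQL code. The
record layout (4-byte little-endian total length, 4-byte CRC of the payload,
then the payload) and the names StreamChecker, checker_init and checker_feed
are assumptions made up for illustration. A real XLogRecord also covers most
of its header with the CRC, and a real WAL stream has page headers between
record fragments that would have to be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 8   /* hypothetical header: 4-byte length + 4-byte CRC */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

typedef struct {
    uint8_t  hdr[HDR_LEN];  /* header bytes collected so far */
    size_t   hdr_have;      /* number of valid bytes in hdr */
    uint32_t body_left;     /* payload bytes still expected */
    uint32_t expected;      /* CRC stored in the record header */
    uint32_t crc;           /* running CRC, kept across chunks */
    uint64_t pos;           /* stream offset of the next byte */
    uint64_t rec_start;     /* stream offset of the current record */
} StreamChecker;

static void checker_init(StreamChecker *c) { memset(c, 0, sizeof *c); }

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Feed one chunk of the stream. Returns 0 if everything seen so far is
 * consistent, or -1 on a bad record, with *bad_start set to the stream
 * offset where that record begins (the position to seek back to).
 */
static int checker_feed(StreamChecker *c, const uint8_t *buf, size_t len,
                        uint64_t *bad_start)
{
    size_t i = 0;

    assert(c != NULL && bad_start != NULL);
    while (i < len) {
        if (c->hdr_have < HDR_LEN) {
            size_t take = HDR_LEN - c->hdr_have;

            if (c->hdr_have == 0)
                c->rec_start = c->pos;
            if (take > len - i)
                take = len - i;
            memcpy(c->hdr + c->hdr_have, buf + i, take);
            c->hdr_have += take; i += take; c->pos += take;
            if (c->hdr_have < HDR_LEN)
                break;              /* header continues in the next chunk */
            if (load_le32(c->hdr) < HDR_LEN) {
                *bad_start = c->rec_start;
                return -1;          /* impossible length */
            }
            c->body_left = load_le32(c->hdr) - HDR_LEN;
            c->expected = load_le32(c->hdr + 4);
            c->crc = 0xFFFFFFFFu;
        }
        size_t take = c->body_left < len - i ? c->body_left : len - i;

        c->crc = crc32c_update(c->crc, buf + i, take);
        c->body_left -= (uint32_t) take; i += take; c->pos += take;
        if (c->body_left == 0) {
            if ((c->crc ^ 0xFFFFFFFFu) != c->expected) {
                *bad_start = c->rec_start;
                return -1;          /* CRC mismatch */
            }
            c->hdr_have = 0;        /* next byte starts a new record */
        }
    }
    return 0;
}
```

The caller would invoke checker_feed() on each buffer before sending it (or
after receiving it). Only the header bytes and the running CRC are kept
between calls, so no record is unpacked or copied.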


>> In any case, the WAL can get corrupt during transmission, and
>> writing and reading on the standby.  So, the standby needs to check
>> the WAL record CRC.
> Yep.  However, I would worry much more about the case of cold
> archives.  In my experience, there are higher risks to get a WAL
> segment corrupted because it was on disk and that this disk got
> corrupted.  Transmission is a one-time short operation. Cold archives
> could stay on disk for weeks before getting reused in WAL replay.
> --
> Michael

So right now neither wal-sender nor wal-receiver checks the CRC.
We check records only when applying them.
But that seems to be too late for correct recovery.

Since wal-sender advances the replication slot position according to the
flush position at the replica,
by the time we detect a corrupted record the restart LSN may already
be set past this record.
Even if we perform WAL archiving and the archive fortunately contains a
correct (not corrupted) WAL segment,
we will have to copy this WAL segment not only to the master but also to
all replicas.
Is that acceptable?


So I am not sure whether earlier detection of a CRC mismatch can help us
recover from this error.
And isn't the price for it too high?

I wonder what other actions we can perform at the master or at the replica
to handle this situation?
For example, if we detect record corruption at the WAL sender and the
corrupted record contains an FPW,
we can try to replace the page image in the record with the current
page image.
But that is only possible if the page has not been changed since this WAL
record was created.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
