Home > mailing lists

Re: [patch] Fix checksum verification in base backups for zero page headers - Mailing list pgsql-hackers

From	Anastasia Lubennikova
Subject	Re: [patch] Fix checksum verification in base backups for zero page headers
Date	September 2, 2020 13:50:14
Msg-id	d041090c-ddf7-10d1-7d50-2a4ed03c53dd@postgrespro.ru Whole thread Raw
In response to	[patch] Fix checksum verification in base backups for zero page headers (Michael Banck <michael.banck@credativ.de>)
Responses	Re: [patch] Fix checksum verification in base backups for zero page headers Re: [patch] Fix checksum verification in base backups for zero page headers
List	pgsql-hackers

Tree view

On 01.09.2020 13:22, Michael Banck wrote:
> Hi,
>
> as a continuation of [1], I've split-off the zero page header case from
> the last patch, as this one seems less contentious.
>
>
> Michael
>
> [1] https://commitfest.postgresql.org/28/2308/
>
I've looked through the previous discussion. As far as I got it, most of 
the controversy was about online checksums improvements.

The warning about pd_upper inconsistency that you've added is a good 
addition. The patch is a bit messy, though, because a huge code block 
was shifted.

Will it be different, if you just leave
     "if (!PageIsNew(page) && PageGetLSN(page) < startptr)"
block as it was, and add
     "else if (PageIsNew(page) && !PageIsZero(page))" ?

While on it, I also have a few questions about the code around:

1) Maybe we can use other header sanity checks from PageIsVerified() as 
well? Or even the function itself.

2) > /* Reset loop to validate the block again */
How many times do we try to reread the block? Is one reread enough?
Speaking of which, 'reread_cnt' name looks confusing to me. I would 
expect that this variable contains a number of attempts, not the number 
of bytes read.

If everyone agrees, that for basebackup purpose it's enough to rely on a 
single reread, I'm ok with it.
Another approach is to read the page directly from shared buffers to 
ensure that the page is fine. This way is more complicated, but should 
be almost bullet-proof. Using it we can also check pages with lsn >= 
startptr.

3) Judging by warning messages, we count checksum failures per file, not 
per page, and don't report after a fifth failure. Why so?  Is it a 
common case that so many pages of one file are corrupted?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

From: Alvaro Herrera
Date: 02 September 2020, 13:49:54
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: John Naylor
Date: 02 September 2020, 13:55:48
Subject: Re: Problems with the FSM, heap fillfactor, and temporal locality

Re: [patch] Fix checksum verification in base backups for zero page headers - Mailing list pgsql-hackers

Previous

Next