Re: pgsql: Validate page level checksums in base backups - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pgsql: Validate page level checksums in base backups
Date
Msg-id 500ab687-d588-020a-0e23-bfec853aac9c@2ndquadrant.com
Whole thread Raw
In response to Re: pgsql: Validate page level checksums in base backups  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: pgsql: Validate page level checksums in base backups
List pgsql-hackers

On 04/10/2018 11:24 PM, Tomas Vondra wrote:
> Hi,
> 
> I think there's a bug in sendFile(). We do check checksums on all pages
> that pass this LSN check:
> 
>     /*
>      * Only check pages which have not been modified since the
>      * start of the base backup. Otherwise, they might have been
>      * written only halfway and the checksum would not be valid.
>      * However, replaying WAL would reinstate the correct page in
>      * this case.
>      */
>     if (PageGetLSN(page) < startptr)
>     {
>         ...
>     }
> 
> Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
> too, and we'll try to verify the checksum - but we actually do not set
> checksums on empty pages.
> 
> So I think it should be something like this:
> 
>     if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
>     {
>         ...
>     }
> 
> It might be worth verifying that the page is actually all-zeroes (and
> not just with corrupted pd_upper value. Not sure it's worth it.
> 
> I've found this by fairly trivial stress testing - running pgbench and
> pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
> With the proposed change I see no further failures.
> 

BTW pg_verify_checksums needs the same fix.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Runtime Partition Pruning
Next
From: Peter Geoghegan
Date:
Subject: Re: Gotchas about pg_verify_checksums