Re: Online checksums patch - once again - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Online checksums patch - once again
Date
Msg-id ef9bb1da-ba99-a59f-4102-56026813469f@iki.fi
Whole thread Raw
In response to Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Online checksums patch - once again  (Michael Paquier <michael@paquier.xyz>)
Re: Online checksums patch - once again  (Magnus Hagander <magnus@hagander.net>)
Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
(I may have said this before, but) My overall high-level impression of 
this patch is that it's really cmmplex for a feature that you use maybe 
once in the lifetime of a cluster. I'm happy to review but I'm not 
planning to commit this myself. I don't object if some other committer 
picks this up (Magnus?).

Now to the latest patch version:

On 03/02/2021 18:15, Daniel Gustafsson wrote:
> The previous v35 had a tiny conflict in pg_class.h, the attached v36 (which is
> a squash of the 2 commits in v35) fixes that.  No other changes are introduced
> in this version.


>     /*
>      * Check to see if my copy of RedoRecPtr is out of date. If so, may have
>      * to go back and have the caller recompute everything. This can only
>      * happen just after a checkpoint, so it's better to be slow in this case
>      * and fast otherwise.
>      *
>      * Also check to see if fullPageWrites or forcePageWrites was just turned
>      * on, or if we are in the process of enabling checksums in the cluster;
>      * if we weren't already doing full-page writes then go back and recompute.
>      *
>      * If we aren't doing full-page writes then RedoRecPtr doesn't actually
>      * affect the contents of the XLOG record, so we'll update our local copy
>      * but not force a recomputation.  (If doPageWrites was just turned off,
>      * we could recompute the record without full pages, but we choose not to
>      * bother.)
>      */
>     if (RedoRecPtr != Insert->RedoRecPtr)
>     {
>         Assert(RedoRecPtr < Insert->RedoRecPtr);
>         RedoRecPtr = Insert->RedoRecPtr;
>     }
>     doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Why does this use DataChecksumsOnInProgress() instead of 
DataChecksumsNeedWrite()? If checksums are enabled, you always need 
full-page writes, don't you? If not, then why is it needed in the 
inprogress-on state?

We also set doPageWrites in InitXLOGAccess(). That should match the 
condition above (although it doesn't matter for correctness).

> /*
>  * DataChecksumsNeedVerify
>  *        Returns whether data checksums must be verified or not
>  *
>  * Data checksums are only verified if they are fully enabled in the cluster.
>  * During the "inprogress-on" and "inprogress-off" states they are only
>  * updated, not verified (see datachecksumsworker.c for a longer discussion).
>  *
>  * This function is intended for callsites which have read data and are about
>  * to perform checksum validation based on the result of this. To avoid the
>  * the risk of the checksum state changing between reading and performing the
>  * validation (or not), interrupts must be held off. This implies that calling
>  * this function must be performed as close to the validation call as possible
>  * to keep the critical section short. This is in order to protect against
>  * time of check/time of use situations around data checksum validation.
>  */
> bool
> DataChecksumsNeedVerify(void)
> {
>     Assert(InterruptHoldoffCount > 0);
>     return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
> }

What exactly is the intended call pattern here? Something like this?

smgrread() a data page
HOLD_INTERRUPTS();
if (DataChecksumsNeedVerify())
{
   if (pg_checksum_page((char *) page, blkno) != expected)
     elog(ERROR, "bad checksum");
}
RESUME_INTERRUPTS();

That seems to be what the code currently does. What good does holding 
interrupts do here? If checksums were not fully enabled at the 
smgrread() call, the page might have incorrect checksums, and if the 
state transitions from inprogress-on to on between the smggread() call 
and the DataChecksumsNeedVerify() call, you'll get an error. I think you 
need to hold the interrupts *before* the smgrread() call.

> /*
>  * Set checksum for a page in private memory.
>  *
>  * This must only be used when we know that no other process can be modifying
>  * the page buffer.
>  */
> void
> PageSetChecksumInplace(Page page, BlockNumber blkno)
> {
>     HOLD_INTERRUPTS();
>     /* If we don't need a checksum, just return */
>     if (PageIsNew(page) || !DataChecksumsNeedWrite())
>     {
>         RESUME_INTERRUPTS();
>         return;
>     }
> 
>     ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
>     RESUME_INTERRUPTS();
> }

The checksums state might change just after this call, before the caller 
has actually performed the smgrwrite() or smgrextend() call. The caller 
needs to hold interrupts across this function and the 
smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here, 
because that just masks bugs where the caller isn't holding the 
interrupts. Same in PageSetChecksumCopy().

- Heikki



pgsql-hackers by date:

Previous
From: torikoshia
Date:
Subject: Re: adding wait_start column to pg_locks
Next
From: Greg Nancarrow
Date:
Subject: Re: Libpq support to connect to standby server as priority