Re: Online enabling of checksums - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Online enabling of checksums
Date
Msg-id 6f8675d8-c4db-1fae-6de4-4a0857421c9b@iki.fi
Whole thread Raw
In response to Re: Online enabling of checksums  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Online enabling of checksums
Re: Online enabling of checksums
List pgsql-hackers
Hi,

The patch looks good to me at a high level. Some notes below. I didn't 
read through the whole thread, so sorry if some of these have been 
discussed already.

> +void
> +ShutdownChecksumHelperIfRunning(void)
> +{
> +    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
> +    {
> +        /* Launcher not started, so nothing to shut down */
> +        return;
> +    }
> +
> +    ereport(ERROR,
> +            (errmsg("checksum helper is currently running, cannot disable checksums"),
> +             errhint("Restart the cluster or wait for the worker to finish.")));
> +}

Is there no way to stop the checksum helper once it's started? That 
seems rather user-unfriendly. I can imagine it being a pretty common 
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to 
realize that you forgot to set the cost limit, and that it's hurting 
queries too much. At that point, you want to abort.

> + * This is intended to create the worklist for the workers to go through, and
> + * as we are only concerned with already existing databases we need to ever
> + * rebuild this list, which simplifies the coding.

I can't parse this sentence.

> +extern bool DataChecksumsEnabledOrInProgress(void);
> +extern bool DataChecksumsInProgress(void);
> +extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit 
long. And doing this in PageIsVerified looks a bit weird:

 > if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()


> +     <literal>template0</literal> is by default not accepting connections, to
> +     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that 
it's bad.

> +CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
> +        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
> +  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
> +  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear 
to me when they would return what. I'd suggest marking them as 'void' 
instead.

> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>   */
>  #define PG_PAGE_LAYOUT_VERSION        4
>  #define PG_DATA_CHECKSUM_VERSION    1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2
>  

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. 
They're not actually stored in the page header, as you might think. I 
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to 
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk 
format. I think it was a bit weird even before this patch, but now it's 
worse. At least a better comment would be in order, or maybe move these 
elsewhere.

Feature request: it'd be nice to update the 'ps status' with 
set_ps_display, to show a rudimentary progress indicator. Like the name 
and block number of the relation being processed. It won't tell you how 
much is left to go, but at least it will give a warm fuzzy feeling to 
the DBA that something is happening.

I didn't review the offline tool, pg_verify_checksums.

- Heikki



pgsql-hackers by date:

Previous
From: amul sul
Date:
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE ofpartition key
Next
From: Amit Khandekar
Date:
Subject: Re: Hash join in SELECT target list expression keeps consuming memory