Re: Online checksums patch - once again - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Online checksums patch - once again
Date
Msg-id d89d50a9-06b8-dff6-d80d-9bd3a841216c@iki.fi
Whole thread Raw
In response to Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
On 17/11/2020 10:56, Daniel Gustafsson wrote:
>> On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> 2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems
tobe that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it need
todo and makes all the changes to the global state. But then there's this violation of the idea:
enable_data_checksums()checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a
previouslycrashed operation or start from scratch. I think it would be much cleaner if the launcher process figured
thatout itself, and enable_data_checksums() would just tell the launcher what the target state is.
 
>>
>> enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call
enable_data_checksums()in two backends concurrently, depending on the timing, there are two possible outcomes:
 
>>
>> a) One call returns true, and launches the background process. The other call returns false.
>>
>> b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running".
>>
>> In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the
ShutdownDatachecksumsWorkerIfRunning()and SetDataChecksumsOff() calls.
 
> 
> I've reworked this in the attached such that the enable_ and disable_ functions
> merely call into the launcher with the desired outcome, and the launcher is
> responsible for figuring out the rest.  The datachecksumworker is now the sole
> place which initiates a state transfer.

Well, you still fill the DatachecksumsWorkerShmem->operations array in 
the backend process that launches the datacheckumworker, not in the 
worker process. I find that still a bit surprising, but I believe it works.

>>>         /*
>>>          * Mark the buffer as dirty and force a full page write.  We have to
>>>          * re-write the page to WAL even if the checksum hasn't changed,
>>>          * because if there is a replica it might have a slightly different
>>>          * version of the page with an invalid checksum, caused by unlogged
>>>          * changes (e.g. hintbits) on the master happening while checksums
>>>          * were off. This can happen if there was a valid checksum on the page
>>>          * at one point in the past, so only when checksums are first on, then
>>>          * off, and then turned on again. Iff wal_level is set to "minimal",
>>>          * this could be avoided iff the checksum is calculated to be correct.
>>>          */
>>>         START_CRIT_SECTION();
>>>         MarkBufferDirty(buf);
>>>         log_newpage_buffer(buf, false);
>>>         END_CRIT_SECTION();
>>
>> It's really unfortunate that we have to dirty the page even if the checksum already happens to match. Could we only
dothe log_newpage_buffer() call and skip MarkBufferDirty() in that case?
 
> 
> I think we can, but I've intentionally stayed away from such optimizations
> until the basic version of the patch was deemed safe and approaching done.
> It's complicated enough as it is IMO.

Fair enough.

> The attached fixes the above plus a few other small things found while hacking
> on this version.

I haven't read through the whole patch, but a few random comments below, 
in no particular order:

pg_enable/disable_data_checksums() should perform a superuser-check. I 
don't think we want to expose them to any users.

> +/*
> + * Local state for Controlfile data_checksum_version. After initialization,
> + * this is only updated when absorbing a procsignal barrier during interrupt
> + * processing. Thus, it can be read by backends without the need for a lock.
> + * Possible values are the checksum versions defined in storage/bufpage.h and
> + * zero for when checksums are disabled.
> + */
> +static uint32 LocalDataChecksumVersion = 0;

The comment is a bit confusing: "Thus, it can be read by backends 
without the need for a lock". Since it's a variable in backend-private 
memory, it can only be read by the same backend, not "backends". And 
that's also why you don't need a lock, not because it's updated during 
interrupt processing. I understand how this works, but maybe rephrase 
the comment a bit.

> +/*
> + * DataChecksumsOnInProgress
> + *             Returns whether data checksums are being enabled
> + *
> + * Most callsites shouldn't need to worry about the "inprogress" states, since
> + * they should check the requirement for verification or writing. Some low-
> + * level callsites dealing with page writes need however to know. It's also
> + * used to check for aborted checksum processing which need to be restarted.
>   */
>  bool
> -DataChecksumsEnabled(void)
> +DataChecksumsOnInProgress(void)
> +{
> +       return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
> +}

s/need/needs/. The whole paragraph feels a bit awkwardly worded in 
general. I'd suggest something like: "Most operations don't need to 
worry about the "inprogress" states, and should use 
DataChecksumsNeedVerify() or DataChecksumsNeedWrite()". 
DataChecksumsOffInProgress() is called from 
StartDatachecksumsWorkerLauncher(), which I wouldn't call a "low-level 
callsite".

> @@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata,
>          Assert(RedoRecPtr < Insert->RedoRecPtr);
>          RedoRecPtr = Insert->RedoRecPtr;
>      }
> -    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
> +    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
>  
>      if (doPageWrites &&
>          (!prevDoPageWrites ||

The comment above this deserves to be updated. Also, this is a very hot 
codepath; will this slow down WAL-logging, when full-page writes are 
disabled? Could we inline DataChecksumsOnInProgress() or set 
Insert->forcePageWrites when checksums are being computed or something?

In StartupXLOG():
> +    /*
> +     * If data checksums were being disabled when the cluster was shutdown, we
> +     * know that we have a state where all backends have stopped validating
> +     * checksums and we can move to off instead.
> +     */
> +    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
> +    {
> +        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +        ControlFile->data_checksum_version = 0;
> +        LWLockRelease(ControlFileLock);
> +    }
> +

Should this be WAL-logged, like in SetDataChecksumsOff()?

In SetDataChecksumsOff():
> +    ControlFile->data_checksum_version = 0;
> +    UpdateControlFile();
> +    LWLockRelease(ControlFileLock);
> +
> +    XlogChecksums(0);
> +    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
> +}

What happens is if you crash between UpdateControlFile() and XlogChecksum()?

- Heikki



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bogus documentation for bogus geometric operators
Next
From: John Naylor
Date:
Subject: Re: truncating timestamps on arbitrary intervals