Re: Online checksums patch - once again - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Online checksums patch - once again
Date
Msg-id c29d7e22-c40b-8a5b-72ed-d7ea87741403@iki.fi
Whole thread Raw
In response to Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Online checksums patch - once again  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
Revisiting an issue we discussed earlier:

On 25/11/2020 15:20, Daniel Gustafsson wrote:
> On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 17/11/2020 10:56, Daniel Gustafsson wrote:
>>> I've reworked this in the attached such that the enable_ and
>>> disable_ functions merely call into the launcher with the desired
>>> outcome, and the launcher is responsible for figuring out the
>>> rest.  The datachecksumworker is now the sole place which
>>> initiates a state transfer.
>> 
>> Well, you still fill the DatachecksumsWorkerShmem->operations array
>> in the backend process that launches the datacheckumworker, not in
>> the worker process. I find that still a bit surprising, but I
>> believe it works.
> 
> I'm open to changing it in case there are strong opinions, it just
> seemed the most natural to me.

This kept bothering me, so I spent a while hacking this to my liking. 
The attached patch moves the code to fill in 'operations' from the 
backend to the launcher, so that the pg_enable/disable_checksums() call 
now truly just stores the desired state, checksum on or off, in shared 
memory, and launches the launcher process. The launcher process figures 
out how to get to the desired state. This removes the couple of corner 
cases that previously emitted a NOTICE about the processing being 
concurrently disabled or aborted. What do you think? I haven't done much 
testing, so if you adopt this approach, please check if I broke 
something in the process.

This changes the way the abort works. If the 
pg_enable/disable_checksums() is called, while the launcher is already 
busy changing the state, pg_enable/disable_checksums() will just set the 
new desired state in shared memory anyway. The launcher process will 
notice that the target state changed some time later, and restart from 
scratch.

A couple of other issues came up while doing that:

- AbortProcessing() has two callers, one in code that runs in the 
launcher process, and another one in code that runs in the worker 
process. Is it really safe to use from the worker process? Calling 
ProcessAllDatabases() in the worker seems sketchy. (This is moot in this 
new patch version, as I removed AbortProcessing() altogether)

- Is it possible that the worker keeps running after the launcher has 
already exited, e.g. because of an ERROR or SIGTERM? If you then quickly 
call pg_enable_checksums() again, can you end up with two workers 
running at the same time? Is that bad?

On 26/01/2021 23:00, Daniel Gustafsson wrote:
>> On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> @@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
>>>                 relkind == RELKIND_MATVIEW)
>>>                 RelationInitTableAccessMethod(rel);
>>> +       /*
>>> +        * Set the data checksum state. Since the data checksum state can change at
>>> +        * any time, the fetched value might be out of date by the time the
>>> +        * relation is built.  DataChecksumsNeedWrite returns true when data
>>> +        * checksums are: enabled; are in the process of being enabled (state:
>>> +        * "inprogress-on"); are in the process of being disabled (state:
>>> +        * "inprogress-off"). Since relhaschecksums is only used to track progress
>>> +        * when data checksums are being enabled, and going from disabled to
>>> +        * enabled will clear relhaschecksums before starting, it is safe to use
>>> +        * this value for a concurrent state transition to off.
>>> +        *
>>> +        * If DataChecksumsNeedWrite returns false, and is concurrently changed to
>>> +        * true then that implies that checksums are being enabled. Worst case,
>>> +        * this will lead to the relation being processed for checksums even though
>>> +        * each page written will have them already.  Performing this last shortens
>>> +        * the window, but doesn't avoid it.
>>> +        */
>>> +       HOLD_INTERRUPTS();
>>> +       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
>>> +       RESUME_INTERRUPTS();
>>> +
>>>         /*
>>>          * Okay to insert into the relcache hash table.
>>>          *
>>
>> I grepped for relhashcheckums, and concluded that the value in the
>> relcache isn't actually used for anything. Not so! In
>> heap_create_with_catalog(), the actual pg_class row is constructed
>> from the relcache entry, so the value set in
>> RelationBuildLocalRelation() finds its way to pg_class. Perhaps it
>> would be more clear to pass relhachecksums directly as an argument
>> to AddNewRelationTuple(). That way, the value in the relcache would
>> be truly never used.
> 
> I might be thick (or undercaffeinated) but I'm not sure I follow. 
> AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the
> relcache entry.

Ah, you're right, I misread AddNewRelationTuple. That means that the 
relhaschecksums field in the relcache is never used? That's a clear 
rule. The changes to formrdesc() and RelationBuildLocalRelation() seem 
unnecessary then, we can always initialize relhaschecksums to false in 
the relcache.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Amit Khandekar
Date:
Subject: Re: Speeding up GIST index creation for tsvectors
Next
From: Tom Lane
Date:
Subject: Inconsistent function definitions in ECPG's informix.c