Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: Offline enabling/disabling of data checksums
Date	March 13, 2019 10:55:24
Msg-id	alpine.DEB.2.21.1903131144030.4059@lancre
In response to	Re: Offline enabling/disabling of data checksums (Michael Banck <michael.banck@credativ.de>)
List	pgsql-hackers

Hallo Michael,

> I propose we re-read the control file for the enable case after we
> finished operating on all files and (i) check the instance is still
> offline and (ii) update the checksums version from there. That should be
> a small but worthwhile change that could be done anyway.

That looks like a simple but mostly effective guard.
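To illustrate the ordering of that guard, here is a minimal sketch. It is not pg_checksums code: the JSON control file, its two field names, and the functions `read_control`, `write_control` and `enable_checksums` are all assumptions made up for the example (the real pg_control is a binary struct). The point is only that the control data is re-read after the long-running step rather than reused from before it.

```python
import json
import os
import tempfile

# Hypothetical stand-in for pg_control: a JSON file with two made-up fields.


def read_control(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_control(path, control):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(control, f)
        f.flush()
        os.fsync(f.fileno())  # push the new contents to stable storage


def enable_checksums(path, rewrite_all_files):
    """Rewrite the data files, then re-read the control file before flagging."""
    if read_control(path)["state"] != "shut down":
        raise RuntimeError("cluster must be shut down")

    rewrite_all_files()  # long-running; the cluster could be started meanwhile

    control = read_control(path)  # (i) re-read instead of reusing the old copy
    if control["state"] != "shut down":
        raise RuntimeError("cluster was started during the operation")
    control["data_checksum_version"] = 1  # (ii) update from the fresh copy
    write_control(path, control)


# Usage: the no-op callback stands in for the per-file checksum pass.
with tempfile.TemporaryDirectory() as datadir:
    control_path = os.path.join(datadir, "control.json")
    write_control(control_path, {"state": "shut down", "data_checksum_version": 0})
    enable_checksums(control_path, rewrite_all_files=lambda: None)
    result = read_control(control_path)
```

The guard is only "mostly effective" because a start-up between the second read and the final write is still possible; the sketch narrows the window, it does not close it.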

> Another option would be to add a new feature which reliably blocks an
> instance from starting up due to maintenance - either a new control file
> field, some message in postmaster.pid (like "pg_checksums maintenance in
> progress") that would prevent pg_ctl or postgres/postmaster from
> starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
> "pg_checksums in progress' or some other trigger file.

I think that a clear cluster-locking can-be-overridden-if-needed 
shared-between-commands mechanism would be a good thing (tm), although it 
requires some work.

My initial suggestion was to update the control file with an appropriate 
state, e.g. some general "admin command in progress", but I understood that 
this was rejected, and from another of your patches it seems that the 
"postmaster.pid" file is the right approach. Fine with me, the point is 
that it should be effective and consistent across all relevant commands.

A good point about the "postmaster.pid" trick, when the file does not contain 
the postmaster pid, is that overriding is as simple as "rm postmaster.pid".
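The lock-file idea can be sketched as follows. This is an illustration under stated assumptions, not how pg_ctl or the postmaster actually validate the file: the marker text and the functions `take_maintenance_lock` and `can_start` are hypothetical, and the check is reduced to "is the first line a number", whereas the real start-up code also checks whether the recorded process is still alive.

```python
import os
import tempfile

LOCK_NAME = "postmaster.pid"  # real file name; everything else is hypothetical


def take_maintenance_lock(datadir, marker="pg_checksums in progress"):
    """Create the lock file with a marker line where the pid would normally be."""
    path = os.path.join(datadir, LOCK_NAME)
    # O_EXCL makes the call fail if a lock file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(marker + "\n")
    return path


def can_start(datadir):
    """Return (ok, reason) for a simplified start-up check of the lock file."""
    path = os.path.join(datadir, LOCK_NAME)
    try:
        with open(path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return True, "no lock file"
    if not first_line.isdigit():
        return False, f'bogus data in lock file "{LOCK_NAME}": "{first_line}"'
    return False, f"lock file held by pid {first_line}"


# Usage: lock, observe the refusal, then override by removing the file.
with tempfile.TemporaryDirectory() as datadir:
    lock_path = take_maintenance_lock(datadir)
    blocked = can_start(datadir)
    os.remove(lock_path)  # the "rm postmaster.pid" override
    after_override = can_start(datadir)
```

Because the marker is not a pid, there is no process to check for, so removing the file is the whole override.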

>> It could be possible to reach a state where the control file has 
>> checksums enabled and some blocks are not correctly synced, still you 
>> would notice rather quickly if the server is in an incorrect state at 
>> the follow-up startup.
>
> Would you?  I think I'm with Fabien on this one and it seems worthwhile
> to run fsync_pgdata() before and after update_controlfile() - the second
> one should be really quick anyway. 

Note that fsync_pgdata is rather heavy: it recurses through the whole data 
directory. I think that a simple fsync on the control file alone is enough.
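The difference in cost can be sketched like this, assuming a POSIX system where a directory can be opened read-only and passed to fsync. `fsync_file` and `fsync_tree` are hypothetical helpers written for the example; `fsync_tree` is only a rough stand-in for the recursive behaviour described above and ignores details such as symlinked directories.

```python
import os
import tempfile


def fsync_file(path):
    """Flush one file (or directory) to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """Flush every file and directory under root; return how many were synced."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fsync_file(os.path.join(dirpath, name))
            count += 1
        fsync_file(dirpath)  # also flush the directory entry itself
        count += 1
    return count


# Usage on a tiny fake data directory: two files in two directories.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "global"))
    control = os.path.join(root, "global", "pg_control")
    for path in (control, os.path.join(root, "PG_VERSION")):
        with open(path, "w") as f:
            f.write("x")
    fsync_file(control)        # cheap variant: a single fsync
    synced = fsync_tree(root)  # heavy variant: one fsync per file and directory
```

On a real cluster the second call grows with the number of relation files, while the first stays constant.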

> Also, I suggest to maybe add a notice in verbose mode that we are
> syncing the data directory - otherwise the user might wonder what's
> going on at 100% done, though I haven't seen a large delay in my tests
> so far.

I agree, as it might not be cheap.

-- 
Fabien.
