Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: Offline enabling/disabling of data checksums |
Date | |
Msg-id | alpine.DEB.2.21.1903210745310.3843@lancre |
In response to | Re: Offline enabling/disabling of data checksums (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: Offline enabling/disabling of data checksums |
List | pgsql-hackers |
Hello Michaël,

> On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
>> I think that the motivation/risks should appear before the solution. "As xyz
>> ..., ...", or at least the logical link should be outlined.
>>
>> It is not clear to me whether the following sentences, which seem specific
>> to "pg_rewind", are linked to the previous advice, which seems rather to
>> refer to streaming replication?
>
> Do you have a better idea of formulation?

I can try, but I must admit that I'm fuzzy about the actual issue. Is there a problem with streaming replication when checksum settings are inconsistent, or not? You seem to suggest that the issue is more about how some commands or backup tools operate on a cluster.

I'll reread the thread carefully and will make a proposal.

> Imagine for example a primary-standby with checksums disabled: [...]

Yep, that's cool.

>> Shouldn't disabling in reverse order be safe? The checksums are not checked
>> afterwards?
>
> I don't quite understand your comment about the ordering. If all the
> standbys are destroyed first, then enabling/disabling checksums happens
> at a single place.

Sure. I was suggesting that disabling on replicated clusters is possibly safer, but I do not know the details of replication and checksumming precisely enough to be sure about it.

>> After the reboot, some data files are not fully updated with their
>> checksums, although the control file says that they are. It should then
>> fail after a restart when a no-checksum page is loaded?
>>
>> What am I missing?
>
> Please note that we do that in other tools as well and we live fine
> with that as pg_basebackup, pg_rewind just to name two.

The fact that other commands are exposed to the same potential risk is not a very good argument for not fixing it.

> I am not saying that it is not a problem in some cases, but I am saying
> that this is not a problem that this patch should solve.

As solving the issue involves swapping two lines and turning one boolean parameter to true, I do not see why it should not be done. Fixing the issue takes much less time than writing about it... And if other commands can be improved, fine with me.

> If we were to do something about that, it could make sense to make
> fsync_pgdata() smarter so that the control file is flushed last there, or
> define flush strategies there.

ISTM that this would not work: the control file update can only be done *after* the fsync, so that it describes the cluster's actual status; otherwise it is just a question of luck whether the cluster is corrupt after a crash during the fsync. The enforced order of operations, with a barrier in between, is the important thing here.

-- 
Fabien.
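The ordering argued for above can be sketched with plain POSIX calls. This is a minimal illustration under stated assumptions, not pg_checksums' actual code: the helpers `write_file()`, `sync_file()` and `enable_checksums_sketch()`, the file names and the file contents are all hypothetical, and small text files stand in for relation files and the control file. What it shows is the barrier: the control file claiming that checksums are enabled is written and synced only after every rewritten data file has been fsync'ed, so a crash at any earlier point leaves the old control file, which still matches what is on disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite path with text, without syncing it. */
static int write_file(const char *path, const char *text)
{
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, text, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: flush an existing file's contents to stable storage. */
static int sync_file(const char *path)
{
    int fd = open(path, O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/*
 * Sketch of the ordering argued for in the message (hypothetical, not
 * pg_checksums' code).  Returns 0 on success, -1 on the first failure.
 * Directory fsyncs, which newly created files would also need, are
 * omitted for brevity.
 */
static int enable_checksums_sketch(const char *const *data_files, size_t n,
                                   const char *control_file)
{
    size_t i;

    assert(control_file != NULL && (n == 0 || data_files != NULL));

    /* 1. Rewrite the "pages" with checksums: buffered, not yet durable. */
    for (i = 0; i < n; i++)
        if (write_file(data_files[i], "page with checksum\n") != 0)
            return -1;

    /* 2. Barrier: every rewritten data file must reach disk first. */
    for (i = 0; i < n; i++)
        if (sync_file(data_files[i]) != 0)
            return -1;

    /*
     * 3. Only now record the new state, and sync that too.  A crash before
     * this point leaves the old control file, which still matches disk.
     */
    if (write_file(control_file, "data_checksum_version=1\n") != 0)
        return -1;
    return sync_file(control_file);
}
```

Swapping steps 2 and 3 would reopen the window discussed in the thread: the control file could claim that checksums are enabled while some rewritten pages are still only in the OS cache. In the sketch, the step-2 loop plays the role that fsync_pgdata() plays in the discussion.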