Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Msg-id 20210108000816.GU27507@tamriel.snowman.net
In response to Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)  (Peter Geoghegan <pg@bowt.ie>)
Responses RE: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
Greetings,

* Peter Geoghegan (pg@bowt.ie) wrote:
> On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Much of this line of discussion seems to be, incorrectly, focused on my
> > mere mention of viewing the use of fsync and checksums as mechanism for
> > addressing certain risks, but that doesn't seem to be a terribly
> > fruitful direction to be going in.  I'm not suggesting that we should go
> > turn off fsync by default simply because we don't have checksums on by
> > default, which seems to be the implication.
>
> I admit that I saw red. This was a direct result of your bogus
> argument, which greatly overstated the case in favor of enabling
> checksums by default. I regret my role in that now, though. It would
> be good to debate the actual issue, but that isn't what I saw.
> Everyone knows the principles behind checksums and how they're useful
> -- it doesn't need to be a part of the discussion.

I hadn't intended to argue that enabling checksums was equivalent to
enabling or disabling fsync.  I said it was 'akin', by which I meant it
was similar in character: as I said previously, it is a way for PG to
hedge against certain external-to-PG risks.  Unfortunately, our
checksums can't actually mitigate any of those risks, only detect them,
but there is certainly value in that too.

I also now regret not being clearer as to what I meant with that comment.

> I think that it should be possible to make a much better case in favor
> of enabling checksums by default. On further reflection I actually
> don't think that the real-world VACUUM overhead is anything like 15x,
> though the details are complex. I might be willing to help with this
> analysis, but since you only seem to want to discuss the question in a
> narrow way (e.g. "I agree that improving compression performance would
> be good but I don't see that as relevant to the question of what our
> defaults should be"), I have to wonder if it's worth the trouble.

What I was attempting to get at with that comment is that while I don't
feel it's relevant, I wouldn't object to both being enabled by default,
and if those changes combined help to get others on board with having
checksums enabled by default then such an approach would also get my
vote.  I also doubt that VACUUM performance would be impacted as heavily
in real-world workloads, but I again point out that VACUUM, in our
default configuration, is going to be run with the brakes on since it's
run by autovacuum with a non-zero vacuum cost delay.  While I've
advocated for having that cost delay reduced (or the cost limit
increased) in the past, I wouldn't support eliminating the delays
entirely as that would then impact foreground activity, which is
certainly where performance is more important.
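
To put a rough number on "the brakes", here is a back-of-the-envelope
sketch of the ceiling that a cost-based delay places on VACUUM page
throughput.  The function name and the figures passed in (a cost limit
of 200, a 2ms delay, a per-page cost of 10) are assumptions picked for
illustration, not measurements, and the model ignores the time the work
itself takes, so real throughput would be lower.

```python
# Rough upper bound on throttled VACUUM throughput under cost-based delay.
# All input values are illustrative assumptions, not measurements.

BLOCK_SIZE = 8192  # bytes; assumes the standard 8kB block size


def max_pages_per_second(cost_limit: int, cost_delay_ms: float,
                         cost_per_page: int) -> float:
    """Pages per second if the work itself took no time and the process
    slept for cost_delay_ms each time it accumulated cost_limit."""
    if cost_delay_ms <= 0:
        return float("inf")  # a zero delay means no throttling at all
    pages_per_round = cost_limit / cost_per_page
    rounds_per_second = 1000.0 / cost_delay_ms
    return pages_per_round * rounds_per_second


pages = max_pages_per_second(cost_limit=200, cost_delay_ms=2, cost_per_page=10)
mb_per_second = pages * BLOCK_SIZE / (1024 * 1024)
print(f"{pages:.0f} pages/s, about {mb_per_second:.1f} MB/s")
```

Under this simple model, any per-page checksum cost only shows up in
wall-clock time if it exceeds what the sleeps already hide, which is
the sense in which throttled VACUUM is less sensitive to it.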

I appreciate that VACUUM run by an administrator directly doesn't have
the brakes on, but that is then much more likely to impact foreground
activity and is generally discouraged because of that.  Instead, it's
generally recommended to configure autovacuum to be more aggressive
while still having a delay.  Once you're past the point where you want
delays to be introduced during VACUUM runs, I'd certainly think it's
gone past the point where our standard defaults would be appropriate in
a number of ways and a user could then consider if they want to disable
checksums and accept the risk associated with doing so in favor of
making VACUUM go faster, or not.

Thanks,

Stephen
