Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Msg-id 20210106170240.GG27507@tamriel.snowman.net
In response to Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)  (Andres Freund <andres@anarazel.de>)
Responses Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)  (Bruce Momjian <bruce@momjian.us>)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > On Saturday, 02.01.2021, at 10:47 -0500, Stephen Frost wrote:
> > > * Michael Paquier (michael@paquier.xyz) wrote:
> > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:
> > > > > I think enough people use data checksums these days that it warrants to
> > > > > be moved into the "normal part", like in the attached.
> > > >
> > > > +1.  Let's see first what others think about this change.
> > >
> > > I agree with this, but I'd also like to propose, again, as has been
> > > discussed a few times, making it the default too.
>
> FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
> logging overhead, the copy we do via PageSetChecksumCopy() shows up
> quite significantly in profiles here. Together with the checksums
> computation that's *halving* write throughput on fast drives in my aio
> branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

> > This looks much better from the WAL size perspective, there's now almost
> > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > in a real-world example it might still be quite larger. Also, the vacuum
> > runtime is still 15x longer.
>
> That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

> > So maybe we should switch on wal_compression if we enable data checksums
> > by default.

That does seem like a good idea to me, +1 to also changing that.
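
For anyone following along, a rough sketch of what opting in to both
settings looks like today, since neither is on by default.  The data
directory path is a placeholder of mine, not something from this thread,
and the fragment is only illustrative:

```
# At cluster creation time (placeholder path):
#   initdb --data-checksums -D /path/to/datadir

# postgresql.conf: compress full-page images written to WAL
wal_compression = on
```

Running "SHOW data_checksums;" and "SHOW wal_compression;" in psql
afterwards should confirm what a given cluster is actually using.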

> It unfortunately also hurts other workloads. If we moved towards a saner
> compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

In my view, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance?  Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Thanks,

Stephen
