Re: Enable data checksums by default - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Enable data checksums by default
Msg-id: 20190330192543.GH4719@development
In response to: Re: Enable data checksums by default (Christoph Berg <myon@debian.org>)
Responses: Re: Enable data checksums by default; Re: Enable data checksums by default
List: pgsql-hackers
On Fri, Mar 29, 2019 at 08:35:26PM +0100, Christoph Berg wrote:
>Re: Bernd Helmle 2019-03-29 <3586bb9345a59bfc8d13a50a7c729be1ee6759fd.camel@oopsware.de>
>> On Friday, 2019-03-29 at 23:10 +0900, Michael Paquier wrote:
>> >
>> > I can't really believe that many people set up shared_buffers at
>> > 128kB, which would cause such a large number of page evictions, but I
>> > can believe that many users have shared_buffers set to its default
>> > value and that we are going to get complaints about "performance drop
>> > after upgrade to v12" if we switch data checksums to on by default.
>>
>> Yeah, I think Christoph's benchmark is based on this thinking. I assume
>> this very unrealistic scenario should emulate the worst case (many
>> buffer reads, high checksum calculation load).
>
>It's not unrealistic to have large seqscans that are all buffer
>misses, the table just has to be big enough. The idea in my benchmark
>was that if I make shared buffers really small, and the table still
>fits into RAM, I should be seeing only buffer misses, but without any
>delay for actually reading from disk.
>
>Christoph

FWIW, I think it's a mistake to focus solely on CPU utilization, which is
what all the benchmarks performed on this thread do, because they look at
the tps of in-memory read-only workloads. Checksums have other costs too,
not just the additional CPU time. Most importantly, they require
wal_log_hints to be set (which people may or may not want anyway).

I've done a simple benchmark that does read-only (-S) and read-write (-N)
pgbench runs with different scales, but also measures the duration of the
pgbench init and the amount of WAL produced during the tests.
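One way to measure the amount of WAL a run produces is to sample
pg_current_wal_lsn() before and after the run and diff the two positions;
pg_wal_lsn_diff() does exactly this on the server. The same arithmetic can
be sketched in plain Python (the sample LSNs below are made up for
illustration, not taken from the benchmark):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '1A/2B04F800' to an absolute byte position.

    The part before the slash is the high 32 bits, the part after is the
    low 32 bits, both in hex.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_bytes_between(start: str, end: str) -> int:
    """Same result as SELECT pg_wal_lsn_diff(end, start) on the server."""
    return lsn_to_bytes(end) - lsn_to_bytes(start)

# Sampled pg_current_wal_lsn() before and after a run (made-up values):
produced = wal_bytes_between("1/00000000", "1/9A000000")
print(produced / (1024 ** 3))  # 2.40625 GB of WAL
```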
On a small machine (i5, 8GB RAM, SSD RAID) the results are these:

         scale  config     |  init     tps    wal
    =======================|==========================
    ro      10  no-hints   |     2  117038    130
                hints      |     2  116378    146
                checksums  |     2  115619    147
           ----------------|--------------------------
           200  no-hints   |    32   88340   2407
                hints      |    37   86154   2628
                checksums  |    36   83336   2624
           ----------------|--------------------------
          2000  no-hints   |   365   38680   1967
                hints      |   423   38670   2123
                checksums  |   504   37510   2046
    -----------------------|--------------------------
    rw      10  no-hints   |     2   19691    437
                hints      |     2   19712    437
                checksums  |     2   19654    437
           ----------------|--------------------------
           200  no-hints   |    32   15839   2745
                hints      |    37   15735   2783
                checksums  |    36   15646   2775
           ----------------|--------------------------
          2000  no-hints   |   365    5371   3721
                hints      |   423    5270   3671
                checksums  |   504    5094   3574

The no-hints config is the default (wal_log_hints=off, data_checksums=off),
hints sets wal_log_hints=on, and checksums enables data checksums. All the
configs were somewhat tuned (1GB shared buffers, max_wal_size high enough
not to hit checkpoints very often, etc.).

I've also done the tests on a larger machine (2x E5-2620v4, 32GB of RAM,
NVMe SSD), and the general pattern is about the same: while the tps and the
amount of WAL (not covering the init) do not change, the time for
initialization increases significantly (by 20-40%). This effect is even
clearer when using slower storage (SATA-based RAID).
The results then look like this:

         scale  config     |   init     tps    wal
    =======================|===========================
    ro     100  no-hints   |     49  229459    122
                hints      |    101  167983    190
                checksums  |    103  156307    190
           ----------------|---------------------------
          1000  no-hints   |    580  152167    109
                hints      |   1047  122814    142
                checksums  |   1080  118586    141
           ----------------|---------------------------
          6000  no-hints   |   4035     508      1
                hints      |  11193     502      1
                checksums  |  11376     506      1
    -----------------------|---------------------------
    rw     100  no-hints   |     49     279    192
                hints      |    101     275    190
                checksums  |    103     275    190
           ----------------|---------------------------
          1000  no-hints   |    580     237    210
                hints      |   1047     225    201
                checksums  |   1080     224    200
           ----------------|---------------------------
          6000  no-hints   |   4035     135    123
                hints      |  11193     133    122
                checksums  |  11376     132    121

and when expressed as relative to no-hints:

         scale  config     |  init   tps   wal
    =======================|===================
    ro     100  hints      |  206%   73%  155%
                checksums  |  210%   68%  155%
           ----------------|-------------------
          1000  hints      |  181%   81%  131%
                checksums  |  186%   78%  129%
           ----------------|-------------------
          6000  hints      |  277%   99%  100%
                checksums  |  282%  100%  104%
    -----------------------|-------------------
    rw     100  hints      |  206%   99%   99%
                checksums  |  210%   99%   99%
           ----------------|-------------------
          1000  hints      |  181%   95%   96%
                checksums  |  186%   95%   95%
           ----------------|-------------------
          6000  hints      |  277%   99%   99%
                checksums  |  282%   98%   98%

I have not investigated the exact reasons, but my hypothesis is that it's
about the amount of WAL generated during the initial CREATE INDEX (because
it probably ends up setting the hint bits), which puts additional pressure
on the storage.
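The relative table is just each configuration's numbers divided by the
no-hints baseline for the same scale; a quick sketch in plain Python, using
the ro scale-1000 figures from the larger machine above:

```python
def relative(baseline: float, value: float) -> str:
    """Express value as a percentage of the no-hints baseline, rounded to
    whole percent as in the table above."""
    return f"{round(100 * value / baseline)}%"

# ro, scale 1000: init seconds (no-hints 580, hints 1047, checksums 1080)
print(relative(580, 1047), relative(580, 1080))    # init: 181% 186%
# ro, scale 1000: tps (no-hints 152167, hints 122814, checksums 118586)
print(relative(152167, 122814), relative(152167, 118586))  # tps: 81% 78%
```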
Unfortunately, this additional cost is unlikely to go away :-(

Now, maybe we want to enable checksums by default anyway, but we should not
pretend that the only cost related to checksums is CPU usage.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services