Thread: Move --data-checksums to common options in initdb --help
Hi, I noticed -k/--data-checksums is currently in the less commonly used options part of the initdb --help output: |Less commonly used options: | -d, --debug generate lots of debugging output | -k, --data-checksums use data page checksums I think enough people use data checksums these days that it warrants to be moved into the "normal part", like in the attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > I think enough people use data checksums these days that it warrants to > be moved into the "normal part", like in the attached. +1. Let's see first what others think about this change. -- Michael
Attachment
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > > I think enough people use data checksums these days that it warrants to > > be moved into the "normal part", like in the attached. > > +1. Let's see first what others think about this change. I agree with this, but I'd also like to propose, again, as has been discussed a few times, making it the default too. Thanks, Stephen
Attachment
data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Heya, (changing the subject as we're moving the goalposts) Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > * Michael Paquier (michael@paquier.xyz) wrote: > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > > > I think enough people use data checksums these days that it warrants to > > > be moved into the "normal part", like in the attached. > > > > +1. Let's see first what others think about this change. > > I agree with this, but I'd also like to propose, again, as has been > discussed a few times, making it the default too. One thing my colleagues have complained is the seemingly excessive amount of WAL generation when checksums are enabled (compared to the default where data_checksums and wal_log_hints is both off) due to additional FPIs. So I made some quick benchmarks based on pgbench -i (i.e. just initializing the data, not actually running queries) and seeing how much WAL is produced during a VACUUM with a forced CHECKPOINT beforehand. This creates a new instance, turns archiving on and then first does the data-load with scale-factor 100 in pgbench (initialization steps "dtg"), followed by a CHECKPOINT and then the VACUUM/PK generation steps (initialization steps "vp"), followed by a final CHECKPOINT. It looks like this where $CHECKSUM is either empty or '-k': pg_ctl -D data1 stop; rm -rf data1/ data1_archive/*; initdb $CHECKSUM -D data1; cp postgresql.conf data1; pg_ctl -D data1 -l data1_logfile start; pgbench -s 100 -i -p 65432 -I dtg; echo CHECKPOINT | psql -p 65432; pgbench -s 100 -i -p 65432 -I vp; echo CHECKPOINT | psql -p 65432; du -s -h data1/pg_wal data1/base data1_archive/ All runs were repeated twice. These are the $PGDATA/{pg_wal,base} sizes and the archive, as well as the timing for the second pgbench initialization step: data_checksums=off, wal_compression=off 1,1G data1/pg_wal 1,5G data1/base 1,3G data1_archive/ done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s). done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s). done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s). data_checksums=on, wal_compression=off 1,5G data1/pg_wal 1,5G data1/base 2,5G data1_archive/ done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s). done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s). done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s). So data_checksums (and/or wal_log_hints, I ommitted those numbers as they are basically identical to the data_checksums=on case) makes (i) Vacuum run 20x and primary keys 2x longer and also increases the generated WAL by 40% for pg_wal and roughly doubles the WAL in the archive. I then re-ran the tests with wal_compression=on in order to see how much that helps: data_checksums=off, wal_compression=on 1,1G data1/pg_wal 1,5G data1/base 1,2G data1_archive/ done in 26.60 s (vacuum 3.30 s, primary keys 23.30 s). done in 19.54 s (vacuum 3.11 s, primary keys 16.43 s). done in 19.50 s (vacuum 3.46 s, primary keys 16.04 s). data_checksums=on, wal_compression=on 1,1G data1/pg_wal 1,5G data1/base 1,3G data1_archive/ done in 60.24 s (vacuum 42.52 s, primary keys 17.72 s). done in 62.07 s (vacuum 45.64 s, primary keys 16.43 s). done in 56.20 s (vacuum 40.96 s, primary keys 15.24 s). This looks much better from the WAL size perspective, there's now almost no additional WAL. However, that is because pgbench doesn't do TOAST, so in a real-world example it might still be quite larger. Also, the vacuum runtime is still 15x longer. So maybe we should switch on wal_compression if we enable data checksums by default. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Paquier
Date:
On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote: > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: >> * Michael Paquier (michael@paquier.xyz) wrote: >>> On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: >>> > I think enough people use data checksums these days that it warrants to >>> > be moved into the "normal part", like in the attached. >>> >>> +1. Let's see first what others think about this change. >> >> I agree with this. Okay, so I have applied this part as it makes sense independently. >> But I'd also like to propose, again, as has been >> discussed a few times, making it the default too. While I don't particularly disagree, I think that this needs careful evaluation. > So maybe we should switch on wal_compression if we enable data checksums > by default. I don't agree with this assumption. In some CPU-bounded workloads, I have seen that wal_compression = on leads to performance degradation with or without checksums enabled. -- Michael
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Andres Freund
Date:
Hi, On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > > * Michael Paquier (michael@paquier.xyz) wrote: > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > > > > I think enough people use data checksums these days that it warrants to > > > > be moved into the "normal part", like in the attached. > > > > > > +1. Let's see first what others think about this change. > > > > I agree with this, but I'd also like to propose, again, as has been > > discussed a few times, making it the default too. FWIW, I am quite doubtful we're there performance-wise. Besides the WAL logging overhead, the copy we do via PageSetChecksumCopy() shows up quite significantly in profiles here. Together with the checksums computation that's *halfing* write throughput on fast drives in my aio branch. > This looks much better from the WAL size perspective, there's now almost > no additional WAL. However, that is because pgbench doesn't do TOAST, so > in a real-world example it might still be quite larger. Also, the vacuum > runtime is still 15x longer. That's obviously an issue. > So maybe we should switch on wal_compression if we enable data checksums > by default. It unfortunately also hurts other workloads. If we moved towards a saner compression algorithm that'd perhaps not be an issue anymore... Greetings, Andres Freund
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Hi, Am Mittwoch, den 06.01.2021, 10:52 +0900 schrieb Michael Paquier: > On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote: > > So maybe we should switch on wal_compression if we enable data checksums > > by default. > > I don't agree with this assumption. In some CPU-bounded workloads, I > have seen that wal_compression = on leads to performance degradation > with or without checksums enabled. I meant just flipping the default, admins could still turn off wal_compression if they think it'd help their performance. But it might be tricky to implement, not sure. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > > > * Michael Paquier (michael@paquier.xyz) wrote: > > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote: > > > > > I think enough people use data checksums these days that it warrants to > > > > > be moved into the "normal part", like in the attached. > > > > > > > > +1. Let's see first what others think about this change. > > > > > > I agree with this, but I'd also like to propose, again, as has been > > > discussed a few times, making it the default too. > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > logging overhead, the copy we do via PageSetChecksumCopy() shows up > quite significantly in profiles here. Together with the checksums > computation that's *halfing* write throughput on fast drives in my aio > branch. Our defaults are not going to win any performance trophies and so I don't see the value in stressing over it here. > > This looks much better from the WAL size perspective, there's now almost > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > in a real-world example it might still be quite larger. Also, the vacuum > > runtime is still 15x longer. > > That's obviously an issue. It'd certainly be nice to figure out a way to improve the VACUUM run but I don't think the impact on the time to run VACUUM is really a good reason to not move forward with changing the default. > > So maybe we should switch on wal_compression if we enable data checksums > > by default. That does seem like a good idea to me, +1 to also changing that. > It unfortunately also hurts other workloads. If we moved towards a saner > compression algorithm that'd perhaps not be an issue anymore... I agree that improving compression performance would be good but I don't see that as relevant to the question of what our defaults should be. imv, enabling page checksums is akin to having fsync enabled by default. Does it impact performance? Yes, surely quite a lot, but it's also the safe and sane choice when it comes to defaults. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Bruce Momjian
Date:
On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote: > > It unfortunately also hurts other workloads. If we moved towards a saner > > compression algorithm that'd perhaps not be an issue anymore... > > I agree that improving compression performance would be good but I don't > see that as relevant to the question of what our defaults should be. > > imv, enabling page checksums is akin to having fsync enabled by default. > Does it impact performance? Yes, surely quite a lot, but it's also the > safe and sane choice when it comes to defaults. Well, you know fsyncs are required to recover from an OS crash, which is more likely than detecting data corruption. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Bruce Momjian (bruce@momjian.us) wrote: > On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote: > > > It unfortunately also hurts other workloads. If we moved towards a saner > > > compression algorithm that'd perhaps not be an issue anymore... > > > > I agree that improving compression performance would be good but I don't > > see that as relevant to the question of what our defaults should be. > > > > imv, enabling page checksums is akin to having fsync enabled by default. > > Does it impact performance? Yes, surely quite a lot, but it's also the > > safe and sane choice when it comes to defaults. > > Well, you know fsyncs are required to recover from an OS crash, which is > more likely than detecting data corruption. Yes, I do know that. That doesn't change my feeling that we should have checksums enabled by default. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Magnus Hagander
Date:
On Wed, Jan 6, 2021 at 8:31 AM Michael Banck <michael.banck@credativ.de> wrote: > > Hi, > > Am Mittwoch, den 06.01.2021, 10:52 +0900 schrieb Michael Paquier: > > On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote: > > > So maybe we should switch on wal_compression if we enable data checksums > > > by default. > > > > I don't agree with this assumption. In some CPU-bounded workloads, I > > have seen that wal_compression = on leads to performance degradation > > with or without checksums enabled. > > I meant just flipping the default, admins could still turn off > wal_compression if they think it'd help their performance. But it might > be tricky to implement, not sure. The other argument is that admins can cheaply and quickly turn off checksums if they don't want them. The same cannot be said for turning them *on* again, that's a very slow offline operation at this time. Turning off checksums doesn't take noticeably more time than say changing the shared_buffers from the default, which is also almost guaranteed to be wrong for most installations. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Andres Freund
Date:
Hi, On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > > > > I agree with this, but I'd also like to propose, again, as has been > > > > discussed a few times, making it the default too. > > > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > > logging overhead, the copy we do via PageSetChecksumCopy() shows up > > quite significantly in profiles here. Together with the checksums > > computation that's *halfing* write throughput on fast drives in my aio > > branch. > > Our defaults are not going to win any performance trophies and so I > don't see the value in stressing over it here. Meh^3. There's a difference between defaults that are about resource usage (e.g. shared_buffers) and defaults that aren't. > > > This looks much better from the WAL size perspective, there's now almost > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > in a real-world example it might still be quite larger. Also, the vacuum > > > runtime is still 15x longer. > > > > That's obviously an issue. > > It'd certainly be nice to figure out a way to improve the VACUUM run but > I don't think the impact on the time to run VACUUM is really a good > reason to not move forward with changing the default. Vacuum performance is one of *THE* major complaints about postgres. Making it run slower by a lot obviously exascerbates that problem significantly. I think it'd be prohibitively expensive if it were 1.5x, not to even speak of 15x. > imv, enabling page checksums is akin to having fsync enabled by default. > Does it impact performance? Yes, surely quite a lot, but it's also the > safe and sane choice when it comes to defaults. Oh for crying out loud. Greetings, Andres Freund
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Andres Freund
Date:
Hi, On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > The other argument is that admins can cheaply and quickly turn off > checksums if they don't want them. > > The same cannot be said for turning them *on* again, that's a very > slow offline operation at this time. > > Turning off checksums doesn't take noticeably more time than say > changing the shared_buffers from the default, which is also almost > guaranteed to be wrong for most installations. It still requires running a binary locally on the DB server, no? Which means it'll not be an option on most cloud providers... Greetings, Andres Freund
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost: > > > > > I agree with this, but I'd also like to propose, again, as has been > > > > > discussed a few times, making it the default too. > > > > > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL > > > logging overhead, the copy we do via PageSetChecksumCopy() shows up > > > quite significantly in profiles here. Together with the checksums > > > computation that's *halfing* write throughput on fast drives in my aio > > > branch. > > > > Our defaults are not going to win any performance trophies and so I > > don't see the value in stressing over it here. > > Meh^3. There's a difference between defaults that are about resource > usage (e.g. shared_buffers) and defaults that aren't. fsync isn't about resource usage. > > > > This looks much better from the WAL size perspective, there's now almost > > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > > in a real-world example it might still be quite larger. Also, the vacuum > > > > runtime is still 15x longer. > > > > > > That's obviously an issue. > > > > It'd certainly be nice to figure out a way to improve the VACUUM run but > > I don't think the impact on the time to run VACUUM is really a good > > reason to not move forward with changing the default. > > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exascerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. We already make vacuum, when run out of autovacuum, relatively slow, quite intentionally. If someone's having trouble with vacuum run times they're going to be adjusting the configuration anyway. > > imv, enabling page checksums is akin to having fsync enabled by default. > > Does it impact performance? Yes, surely quite a lot, but it's also the > > safe and sane choice when it comes to defaults. > > Oh for crying out loud. Not sure what you're hoping to gain from such comments, but it doesn't do anything to change my opinion. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > > The other argument is that admins can cheaply and quickly turn off > > checksums if they don't want them. > > > > The same cannot be said for turning them *on* again, that's a very > > slow offline operation at this time. > > > > Turning off checksums doesn't take noticeably more time than say > > changing the shared_buffers from the default, which is also almost > > guaranteed to be wrong for most installations. > > It still requires running a binary locally on the DB server, no? Which > means it'll not be an option on most cloud providers... ... unless they choose to make it an option, which is entirely up to them and certainly well within what they're capable of doing. I'd also mention that, at least according to some cloud providers I've talked to, they specifically wouldn't support PG until data checksums were available, making me not really feel like having them enabled by default would be such an issue (not to mention that, clearly, cloud providers could choose to change the default for their deployments if they wished to). Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Magnus Hagander
Date:
On Wed, Jan 6, 2021 at 6:58 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote: > > The other argument is that admins can cheaply and quickly turn off > > checksums if they don't want them. > > > > The same cannot be said for turning them *on* again, that's a very > > slow offline operation at this time. > > > > Turning off checksums doesn't take noticeably more time than say > > changing the shared_buffers from the default, which is also almost > > guaranteed to be wrong for most installations. > > It still requires running a binary locally on the DB server, no? Which It does. So does changing shared_buffers -- for example you need to run "systemctl" if you're on systemd, or just pg_ctl if you're using unpackaged postres. > means it'll not be an option on most cloud providers... I really don't see why. They've implemented the ability to restart postgres. Surely they can implement the ability to run a single command in between. Or if that's too complicated, they are more than capable of passing a parameter to initdb to change what the default is on their platform. They already do so for other things (such as not using trust or peer auth by default, or by actually not having a superuser setc). -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Am Mittwoch, den 06.01.2021, 09:58 -0800 schrieb Andres Freund: > It still requires running a binary locally on the DB server, no? Which > means it'll not be an option on most cloud providers... At least Azure and RDS seem to have data_checksums on anyway, I don't have a GCP test instance around handily right now to check. Micael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Hi, On Wed, Jan 06, 2021 at 09:55:08AM -0800, Andres Freund wrote: > On 2021-01-06 12:02:40 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote: > > > > This looks much better from the WAL size perspective, there's now almost > > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so > > > > in a real-world example it might still be quite larger. Also, the vacuum > > > > runtime is still 15x longer. > > > > > > That's obviously an issue. > > > > It'd certainly be nice to figure out a way to improve the VACUUM run but > > I don't think the impact on the time to run VACUUM is really a good > > reason to not move forward with changing the default. > > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exascerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. To maybe clarify, the vacuum slowdown is just as large in my (somewhat contrived as a worst-case scenario) tests when wal_log_hints is on and not data_checksums, I just ommitted those numbers due to being basically identical (or maybe a bit worse even): |data_checksums=off, wal_log_hints=off: | |done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s). |done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s). |done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s). | |data_checksums=off, wal_log_hints=on: | |1,5G data1/pg_wal |1,5G data1/base |2,5G data1_archive/ | |done in 87.89 s (vacuum 69.67 s, primary keys 18.23 s). |done in 73.71 s (vacuum 60.19 s, primary keys 13.52 s). |done in 75.12 s (vacuum 62.49 s, primary keys 12.62 s). | |data_checksums=on, wal_log_hints=off: | |done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s). |done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s). |done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s). Of course, wal_log_hints is not the default either and can be turned off easily. You mostly lose the ability to run pg_rewind I think, are there other use-cases for it? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Am Mittwoch, den 06.01.2021, 19:07 +0100 schrieb Michael Banck: > Am Mittwoch, den 06.01.2021, 09:58 -0800 schrieb Andres Freund: > > It still requires running a binary locally on the DB server, no? Which > > means it'll not be an option on most cloud providers... > > At least Azure and RDS seem to have data_checksums on anyway, I don't > have a GCP test instance around handily right now to check. Well I was curious: GCP SQL Postgres also has checksums enabled. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Andres Freund
Date:
Hi, On 2021-01-06 13:01:59 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > > imv, enabling page checksums is akin to having fsync enabled by default. > > > Does it impact performance? Yes, surely quite a lot, but it's also the > > > safe and sane choice when it comes to defaults. > > > > Oh for crying out loud. > > Not sure what you're hoping to gain from such comments, but it doesn't > do anything to change my opinion. It seems so facetious to compare fsync=off (will cause corruption) with data_checksums=off (will not cause corruption) that I find the comparison to be insulting. Greetings, Andres Freund
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 9:55 AM Andres Freund <andres@anarazel.de> wrote: > Vacuum performance is one of *THE* major complaints about > postgres. Making it run slower by a lot obviously exascerbates that > problem significantly. I think it'd be prohibitively expensive if it > were 1.5x, not to even speak of 15x. +1. I am *strongly* opposed to enabling checksums by default for this reason. I think that it's a total non-starter, unless and until the overhead can be dramatically reduced. The fact that it isn't the fault of the checksum mechanism in some abstract sense is 100% irrelevant. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-01-06 13:01:59 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > > imv, enabling page checksums is akin to having fsync enabled by default. > > > > Does it impact performance? Yes, surely quite a lot, but it's also the > > > > safe and sane choice when it comes to defaults. > > > > > > Oh for crying out loud. > > > > Not sure what you're hoping to gain from such comments, but it doesn't > > do anything to change my opinion. > > It seems so facetious to compare fsync=off (will cause corruption) with > data_checksums=off (will not cause corruption) that I find the > comparison to be insulting. Having fsync off won't actually cause corruption unless you have an OS crash or don't sync the disks when you reboot the system though- so it's a hedge against certain failure conditions, as is checksums. Yes, having fsync off on a system and then rebooting it (ungracefully..) will likely cause corruption and, no, having data checksums turned off won't cause corruption in that way or at all in its own right- but there's a decent chance that if there does end up being latent corruption that it'll at least be detected, which is why so many (including, apparently, the popular cloud providers) enable it and why we should have it on by default. I don't agree that they are so different as you make them out to be. I do appreciate that the chances of a random reboot happening are higher than the chance of a disk failure being detected by a PG checksum (and not something else first). Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote: > Having fsync off won't actually cause corruption unless you have an OS > crash or don't sync the disks when you reboot the system though- so it's > a hedge against certain failure conditions, as is checksums. I find this argument baffling. Do you really believe this? -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Peter Geoghegan (pg@bowt.ie) wrote: > On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote: > > Having fsync off won't actually cause corruption unless you have an OS > > crash or don't sync the disks when you reboot the system though- so it's > > a hedge against certain failure conditions, as is checksums. > > I find this argument baffling. Do you really believe this? Do you really believe it to be wrong? Do we stop performing the correct write calls in the correct order to the kernel with fsync being off? If the kernel actually handles all of our write calls correctly and we cleanly shut down and the kernel cleanly shuts down and sync's the disks before a reboot, will there be corruption from running with fsync off? If that's the case, I'd certainly be curious to hear under what conditions, when everything works, we'll end up with corruption simply from running with fsync off. I don't mean to imply that I advocate for such- I'd hope that it would be clear from this discussion that I'm not suggesting that we turn fsync off, and rather the opposite, that we have both fsync and data checksums be on by default, but to claim that having fsync off will always, in every situation, cause corruption is over-stating the case. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost <sfrost@snowman.net> wrote: > Do you really believe it to be wrong? Do we stop performing the correct > write calls in the correct order to the kernel with fsync being off? If > the kernel actually handles all of our write calls correctly and we > cleanly shut down and the kernel cleanly shuts down and sync's the disks > before a reboot, will there be corruption from running with fsync off? This is a total straw man. Everyone understands the technical issues with fsync perfectly well, and everyone understands that everyone understands the issue, so spare me the "I'm just a humble country lawyer" style explanation. What you seem to be arguing is that the differences between disabling checksums and disabling fsync is basically quantitative, and so making a qualitative distinction between those two things is meaningless, and that it logically follows that disagreeing with you is essentially irresponsible. This is a tactic that would be an embarrassment to a high school debate team. It's below you. Your argument may be logically consistent, but it's still nonsense. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Peter Geoghegan (pg@bowt.ie) wrote: > On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost <sfrost@snowman.net> wrote: > > Do you really believe it to be wrong? Do we stop performing the correct > > write calls in the correct order to the kernel with fsync being off? If > > the kernel actually handles all of our write calls correctly and we > > cleanly shut down and the kernel cleanly shuts down and sync's the disks > > before a reboot, will there be corruption from running with fsync off? > > This is a total straw man. Everyone understands the technical issues > with fsync perfectly well, and everyone understands that everyone > understands the issue, so spare me the "I'm just a humble country > lawyer" style explanation. > > What you seem to be arguing is that the differences between disabling > checksums and disabling fsync is basically quantitative, and so making > a qualitative distinction between those two things is meaningless, and > that it logically follows that disagreeing with you is essentially > irresponsible. This is a tactic that would be an embarrassment to a > high school debate team. It's below you. I can agree that there's a usefulness in making a qualitative distinction between them, but we're talking about a default here, not about if we should even have these options or these capabilities or if we should force them upon everyone or if one is somehow better or worse than the other. As already mentioned, it's also, at least today, far simpler to disable checksums than to enable them, which is something else to consider when thinking about what the default should be. That the major cloud providers all have checksums enabled (at least by default, though I wonder if they would even let you turn them off..), even when we don't have them on by default, strikes me as pretty telling that this is something that we should have on by default. Certainly there's a different risk profile between the two and there may be times when someone is fine with running without fsync, or fine running without checksums, but those are, in my view, exceptions made once you understand exactly what risk you're willing to accept, and not what the default or typical deployment should be. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost <sfrost@snowman.net> wrote: > As already mentioned, it's also, at least today, far > simpler to disable checksums than to enable them, which is something > else to consider when thinking about what the default should be. That is a valid concern. I just don't think that it's good enough on its own, given the overwhelming downside of enabling checksums given the WAL architecture that we have today. > That the major cloud providers all have checksums enabled (at least by > default, though I wonder if they would even let you turn them off..), > even when we don't have them on by default, strikes me as pretty telling > that this is something that we should have on by default. Please provide supporting evidence. I know that EBS itself uses checksums at the block device level, so I'm sure that RDS "uses checksums" in some sense. But does RDS use --data-checksums during initdb? > Certainly there's a different risk profile between the two and there may > be times when someone is fine with running without fsync, or fine > running without checksums, but those are, in my view, exceptions made > once you understand exactly what risk you're willing to accept, and not > what the default or typical deployment should be. Okay, I'll bite. Here is the important difference: Enabling checksums doesn't actually make data corruption less likely, it just makes it easier to detect. Whereas disabling fsync will reliably produce corruption before too long in almost any installation. It may occasionally be appropriate to disable fsync in a very controlled environment, but it's rare, and not much faster than disabling synchronous commits in any case. It barely ever happens. We added page-level checksums in 9.3. Can you imagine a counterfactual history in which Postgres had page checksums since the 1990s, but only added the fsync feature in 9.3? Please answer this non-rhetorical question. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Hi, Am Mittwoch, den 06.01.2021, 12:56 -0800 schrieb Peter Geoghegan: > On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost <sfrost@snowman.net> wrote: > > As already mentioned, it's also, at least today, far > > simpler to disable checksums than to enable them, which is something > > else to consider when thinking about what the default should be. > > That is a valid concern. I just don't think that it's good enough on > its own, given the overwhelming downside of enabling checksums given > the WAL architecture that we have today. > > > That the major cloud providers all have checksums enabled (at least by > > default, though I wonder if they would even let you turn them off..), I don't think so, and it would be very weird if they did, not just due to the fact that shutting down the instance and running pg_checksums is only possible since v13 (and only Google Cloud SQL Postgres supports that so far), but also because this is the kind of decisions cloud providers tend to take for their clients and not allow the users any say in (just like how they do backups or failovers). > > even when we don't have them on by default, strikes me as pretty telling > > that this is something that we should have on by default. > > Please provide supporting evidence. I know that EBS itself uses > checksums at the block device level, so I'm sure that RDS "uses > checksums" in some sense. But does RDS use --data-checksums during > initdb? At least data_checksums=on for Azure Managed Postgres, Amazon RDS and Google Cloud SQL Postgres. It might not be on for all types (I just checked the basic one each earlier today), but I guess it is. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 1:04 PM Michael Banck <michael.banck@credativ.de> wrote: > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and > Google Cloud SQL Postgres. It might not be on for all types (I just > checked the basic one each earlier today), but I guess it is. So you tested this using "show data_checksums;" in psql in each case? What does "show full_page_writes;" show you? -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Am Mittwoch, den 06.01.2021, 13:08 -0800 schrieb Peter Geoghegan: > On Wed, Jan 6, 2021 at 1:04 PM Michael Banck <michael.banck@credativ.de> wrote: > > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and > > Google Cloud SQL Postgres. It might not be on for all types (I just > > checked the basic one each earlier today), but I guess it is. > > So you tested this using "show data_checksums;" in psql in each case? Yes. > What does "show full_page_writes;" show you? It's also 'on' for all three (wal_log_hints is 'off' everywhere). Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan <pg@bowt.ie> wrote: > So you tested this using "show data_checksums;" in psql in each case? > > What does "show full_page_writes;" show you? Another consideration is checkpoint_timeout and max_wal_size. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Michael Banck
Date:
Hi, Am Mittwoch, den 06.01.2021, 13:19 -0800 schrieb Peter Geoghegan: > On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan <pg@bowt.ie> wrote: > > So you tested this using "show data_checksums;" in psql in each case? > > > > What does "show full_page_writes;" show you? > > Another consideration is checkpoint_timeout That one seems to be 5min everywhere, and one can change it except on Azure. > and max_wal_size. I think this one is usually based on instance size, but I only have access to two differently sized instance on Azure right now, where it is 1GB for the smallest possible one and 32GB for a production instance. It's 1GB for the small Google Cloud SQL Postgres and 2GB for the small RDS test instance. It is user-changeable everywhere (at least to some degree). Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Wed, Jan 6, 2021 at 1:29 PM Michael Banck <michael.banck@credativ.de> wrote: > That one seems to be 5min everywhere, and one can change it except on > Azure. Okay, thanks for clearing that up. Looks like all of the big 3 cloud providers use Postgres checksums in a straightforward way. I don't have much more to say on this thread. I am -1 on the current proposal to enable page-level checksums by default. I may change my mind on this in the future, perhaps when underlying architectural details change, but right now this seems like a net negative. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Amit Kapila
Date:
On Thu, Jan 7, 2021 at 3:32 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Jan 6, 2021 at 1:29 PM Michael Banck <michael.banck@credativ.de> wrote: > > That one seems to be 5min everywhere, and one can change it except on > > Azure. > > Okay, thanks for clearing that up. Looks like all of the big 3 cloud > providers use Postgres checksums in a straightforward way. > But they might have done something to reduce the impact of enabling checksums like by using a different checksum (for data blocks) and or compression (for WAL) technique. > I don't have much more to say on this thread. I am -1 on the current > proposal to enable page-level checksums by default. > -1 from me too with the current impact on performance and WAL it can have. I was looking at some old thread related to this topic and came across the benchmarking done by Tomas Vondra [1]. It clearly shows that enabling checksums can have a huge impact on init time, WAL, and TPS. Having said that, if we really want to enable checksums, can't we think of improving performance when it is enabled? I could think of two things to improve (a) a better algorithm for wal compression (currently we use pglz), this will allow us to enable wal_compression at least when data_checksums are enabled (b) a better technique for checksums to reduce the cost of PageSetChecksumCopy. I don't have good ideas to offer to improve things in these two areas but I think it is worth investigating if we want to enable checksums. [1] - https://www.postgresql.org/message-id/20190330192543.GH4719%40development -- With Regards, Amit Kapila.
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Peter Geoghegan (pg@bowt.ie) wrote: > On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost <sfrost@snowman.net> wrote: > > As already mentioned, it's also, at least today, far > > simpler to disable checksums than to enable them, which is something > > else to consider when thinking about what the default should be. > > That is a valid concern. I just don't think that it's good enough on > its own, given the overwhelming downside of enabling checksums given > the WAL architecture that we have today. I expected there'd be some disagreement on this, but I do continue to feel that it's sensible to enable checksums by default. I also don't think there's anything particularly wrong with such a difference of opinion, though it likely means that we're going to continue on with the status quo- where, certainly, very many deployments enable it even though the upstream default is to have it disabled. This certainly isn't the only place that's done, though we've been working to improve that situation with things like trying to get rid of 'trust' being used in our default pg_hba.conf. > > That the major cloud providers all have checksums enabled (at least by > > default, though I wonder if they would even let you turn them off..), > > even when we don't have them on by default, strikes me as pretty telling > > that this is something that we should have on by default. > > Please provide supporting evidence. I know that EBS itself uses > checksums at the block device level, so I'm sure that RDS "uses > checksums" in some sense. But does RDS use --data-checksums during > initdb? Short answer is 'yes', as mentioned down-thread and having checksums was a pre-requisite to deploying PG in RDS (or so folks very involved in RDS have told me previously- and I'll also note that it was 9.3 that was first deployed as part of RDS). I don't think there's any question that they're using --data-checksums and that it is, in fact, the actual original PG checksum code (or at least was at 9.3, though I've further heard comments that they actively try to minimize the delta between RDS and PG). > > Certainly there's a different risk profile between the two and there may > > be times when someone is fine with running without fsync, or fine > > running without checksums, but those are, in my view, exceptions made > > once you understand exactly what risk you're willing to accept, and not > > what the default or typical deployment should be. > > Okay, I'll bite. Here is the important difference: Enabling checksums > doesn't actually make data corruption less likely, it just makes it > easier to detect. Whereas disabling fsync will reliably produce > corruption before too long in almost any installation. It may > occasionally be appropriate to disable fsync in a very controlled > environment, but it's rare, and not much faster than disabling > synchronous commits in any case. It barely ever happens. I agree that it doesn't happen very often. I'd say that it's also very infrequent for users who are aware that data checksums are available, and not enabled by default, to deploy non-checksumed systems. > We added page-level checksums in 9.3. Can you imagine a counterfactual > history in which Postgres had page checksums since the 1990s, but only > added the fsync feature in 9.3? Please answer this non-rhetorical > question. Nope, the risk from not having fsync was clearly understood, and still is, to be a larger risk than not having checksums. That doesn't mean there's no risk to not having checksums or that we simply shouldn't consider checksums to be worthwhile or that we shouldn't have them on by default. I outlined them together in that they're both there to address the risk that "something doesn't go right", but, as I said previously and again above, the level of risk between the two isn't the same. That doesn't mean we shouldn't consider that checksums *do* address a risk and consider enabling them by default- even with the performance impact that they have today. Much of this line of discussion seems to be, incorrectly, focused on my mere mention of viewing the use of fsync and checksums as mechanism for addressing certain risks, but that doesn't seem to be a terribly fruitful direction to be going in. I'm not suggesting that we should go turn off fsync by default simply because we don't have checksums on by default, which seems to be the implication. I do think that fsync addresses a large amount of the risks we face (random system reboots, storage being disconnected from the server, etc), and I feel that checksums address certain risks (latent bit flips at various levels, from the physical medium through whatever path is taken to get from the physical medium to the kernel and then to PG, random blocks being swapped from other applications, people deciding to gzip their data directory which I saw last week, etc...) and that all of those risks amount to sufficient justification that they both be enabled by default, but allowed to be disabled in environments where the administrator has considered the risks from each and decided that they're willing to accept them for the benefit of performance. Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Michael Banck (michael.banck@credativ.de) wrote: > Am Mittwoch, den 06.01.2021, 13:08 -0800 schrieb Peter Geoghegan: > > On Wed, Jan 6, 2021 at 1:04 PM Michael Banck <michael.banck@credativ.de> wrote: > > > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and > > > Google Cloud SQL Postgres. It might not be on for all types (I just > > > checked the basic one each earlier today), but I guess it is. > > > > So you tested this using "show data_checksums;" in psql in each case? > > Yes. > > > What does "show full_page_writes;" show you? > > It's also 'on' for all three (wal_log_hints is 'off' everywhere). Well, given that data_checksums is 'on', then wal_log_hints is implicitly 'on', of course. Given the mention that maybe they've gone and modified PG underneath, it might be fun to run your same tests against those platforms to see if the amount of WAL generated is in line with what you're seeing for stock PG with the same settings, assuming you can get to that level of detailed information anyway (pretty sure some of them provide that, but not sure if all of them do). Thanks, Stephen
Attachment
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Peter Geoghegan
Date:
On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost <sfrost@snowman.net> wrote: > I expected there'd be some disagreement on this, but I do continue to > feel that it's sensible to enable checksums by default. I also don't > think there's anything particularly wrong with such a difference of > opinion, though it likely means that we're going to continue on with the > status quo- where, certainly, very many deployments enable it even > though the upstream default is to have it disabled. I agree with all that. > This certainly > isn't the only place that's done, though we've been working to improve > that situation with things like trying to get rid of 'trust' being used > in our default pg_hba.conf. That seems like an easier case to make to me. > Short answer is 'yes', as mentioned down-thread and having checksums was > a pre-requisite to deploying PG in RDS (or so folks very involved in RDS > have told me previously- and I'll also note that it was 9.3 that was > first deployed as part of RDS). I don't think there's any question that > they're using --data-checksums and that it is, in fact, the actual > original PG checksum code (or at least was at 9.3, though I've further > heard comments that they actively try to minimize the delta between RDS > and PG). I accept that. > Nope, the risk from not having fsync was clearly understood, and still > is, to be a larger risk than not having checksums. That doesn't mean > there's no risk to not having checksums or that we simply shouldn't > consider checksums to be worthwhile or that we shouldn't have them on by > default. I outlined them together in that they're both there to address > the risk that "something doesn't go right", but, as I said previously > and again above, the level of risk between the two isn't the same. That > doesn't mean we shouldn't consider that checksums *do* address a risk > and consider enabling them by default- even with the performance impact > that they have today. Fair. > Much of this line of discussion seems to be, incorrectly, focused on my > mere mention of viewing the use of fsync and checksums as mechanism for > addressing certain risks, but that doesn't seem to be a terribly > fruitful direction to be going in. I'm not suggesting that we should go > turn off fsync by default simply because we don't have checksums on by > default, which seems to be the implication. I admit that I saw red. This was a direct result of your bogus argument, which greatly overstated the case in favor of enabling checksums by default. I regret my role in that now, though. It would be good to debate the actual issue, but that isn't what I saw. Everyone knows the principles behind checksums and how they're useful -- it doesn't need to be a part of the discussion. I think that it should be possible to make a much better case in favor of enabling checksums by default. On further reflection I actually don't think that the real-world VACUUM overhead is anything like 15x, though the details are complex. I might be willing to help with this analysis, but since you only seem to want to discuss the question in a narrow way (e.g. "I agree that improving compression performance would be good but I don't see that as relevant to the question of what our defaults should be"), I have to wonder if it's worth the trouble. -- Peter Geoghegan
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Stephen Frost
Date:
Greetings, * Peter Geoghegan (pg@bowt.ie) wrote: > On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost <sfrost@snowman.net> wrote: > > Much of this line of discussion seems to be, incorrectly, focused on my > > mere mention of viewing the use of fsync and checksums as mechanism for > > addressing certain risks, but that doesn't seem to be a terribly > > fruitful direction to be going in. I'm not suggesting that we should go > > turn off fsync by default simply because we don't have checksums on by > > default, which seems to be the implication. > > I admit that I saw red. This was a direct result of your bogus > argument, which greatly overstated the case in favor of enabling > checksums by default. I regret my role in that now, though. It would > be good to debate the actual issue, but that isn't what I saw. > Everyone knows the principles behind checksums and how they're useful > -- it doesn't need to be a part of the discussion. I hadn't intended to make an argument that enabling checksums was equivilant to enabling or disabling fsync- I said it was 'akin', by which I meant it was similar in character, as in, as I said previously, a way for PG to hedge against certain external-to-PG risks (though, unfortunately, our checksums aren't able to actually mitigate any of the risks but merely to detect them, but there is certainly value in that too). I also now regret not being clearer as to what I meant with that comment. > I think that it should be possible to make a much better case in favor > of enabling checksums by default. On further reflection I actually > don't think that the real-world VACUUM overhead is anything like 15x, > though the details are complex. I might be willing to help with this > analysis, but since you only seem to want to discuss the question in a > narrow way (e.g. "I agree that improving compression performance would > be good but I don't see that as relevant to the question of what our > defaults should be"), I have to wonder if it's worth the trouble. What I was attempting to get at with that comment is that while I don't feel it's relevant, I wouldn't object to both being enabled by default and if those changes combined helps to get others on board with having checksums enabled by default then such an approach would also get my vote. I also doubt that VACUUM performance would be impacted as heavily in real-world workloads, but I again point out that VACUUMs, in our default configuration, is going to be run with the breaks on since it's run by autovacuum with a non-zero vacuum cost delay. While I've advocated for having that cost delay reduced (or the cost limit increased) in the past, I wouldn't support eliminating the delays entirely as that would then impact foreground activity, which is certainly where performance is more important. I appreciate that VACUUM run by an administrator directly doesn't have the breaks on, but that then is much more likely to impact foreground activity and is generally discouraged because of that- instead it's generally recommended to configure autovacuum to be more aggressive while still having a delay. Once you're past the point where you want delays to be introduced during VACUUM runs, I'd certainly think it's gone past the point where our standard defaults would be appropriate in a number of ways and a user could then consider if they want to disable checksums and accept the risk associated with doing so in favor of making VACUUM go faster, or not. Thanks, Stephen
Attachment
RE: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
"tsunakawa.takay@fujitsu.com"
Date:
From: Stephen Frost <sfrost@snowman.net> > I hadn't intended to make an argument that enabling checksums was > equivilant to enabling or disabling fsync- I said it was 'akin', by which I meant it > was similar in character, as in, as I said previously, a way for PG to hedge > against certain external-to-PG risks (though, unfortunately, our checksums > aren't able to actually mitigate any of the risks but merely to detect them, but > there is certainly value in that too). I only skimmed a few recent mails in this thread, but I've been hoping that the checksum gets on by default. So +1. Asthe storage layers get complex, including the container and virtual storage, we may increasingly be passed bogus data bysoftware bugs. I always tell my little kid to fasten a seatbelt, although she dislikes it as she can't move well. I've also felt it sad that checksum is not mentioned in the database cluster setup section of the manual [1]. If checksumis turned on by default, I think we can mention that checksum can be disabled for maximum speed in this page. It'd be desirable to lighten the overhead of checksumming, but I have no idea now. IIRC, Oracle and SQL Server also scanthe whole page to compute the checksum. Maybe the disadvantage of Postgres is that it has to copy the whole page forfear of concurrent hint bit updates? [1] 19.2. Creating a Database Cluster https://www.postgresql.org/docs/devel/creating-cluster.html Regards Takayuki Tsunakawa
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Laurenz Albe
Date:
On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > I expected there'd be some disagreement on this, but I do continue to > feel that it's sensible to enable checksums by default. +1 I think the problem here (apart from the original line of argumentation) is that there are two kinds of PostgreSQL installations: - installations done on dubious hardware with minimal tuning (the "cheap crowd") - installations done on good hardware, where people make an effort to properly configure the database (the "serious crowd") I am aware that this is an oversimplification for the sake of the argument. The voices against checksums on by default are probably thinking of the serious crowd. If checksums were enabled by default, the cheap crowd would benefit from the early warnings that something has gone wrong. The serious crowd are more likely to choose a non-default setting to avoid paying the price for a feature that they don't need. Yours, Laurenz Albe
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
"Andres Freund"
Date:
Hi, On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > > I expected there'd be some disagreement on this, but I do continue to > > feel that it's sensible to enable checksums by default. > > +1 I don't disagree with this in principle, but if you want that you need to work on making checksum overhead far smaller. That'sdoable. Afterwards it makes sense to have this discussion. > I think the problem here (apart from the original line of argumentation) > is that there are two kinds of PostgreSQL installations: > > - installations done on dubious hardware with minimal tuning > (the "cheap crowd") > > - installations done on good hardware, where people make an effort to > properly configure the database (the "serious crowd") > > I am aware that this is an oversimplification for the sake of the argument. > > The voices against checksums on by default are probably thinking of > the serious crowd. > > If checksums were enabled by default, the cheap crowd would benefit > from the early warnings that something has gone wrong. > > The serious crowd are more likely to choose a non-default setting > to avoid paying the price for a feature that they don't need. I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tuned to havea database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significant costacross use cases. Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitant toupgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. Andres
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
David Steele
Date:
On 1/8/21 5:03 AM, Andres Freund wrote: > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: >> >> The serious crowd are more likely to choose a non-default setting >> to avoid paying the price for a feature that they don't need. > > I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tuned to havea database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significant costacross use cases. > > Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitantto upgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. I'm +1 for enabling checksums by default, even with the performance penalties. As far as people upgrading, one advantage is existing pg_upgrade'd databases would not be affected. Only newly init'd clusters would get this setting. Regards, -- -David david@pgmasters.net
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Bruce Momjian
Date:
On Fri, Jan 8, 2021 at 10:53:35AM +0100, Laurenz Albe wrote: > On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > > I expected there'd be some disagreement on this, but I do continue to > > feel that it's sensible to enable checksums by default. > > +1 > > I think the problem here (apart from the original line of argumentation) > is that there are two kinds of PostgreSQL installations: > > - installations done on dubious hardware with minimal tuning > (the "cheap crowd") > > - installations done on good hardware, where people make an effort to > properly configure the database (the "serious crowd") > > I am aware that this is an oversimplification for the sake of the argument. > > The voices against checksums on by default are probably thinking of > the serious crowd. > > If checksums were enabled by default, the cheap crowd would benefit > from the early warnings that something has gone wrong. > > The serious crowd are more likely to choose a non-default setting > to avoid paying the price for a feature that they don't need. I think you have captured the major issue here --- it explains a lot. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Bruce Momjian
Date:
On Fri, Jan 8, 2021 at 09:46:58AM -0500, David Steele wrote: > On 1/8/21 5:03 AM, Andres Freund wrote: > > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > > > > > > The serious crowd are more likely to choose a non-default setting > > > to avoid paying the price for a feature that they don't need. > > > > I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tuned tohave a database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significant costacross use cases. > > > > Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitantto upgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. > > I'm +1 for enabling checksums by default, even with the performance > penalties. > > As far as people upgrading, one advantage is existing pg_upgrade'd databases > would not be affected. Only newly init'd clusters would get this setting. I think once we have better online enabling of checksums people can more easily test the overhead on their workloads. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Magnus Hagander
Date:
On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Fri, Jan 8, 2021 at 09:46:58AM -0500, David Steele wrote: > > On 1/8/21 5:03 AM, Andres Freund wrote: > > > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > > > > > > > > The serious crowd are more likely to choose a non-default setting > > > > to avoid paying the price for a feature that they don't need. > > > > > > I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tunedto have a database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significantcost across use cases. > > > > > > Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitantto upgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. > > > > I'm +1 for enabling checksums by default, even with the performance > > penalties. > > > > As far as people upgrading, one advantage is existing pg_upgrade'd databases > > would not be affected. Only newly init'd clusters would get this setting. > > I think once we have better online enabling of checksums people can more > easily test the overhead on their workloads. Yeah, definitely. If they have equivalent hardware they can easily do it now -- create a replica, turn off checksums on replica, compare. That is, assuming we turn them on by default :) But being able to turn them both on and off without a large downtime is obviously going to make experimentation a lot more reasonable. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
From
Bruce Momjian
Date:
On Fri, Jan 8, 2021 at 05:06:24PM +0100, Magnus Hagander wrote: > On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian <bruce@momjian.us> wrote: > > I think once we have better online enabling of checksums people can more > > easily test the overhead on their workloads. > > Yeah, definitely. > > If they have equivalent hardware they can easily do it now -- create a > replica, turn off checksums on replica, compare. That is, assuming we > turn them on by default :) But being able to turn them both on and off > without a large downtime is obviously going to make experimentation a > lot more reasonable. Can someone compute overhead on a real workload for us now? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee