Thread: pgsql: Support "postgres -C" with runtime-computed GUCs
Support "postgres -C" with runtime-computed GUCs Until now, the -C option of postgres was handled before a small subset of GUCs computed at runtime are initialized, leading to incorrect results as GUC machinery would fall back to default values for such parameters. For example, data_checksums could report "off" for a cluster as the control file is not loaded yet. Or wal_segment_size would show a segment size at 16MB even if initdb --wal-segsize used something else. Worse, the command would fail to properly report the recently-introduced shared_memory, that requires to load shared_preload_libraries as these could ask for a chunk of shared memory. Support for runtime GUCs comes with a limitation, as the operation is now allowed on a running server. One notable reason for this is that _PG_init() functions of loadable libraries are called before all runtime-computed GUCs are initialized, and this is not guaranteed to be safe to do on running servers. For the case of shared_memory_size, where we want to know how much memory would be used without allocating it, this limitation is fine. Another case where this will help is for huge pages, with the introduction of a different GUC to evaluate the amount of huge pages required for a server before starting it, without having to allocate large chunks of memory. This feature is controlled with a new GUC flag, and four parameters are classified as runtime-computed as of this change: - data_checksums - shared_memory_size - data_directory_mode - wal_segment_size Some TAP tests are added to provide some coverage here, using data_checksums in the tests of pg_checksums. Per discussion with Andres Freund, Justin Pryzby, Magnus Hagander and more. Author: Nathan Bossart Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/0c39c292077ef3ba987ced0dc6ea1c8f4f1e1f4b Modified Files -------------- doc/src/sgml/ref/postgres-ref.sgml | 11 ++++++-- src/backend/postmaster/postmaster.c | 50 ++++++++++++++++++++++++++++++----- src/backend/utils/misc/guc.c | 8 +++--- src/bin/pg_checksums/t/002_actions.pl | 22 ++++++++++++++- src/include/utils/guc.h | 6 +++++ 5 files changed, 83 insertions(+), 14 deletions(-)
Michael Paquier <michael@paquier.xyz> writes: > Support "postgres -C" with runtime-computed GUCs Test case for this seems to have newline-related issues on Windows. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-09-16%2018%3A36%3A14 # Failed test 'data_checksums=on is reported on an offline cluster stdout /(?^:^on$)/' # at t/002_actions.pl line 184. # 'on # ' # doesn't match '(?^:^on$)' regards, tom lane
On Thu, Sep 16, 2021 at 05:56:49PM -0400, Tom Lane wrote: > Test case for this seems to have newline-related issues on Windows. Thanks. I missed it. > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-09-16%2018%3A36%3A14 > > # Failed test 'data_checksums=on is reported on an offline cluster stdout /(?^:^on$)/' > # at t/002_actions.pl line 184. > # 'on > # ' > # doesn't match '(?^:^on$)' The regex should match AFAIK, and that worked in my own WIN32, non-Msys, environment. I see the business in PostgresNode::command_like_safe to work around such a case like that in pg_ctl (efd7f8e), very similar to what I am doing here so an extra chomp() should address this issue. Shouldn't we try in the long-term to make the command_checks_* routines use temporary files rather than the raw outputs if these are not reliable, by the way? Adding Andrew in CC about that. I am not completely sure what's going on here yet, so I'll just switch the test to be skipped when Msys is involved for now. That should be enough to bring back those machines to green. -- Michael
Attachment
On Fri, Sep 17, 2021 at 08:08:44AM +0900, Michael Paquier wrote: > I am not completely sure what's going on here yet, so I'll just switch > the test to be skipped when Msys is involved for now. That should be > enough to bring back those machines to green. So, after drinking down a coffee, I have remembered the difference between the Msys and native perls when it comes to IPC::Run: - Msys perl generates \r\n. - Native perl generates \n. We have already a couple of places where we filter that out, like PostgresNode::psql or slurp_file. But we are missing some spots before calling like() for outputs generated by IPC::Run. I have tracked all those places with the attached, and I think that this should take care of the failure seen here while preventing future problems. Any thoughts? -- Michael
Attachment
On 2021-Sep-17, Michael Paquier wrote: > On Fri, Sep 17, 2021 at 08:08:44AM +0900, Michael Paquier wrote: > > I am not completely sure what's going on here yet, so I'll just switch > > the test to be skipped when Msys is involved for now. That should be > > enough to bring back those machines to green. > > So, after drinking down a coffee, I have remembered the difference > between the Msys and native perls when it comes to IPC::Run: > - Msys perl generates \r\n. > - Native perl generates \n. Yeah, we've been here before -- see 015061690c65 and the commits referenced there. > We have already a couple of places where we filter that out, like > PostgresNode::psql or slurp_file. But we are missing some spots > before calling like() for outputs generated by IPC::Run. I have > tracked all those places with the attached, and I think that this > should take care of the failure seen here while preventing future > problems. Looks reasonable to me. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "¿Cómo puedes confiar en algo que pagas y que no ves, y no confiar en algo que te dan y te lo muestran?" (Germán Poo)
On 9/16/21 8:44 PM, Michael Paquier wrote: > On Fri, Sep 17, 2021 at 08:08:44AM +0900, Michael Paquier wrote: >> I am not completely sure what's going on here yet, so I'll just switch >> the test to be skipped when Msys is involved for now. That should be >> enough to bring back those machines to green. > So, after drinking down a coffee, I have remembered the difference > between the Msys and native perls when it comes to IPC::Run: > - Msys perl generates \r\n. > - Native perl generates \n. > > We have already a couple of places where we filter that out, like > PostgresNode::psql or slurp_file. But we are missing some spots > before calling like() for outputs generated by IPC::Run. I have > tracked all those places with the attached, and I think that this > should take care of the failure seen here while preventing future > problems. > > Any thoughts? The fixups for command_checks_all should go before the loops, not inside them. I would probably add a single line like this just after the call that checks the status: foreach ($stderr, $stdout) { s/\r\n/\n/g if $Config{osname} eq 'msys'; } although maybe perltidy would reformat that. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Sep 17, 2021 at 03:18:38PM -0400, Andrew Dunstan wrote: > The fixups for command_checks_all should go before the loops, not inside > them. Right. > I would probably add a single line like this just after the call that > checks the status: > > foreach ($stderr, $stdout) { s/\r\n/\n/g if $Config{osname} eq 'msys'; } Yes, that works. Fine by me to use this grammar. > although maybe perltidy would reformat that. And perltidy tells me nothing here. -- Michael
Attachment
On Sat, Sep 18, 2021 at 08:18:58AM +0900, Michael Paquier wrote: > And perltidy tells me nothing here. Please note that I am planning to get the fixes in TestLib.pm applied and back-patched to prevent surprises if there is any bug fix that introduces a test where this could be hit. As 14 RC1 has just been stamped, this will have to wait a bit more. -- Michael
Attachment
Hi, On 2021-09-16 02:05:02 +0000, Michael Paquier wrote: > Some TAP tests are added to provide some coverage here, using > data_checksums in the tests of pg_checksums. I just rebased my AIO branch ontop of this, and promptly got a CI failure. Which appears to be unrelated to AIO. The CI system that we use runs the scripts in a privileged account. Which normally is fine, because pg_ctl drops permissions. However, the new test here doesn't. See e.g. here https://cirrus-ci.com/task/6455395922214912?logs=bincheck#L27 # Running: postgres -D c:/cirrus/src/bin/pg_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata -C data_checksums not ok 22 - data_checksums=on is reported on an offline cluster status (got 1 vs expected 0) # Failed test 'data_checksums=on is reported on an offline cluster status (got 1 vs expected 0)' # at t/002_actions.pl line 189. not ok 23 - data_checksums=on is reported on an offline cluster stdout /(?^:^on$)/ # Failed test 'data_checksums=on is reported on an offline cluster stdout /(?^:^on$)/' # at t/002_actions.pl line 189. # '' # doesn't match '(?^:^on$)' not ok 24 - data_checksums=on is reported on an offline cluster stderr /(?^:database system is shut down)/ # Failed test 'data_checksums=on is reported on an offline cluster stderr /(?^:database system is shut down)/' # at t/002_actions.pl line 189. # 'Execution of PostgreSQL by a user with administrative permissions is not # permitted. # The server must be started under an unprivileged user ID to prevent # possible system security compromises. See the documentation for # more information on how to properly start the server. # ' # doesn't match '(?^:database system is shut down)' ### Starting node "node_checksum" So the problem is that -C data_checksums requires permissions to be dropped, but the way it's invoked here that's not guaranteed... And I don't think we should remove that requirement, given that we're actually doing stuff to the lock file... Greetings, Andres Freund
On Mon, Sep 20, 2021 at 08:20:40PM -0700, Andres Freund wrote: > So the problem is that -C data_checksums requires permissions to be dropped, > but the way it's invoked here that's not guaranteed... And I don't think we > should remove that requirement, given that we're actually doing stuff to the > lock file... Hmm. One way I can think of to keep the test would be something like that to avoid problems with pg_ctl redirecting the server's stderr to stdout (hence the tweak for log_min_messages): pg_ctl start -s -o '-C data_checksums -c log_min_messages=fatal' This would print the parameter, while getting to stderr mostly a "could not start server". Would that work for you? -- Michael
Attachment
Hi, On September 20, 2021 8:47:24 PM PDT, Michael Paquier <michael@paquier.xyz> wrote: >On Mon, Sep 20, 2021 at 08:20:40PM -0700, Andres Freund wrote: >> So the problem is that -C data_checksums requires permissions to be dropped, >> but the way it's invoked here that's not guaranteed... And I don't think we >> should remove that requirement, given that we're actually doing stuff to the >> lock file... > >Hmm. One way I can think of to keep the test would be something like >that to avoid problems with pg_ctl redirecting the server's stderr to >stdout (hence the tweak for log_min_messages): >pg_ctl start -s -o '-C data_checksums -c log_min_messages=fatal' Yes, that should work. Although error should suffice. FWIW, I think it'd be good to hide that message in the -C case. >This would print the parameter, while getting to stderr mostly a >"could not start server". Would that work for you? Yes, I think so. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Tue, Sep 21, 2021 at 12:47:24PM +0900, Michael Paquier wrote: > This would print the parameter, while getting to stderr mostly a > "could not start server". Would that work for you? And.. Here is an actual patch. The first test works here on Win32, but I don't have a CI environment with an unprivileged account. The second is funky, as pg_ctl would return 0 causing a failure with command_fails_like() because of the way start_postmaster() differs in pg_ctl. it may be possible to work around that by switching to command_checks_all() but that would make the whole uglier with different error handling, and I guess different reports generated, so I would just remove it. Could you check if the first case would work in your environment? If not, I would not mind just removing the test. I have discovered a separate bug thanks to it, but that's not worth breaking your case either. -- Michael
Attachment
Hi, On 2021-09-21 13:42:34 +0900, Michael Paquier wrote: > On Tue, Sep 21, 2021 at 12:47:24PM +0900, Michael Paquier wrote: > > This would print the parameter, while getting to stderr mostly a > > "could not start server". Would that work for you? > > And.. Here is an actual patch. Cool! > The first test works here on Win32, but I don't have a CI environment > with an unprivileged account. I assume it's just a typo, but the problem is the opposite, it's a *privileged* CI account causing trouble... > Could you check if the first case would work in your environment? If > not, I would not mind just removing the test. I have discovered a > separate bug thanks to it, but that's not worth breaking your case > either. It's running now: https://cirrus-ci.com/build/5927040688848896 If interesting I can also come up with the necessary steps to activate it in your repo... Greetings, Andres Freund
Hi, On 2021-09-20 22:11:53 -0700, Andres Freund wrote: > On 2021-09-21 13:42:34 +0900, Michael Paquier wrote: > > Could you check if the first case would work in your environment? If > > not, I would not mind just removing the test. I have discovered a > > separate bug thanks to it, but that's not worth breaking your case > > either. > > It's running now: https://cirrus-ci.com/build/5927040688848896 And while not finished entirely, the checksum tests passed: [05:28:13.788] t/001_basic.pl .... ok [05:28:35.762] t/002_actions.pl .. ok [05:28:35.767] All tests successful. [05:28:35.767] Files=2, Tests=74, 22 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU) Greetings, Andres Freund
On Mon, Sep 20, 2021 at 10:11:53PM -0700, Andres Freund wrote: > On 2021-09-21 13:42:34 +0900, Michael Paquier wrote: > > The first test works here on Win32, but I don't have a CI environment > > with an unprivileged account. > > I assume it's just a typo, but the problem is the opposite, it's a > *privileged* CI account causing trouble... Er, yes. That should have been "privileged". >> Could you check if the first case would work in your environment? If >> not, I would not mind just removing the test. I have discovered a >> separate bug thanks to it, but that's not worth breaking your case >> either. > > It's running now: https://cirrus-ci.com/build/5927040688848896 This one has passed. > If interesting I can also come up with the necessary steps to activate > it in your repo... That could be helpful. I have never played with those tools. -- Michael
Attachment
Hi, On 2021-09-21 14:37:05 +0900, Michael Paquier wrote: > > If interesting I can also come up with the necessary steps to activate > > it in your repo... > > That could be helpful. I have never played with those tools. It's really useful. I'm planning to propose a bit more minimal version to be included in the repo. Especially things like having automated backtraces on all platforms is just a huge win during development. You basically just need to configure the Cirrus extension in github, see https://cirrus-ci.org/guide/quick-start/ Then include the changes at the top of https://github.com/anarazel/postgres/tree/pg_checksum_ci_fail in your tree, and it'll be built and tested (check-world) on linux/windows/freebsd/osx. Greetings, Andres Freund
On Mon, Sep 20, 2021 at 10:33:05PM -0700, Andres Freund wrote: > And while not finished entirely, the checksum tests passed: > > [05:28:13.788] t/001_basic.pl .... ok > [05:28:35.762] t/002_actions.pl .. ok > [05:28:35.767] All tests successful. > [05:28:35.767] Files=2, Tests=74, 22 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU) Thanks. I'll group this stuff with what I still need to do for Msys. This just needs to wait until RC1 is tagged as I need to address the existing issues in TestLib.pm first. -- Michael
Attachment
On Tue, Sep 21, 2021 at 08:30:13AM +0900, Michael Paquier wrote: > Please note that I am planning to get the fixes in TestLib.pm applied > and back-patched to prevent surprises if there is any bug fix that > introduces a test where this could be hit. As 14 RC1 has just been > stamped, this will have to wait a bit more. Done as of 0d91c52 & co. -- Michael
Attachment
On Tue, Sep 21, 2021 at 03:12:02PM +0900, Michael Paquier wrote: > Thanks. I'll group this stuff with what I still need to do for Msys. > This just needs to wait until RC1 is tagged as I need to address the > existing issues in TestLib.pm first. 1a9d8028 should have taken care of the problem. Thanks for the report. -- Michael
Attachment
On Wed, Sep 22, 2021 at 11:01:05AM +0900, Michael Paquier wrote: > 1a9d8028 should have taken care of the problem. Thanks for the > report. jacana has reported green after this commit, and the test has passed with Msys. So we are done here. -- Michael
Attachment
On 2021-09-22 11:01:05 +0900, Michael Paquier wrote: > On Tue, Sep 21, 2021 at 03:12:02PM +0900, Michael Paquier wrote: > > Thanks. I'll group this stuff with what I still need to do for Msys. > > This just needs to wait until RC1 is tagged as I need to address the > > existing issues in TestLib.pm first. > > 1a9d8028 should have taken care of the problem. Thanks for the > report. Thanks.