Thread: Fix order of checking ICU options in initdb and create database
Hello! This is the last proposed patch on this subject [1] moved to a separate thread for Commitfest.. It looks like that the current order of checking ICU options in initdb and create database in PG 15+ is not user-friendly. Examples: 1. initdb reports a missing ICU locale although it may already report that the selected encoding cannot be used: $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: ICU locale must be specified $ initdb --encoding sql-ascii --locale-provider icu --icu-locale en-US hoge ... initdb: error: encoding mismatch initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. 2. initdb/create database report problems with the ICU locale/encoding although they may already report that ICU is not supported in this build: 2.1. $ initdb --locale-provider icu hoge ... initdb: error: ICU locale must be specified $ initdb --locale-provider icu --icu-locale en-US hoge ... initdb: error: ICU is not supported in this build $ createdb --locale-provider icu hoge createdb: error: database creation failed: ERROR: ICU locale must be specified $ createdb --locale-provider icu --icu-locale en-US hoge createdb: error: database creation failed: ERROR: ICU is not supported in this build 2.2. $ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge createdb: error: database creation failed: ERROR: encoding "SQL_ASCII" is not supported with ICU provider $ createdb --locale-provider icu --icu-locale en-US --encoding utf8 hoge createdb: error: database creation failed: ERROR: ICU is not supported in this build The patch v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patch fixes this: 1. $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: encoding mismatch initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. 2.1. $ initdb --locale-provider icu hoge ... initdb: error: ICU is not supported in this build $ createdb --locale-provider icu hoge createdb: error: database creation failed: ERROR: ICU is not supported in this build 2.2. $ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge createdb: error: database creation failed: ERROR: ICU is not supported in this build A side effect of the proposed patch in initdb is that if ICU locale is missed (or ICU is not supported in this build), the provider, locales and encoding are reported before the error message: $ initdb --locale-provider icu hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider: icu LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU locale must be specified $ initdb --locale-provider icu hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider: icu LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU is not supported in this build I was thinking about another master-only version of the patch that first checks everything for provider, locales and encoding but IMO it is worse [2].. [1] https://www.postgresql.org/message-id/e94aca035bf0b92fac42d204ad385552%40postgrespro.ru [2] https://www.postgresql.org/message-id/79f410460c4fc9534000785adb8bf39a%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 2022-10-29 14:33, Marina Polyakova wrote: > Hello! > > This is the last proposed patch on this subject [1] moved to a > separate thread for Commitfest.. Also added a patch to export with_icu when running src/bin/scripts tests [1]. The problem can be reproduced as $ meson setup <source dir> && ninja && meson test --print-errorlogs --suite setup --suite scripts [1] https://cirrus-ci.com/task/4825664661487616 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: Fix order of checking ICU options in initdb and create database
From
Jose Arthur Benetasso Villanova
Date:
Hello Marina. I just reviewed your patch. It applied cleanly at my current master (commit d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67). Also, it worked as described in email. Since it's a clarification in an error message, I think the documentation is fine. I played a bit with "make check", creating a database in my native language (pt_BR), testing with some data and everything worked as expected. -- Jose Arthur Benetasso Villanova
On 2022-11-12 22:43, Jose Arthur Benetasso Villanova wrote: > Hello Marina. > > I just reviewed your patch. > > It applied cleanly at my current master (commit > d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67). > > Also, it worked as described in email. Since it's a clarification in > an error message, I think the documentation is fine. > > I played a bit with "make check", creating a database in my native > language (pt_BR), testing with some data and everything worked as > expected. Hello! Thank you! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 29.10.22 15:09, Marina Polyakova wrote: > On 2022-10-29 14:33, Marina Polyakova wrote: >> Hello! >> >> This is the last proposed patch on this subject [1] moved to a >> separate thread for Commitfest.. > > Also added a patch to export with_icu when running src/bin/scripts tests > [1]. I have committed the meson change.
On 29.10.22 13:33, Marina Polyakova wrote: > 2. initdb/create database report problems with the ICU locale/encoding > although they may already report that ICU is not supported in this build: > > 2.1. > > $ initdb --locale-provider icu hoge > ... > initdb: error: ICU locale must be specified > > $ initdb --locale-provider icu --icu-locale en-US hoge > ... > initdb: error: ICU is not supported in this build > > $ createdb --locale-provider icu hoge > createdb: error: database creation failed: ERROR: ICU locale must be > specified > > $ createdb --locale-provider icu --icu-locale en-US hoge > createdb: error: database creation failed: ERROR: ICU is not supported > in this build I'm not in favor of changing this. The existing code intentionally tries to centralize the "ICU is not supported in this build" knowledge in few places. Your patch tries to make this check early, but in the process adds more places where ICU support needs to be checked explicitly. This increases the code size and also creates a future burden to maintain that level of checking. I think building without ICU should be considered a marginal configuration at this point, so we don't need to go out of our way to create a perfect user experience for this configuration, as long as we check somewhere in the end.
чт, 17 нояб. 2022 г. в 09:54, Peter Eisentraut <peter.eisentraut@enterprisedb.com>: > > On 29.10.22 15:09, Marina Polyakova wrote: > > On 2022-10-29 14:33, Marina Polyakova wrote: > >> Hello! > >> > >> This is the last proposed patch on this subject [1] moved to a > >> separate thread for Commitfest.. > > > > Also added a patch to export with_icu when running src/bin/scripts tests > > [1]. > > I have committed the meson change. Thank you! (Sorry, I'm having problems sending emails from corporate email :( ) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
чт, 17 нояб. 2022 г. в 09:58, Peter Eisentraut <peter.eisentraut@enterprisedb.com>: > > On 29.10.22 13:33, Marina Polyakova wrote: > > 2. initdb/create database report problems with the ICU locale/encoding > > although they may already report that ICU is not supported in this build: > > > > 2.1. > > > > $ initdb --locale-provider icu hoge > > ... > > initdb: error: ICU locale must be specified > > > > $ initdb --locale-provider icu --icu-locale en-US hoge > > ... > > initdb: error: ICU is not supported in this build > > > > $ createdb --locale-provider icu hoge > > createdb: error: database creation failed: ERROR: ICU locale must be > > specified > > > > $ createdb --locale-provider icu --icu-locale en-US hoge > > createdb: error: database creation failed: ERROR: ICU is not supported > > in this build > > I'm not in favor of changing this. The existing code intentionally > tries to centralize the "ICU is not supported in this build" knowledge > in few places. Your patch tries to make this check early, but in the > process adds more places where ICU support needs to be checked > explicitly. This increases the code size and also creates a future > burden to maintain that level of checking. I think building without ICU > should be considered a marginal configuration at this point, so we don't > need to go out of our way to create a perfect user experience for this > configuration, as long as we check somewhere in the end. Maybe this should be written in the documentation [1] or --with-icu should be used by default? As a developer I usually check something with the simplest configure run to make sure other options do not affect the checked behaviour. And some other developers in our company also use simple configure runs, without --with-icu etc. [1] https://www.postgresql.org/docs/15/install-procedure.html -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19.11.22 13:12, Марина Полякова wrote: >> I'm not in favor of changing this. The existing code intentionally >> tries to centralize the "ICU is not supported in this build" knowledge >> in few places. Your patch tries to make this check early, but in the >> process adds more places where ICU support needs to be checked >> explicitly. This increases the code size and also creates a future >> burden to maintain that level of checking. I think building without ICU >> should be considered a marginal configuration at this point, so we don't >> need to go out of our way to create a perfect user experience for this >> configuration, as long as we check somewhere in the end. > Maybe this should be written in the documentation [1] or --with-icu > should be used by default? As a developer I usually check something > with the simplest configure run to make sure other options do not > affect the checked behaviour. And some other developers in our company > also use simple configure runs, without --with-icu etc. Well, this isn't a hard rule, just my opinion and where I see the world moving. It's similar to --with-openssl and --with-lz4 etc.
сб, 19 нояб. 2022 г. в 15:51, Peter Eisentraut <peter.eisentraut@enterprisedb.com>: > > On 19.11.22 13:12, Марина Полякова wrote: > >> I'm not in favor of changing this. The existing code intentionally > >> tries to centralize the "ICU is not supported in this build" knowledge > >> in few places. Your patch tries to make this check early, but in the > >> process adds more places where ICU support needs to be checked > >> explicitly. This increases the code size and also creates a future > >> burden to maintain that level of checking. I think building without ICU > >> should be considered a marginal configuration at this point, so we don't > >> need to go out of our way to create a perfect user experience for this > >> configuration, as long as we check somewhere in the end. > > Maybe this should be written in the documentation [1] or --with-icu > > should be used by default? As a developer I usually check something > > with the simplest configure run to make sure other options do not > > affect the checked behaviour. And some other developers in our company > > also use simple configure runs, without --with-icu etc. > > Well, this isn't a hard rule, just my opinion and where I see the world > moving. It's similar to --with-openssl and --with-lz4 etc. Here is another set of proposed patches: v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch Target: PG 15+ Fix encoding check in initdb when the option --icu-locale is not used: $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: encoding mismatch initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. As with the previous version of this fix a side effect is that if ICU locale is missed (or ICU is not supported in this build), the provider, locales and encoding are reported before the error message: $ initdb --locale-provider icu hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider: icu LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU locale must be specified $ initdb --locale-provider icu --icu-locale en hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider: icu ICU locale: en LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU is not supported in this build v2-0002-doc-building-without-ICU-is-not-recommended.patch Target: PG 15+ Fix the documentation that --without-icu is a marginal configuration. v2-0003-Build-with-ICU-by-default.patch Target: PG 16+ Build with ICU by default as already done for readline and zlib libraries. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 19.11.22 20:36, Марина Полякова wrote: > Here is another set of proposed patches: > > v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch > Target: PG 15+ > Fix encoding check in initdb when the option --icu-locale is not used: I'm having a hard time figuring out from your examples what you are trying to change. Which one is the "before" example and which one is the "after", and which aspect specifically is the issue and what specifically is being addressed? I tried out the examples in the current code and didn't find anything obviously wrong in the behavior. I'm concerned that the initdb.c changes are a bit unprincipled. They just move code around to achieve some behavior without keeping the overall structure in mind. For example, check_icu_locale_encoding() intentionally had the same API as check_locale_encoding(), but now that's being changed. And setlocales() did all the locale parameter validity checking, but now part of that is being moved around. I'm afraid this makes initdb.c even more spaghetti code than it already is. What about those test changes? I can't tell if they are related. createdb isn't being changed; is that test change related or separate? > v2-0002-doc-building-without-ICU-is-not-recommended.patch > Target: PG 15+ > Fix the documentation that --without-icu is a marginal configuration. > > v2-0003-Build-with-ICU-by-default.patch > Target: PG 16+ > Build with ICU by default as already done for readline and zlib libraries. We are not going to make these kinds of changes specifically for ICU. I'd say, for example, the same applies to --with-openssl and --with-lz4, and probably others. If this is an issue, then we need a more general approach than just ICU. This should be a separate thread in any case.
Re: Fix order of checking ICU options in initdb and create database
From
"Gregory Stark (as CFM)"
Date:
Is this feedback enough to focus the work on the right things? I feel like Marina Polyakova pointed out some real confusing behaviour and perhaps there's a way to solve them by focusing on one at a time without making large changes in the code. Perhaps an idea would be to have each module provide two functions, one which is called early and signals an error if that module's parameters are provided when it's not compiled in, and a second which verifies that the parameters are consistent at the point in time where that's appropriate. (Not entirely unlike what we do for GUCs, though simpler) If every module did that consistently then it would avoid making the changes "unprincipled" or "spaghetti" though frankly I find words like that not very helpful to someone receiving that feedback. The patch is obviously not ready for commit now but it also seems like the feedback has not been really sufficient for Marina Polyakova to make progress either. -- Gregory Stark As Commitfest Manager
This patch has now been waiting for author since December, with the thread stalled. I am marking this returned with feedback for now, please feel free to re-submit the patch to a future CF when there is renewed interest in working on this. -- Daniel Gustafsson