Thread: Fix order of checking ICU options in initdb and create database

Fix order of checking ICU options in initdb and create database

From
Marina Polyakova
Date:
Hello!

This is the last proposed patch on this subject [1] moved to a separate 
thread for Commitfest..

It looks like that the current order of checking ICU options in initdb 
and create database in PG 15+ is not user-friendly. Examples:

1. initdb reports a missing ICU locale although it may already report 
that the selected encoding cannot be used:

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --encoding sql-ascii --locale-provider icu --icu-locale en-US 
hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported 
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding 
explicitly, or choose a matching combination.

2. initdb/create database report problems with the ICU locale/encoding 
although they may already report that ICU is not supported in this 
build:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR:  ICU locale must be 
specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR:  ICU is not supported 
in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii 
hoge
createdb: error: database creation failed: ERROR:  encoding "SQL_ASCII" 
is not supported with ICU provider

$ createdb --locale-provider icu --icu-locale en-US --encoding utf8 hoge
createdb: error: database creation failed: ERROR:  ICU is not supported 
in this build

The patch 
v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patch fixes 
this:

1.

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported 
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding 
explicitly, or choose a matching combination.

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR:  ICU is not supported 
in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii 
hoge
createdb: error: database creation failed: ERROR:  ICU is not supported 
in this build

A side effect of the proposed patch in initdb is that if ICU locale is 
missed (or ICU is not supported in this build), the provider, locales 
and encoding are reported before the error message:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user 
"marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
   provider:    icu
   LC_COLLATE:  en_US.UTF-8
   LC_CTYPE:    en_US.UTF-8
   LC_MESSAGES: en_US.UTF-8
   LC_MONETARY: ru_RU.UTF-8
   LC_NUMERIC:  ru_RU.UTF-8
   LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user 
"marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
   provider:    icu
   LC_COLLATE:  en_US.UTF-8
   LC_CTYPE:    en_US.UTF-8
   LC_MESSAGES: en_US.UTF-8
   LC_MONETARY: ru_RU.UTF-8
   LC_NUMERIC:  ru_RU.UTF-8
   LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

I was thinking about another master-only version of the patch that first 
checks everything for provider, locales and encoding but IMO it is worse 
[2]..

[1] 
https://www.postgresql.org/message-id/e94aca035bf0b92fac42d204ad385552%40postgrespro.ru
[2] 
https://www.postgresql.org/message-id/79f410460c4fc9534000785adb8bf39a%40postgrespro.ru

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: Fix order of checking ICU options in initdb and create database

From
Marina Polyakova
Date:
On 2022-10-29 14:33, Marina Polyakova wrote:
> Hello!
> 
> This is the last proposed patch on this subject [1] moved to a
> separate thread for Commitfest..

Also added a patch to export with_icu when running src/bin/scripts tests 
[1].
The problem can be reproduced as

$ meson setup <source dir> && ninja && meson test --print-errorlogs 
--suite setup --suite scripts

[1] https://cirrus-ci.com/task/4825664661487616

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: Fix order of checking ICU options in initdb and create database

From
Jose Arthur Benetasso Villanova
Date:
Hello Marina.

I just reviewed your patch.

It applied cleanly at my current master (commit d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67).

Also, it worked as described in email. Since it's a clarification in an 
error message, I think the documentation is fine.

I played a bit with "make check", creating a database in my native 
language (pt_BR), testing with some data and everything worked as 
expected.

--
Jose Arthur Benetasso Villanova



Re: Fix order of checking ICU options in initdb and create database

From
Marina Polyakova
Date:
On 2022-11-12 22:43, Jose Arthur Benetasso Villanova wrote:
> Hello Marina.
> 
> I just reviewed your patch.
> 
> It applied cleanly at my current master (commit
> d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67).
> 
> Also, it worked as described in email. Since it's a clarification in
> an error message, I think the documentation is fine.
> 
> I played a bit with "make check", creating a database in my native
> language (pt_BR), testing with some data and everything worked as
> expected.

Hello!

Thank you!

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Fix order of checking ICU options in initdb and create database

From
Peter Eisentraut
Date:
On 29.10.22 15:09, Marina Polyakova wrote:
> On 2022-10-29 14:33, Marina Polyakova wrote:
>> Hello!
>>
>> This is the last proposed patch on this subject [1] moved to a
>> separate thread for Commitfest..
> 
> Also added a patch to export with_icu when running src/bin/scripts tests 
> [1].

I have committed the meson change.




Re: Fix order of checking ICU options in initdb and create database

From
Peter Eisentraut
Date:
On 29.10.22 13:33, Marina Polyakova wrote:
> 2. initdb/create database report problems with the ICU locale/encoding 
> although they may already report that ICU is not supported in this build:
> 
> 2.1.
> 
> $ initdb --locale-provider icu hoge
> ...
> initdb: error: ICU locale must be specified
> 
> $ initdb --locale-provider icu --icu-locale en-US hoge
> ...
> initdb: error: ICU is not supported in this build
> 
> $ createdb --locale-provider icu hoge
> createdb: error: database creation failed: ERROR:  ICU locale must be 
> specified
> 
> $ createdb --locale-provider icu --icu-locale en-US hoge
> createdb: error: database creation failed: ERROR:  ICU is not supported 
> in this build

I'm not in favor of changing this.  The existing code intentionally 
tries to centralize the "ICU is not supported in this build" knowledge 
in few places.  Your patch tries to make this check early, but in the 
process adds more places where ICU support needs to be checked 
explicitly.  This increases the code size and also creates a future 
burden to maintain that level of checking.  I think building without ICU 
should be considered a marginal configuration at this point, so we don't 
need to go out of our way to create a perfect user experience for this 
configuration, as long as we check somewhere in the end.




Re: Fix order of checking ICU options in initdb and create database

From
Марина Полякова
Date:
чт, 17 нояб. 2022 г. в 09:54, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:
>
> On 29.10.22 15:09, Marina Polyakova wrote:
> > On 2022-10-29 14:33, Marina Polyakova wrote:
> >> Hello!
> >>
> >> This is the last proposed patch on this subject [1] moved to a
> >> separate thread for Commitfest..
> >
> > Also added a patch to export with_icu when running src/bin/scripts tests
> > [1].
>
> I have committed the meson change.

Thank you!

(Sorry, I'm having problems sending emails from corporate email :( )

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Fix order of checking ICU options in initdb and create database

From
Марина Полякова
Date:
чт, 17 нояб. 2022 г. в 09:58, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:
>
> On 29.10.22 13:33, Marina Polyakova wrote:
> > 2. initdb/create database report problems with the ICU locale/encoding
> > although they may already report that ICU is not supported in this build:
> >
> > 2.1.
> >
> > $ initdb --locale-provider icu hoge
> > ...
> > initdb: error: ICU locale must be specified
> >
> > $ initdb --locale-provider icu --icu-locale en-US hoge
> > ...
> > initdb: error: ICU is not supported in this build
> >
> > $ createdb --locale-provider icu hoge
> > createdb: error: database creation failed: ERROR:  ICU locale must be
> > specified
> >
> > $ createdb --locale-provider icu --icu-locale en-US hoge
> > createdb: error: database creation failed: ERROR:  ICU is not supported
> > in this build
>
> I'm not in favor of changing this.  The existing code intentionally
> tries to centralize the "ICU is not supported in this build" knowledge
> in few places.  Your patch tries to make this check early, but in the
> process adds more places where ICU support needs to be checked
> explicitly.  This increases the code size and also creates a future
> burden to maintain that level of checking.  I think building without ICU
> should be considered a marginal configuration at this point, so we don't
> need to go out of our way to create a perfect user experience for this
> configuration, as long as we check somewhere in the end.

Maybe this should be written in the documentation [1] or --with-icu
should be used by default? As a developer I usually check something
with the simplest configure run to make sure other options do not
affect the checked behaviour. And some other developers in our company
also use simple configure runs, without --with-icu etc.

[1] https://www.postgresql.org/docs/15/install-procedure.html

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Fix order of checking ICU options in initdb and create database

From
Peter Eisentraut
Date:
On 19.11.22 13:12, Марина Полякова wrote:
>> I'm not in favor of changing this.  The existing code intentionally
>> tries to centralize the "ICU is not supported in this build" knowledge
>> in few places.  Your patch tries to make this check early, but in the
>> process adds more places where ICU support needs to be checked
>> explicitly.  This increases the code size and also creates a future
>> burden to maintain that level of checking.  I think building without ICU
>> should be considered a marginal configuration at this point, so we don't
>> need to go out of our way to create a perfect user experience for this
>> configuration, as long as we check somewhere in the end.
> Maybe this should be written in the documentation [1] or --with-icu
> should be used by default? As a developer I usually check something
> with the simplest configure run to make sure other options do not
> affect the checked behaviour. And some other developers in our company
> also use simple configure runs, without --with-icu etc.

Well, this isn't a hard rule, just my opinion and where I see the world 
moving.  It's similar to --with-openssl and --with-lz4 etc.



Re: Fix order of checking ICU options in initdb and create database

From
Марина Полякова
Date:
сб, 19 нояб. 2022 г. в 15:51, Peter Eisentraut
<peter.eisentraut@enterprisedb.com>:
>
> On 19.11.22 13:12, Марина Полякова wrote:
> >> I'm not in favor of changing this.  The existing code intentionally
> >> tries to centralize the "ICU is not supported in this build" knowledge
> >> in few places.  Your patch tries to make this check early, but in the
> >> process adds more places where ICU support needs to be checked
> >> explicitly.  This increases the code size and also creates a future
> >> burden to maintain that level of checking.  I think building without ICU
> >> should be considered a marginal configuration at this point, so we don't
> >> need to go out of our way to create a perfect user experience for this
> >> configuration, as long as we check somewhere in the end.
> > Maybe this should be written in the documentation [1] or --with-icu
> > should be used by default? As a developer I usually check something
> > with the simplest configure run to make sure other options do not
> > affect the checked behaviour. And some other developers in our company
> > also use simple configure runs, without --with-icu etc.
>
> Well, this isn't a hard rule, just my opinion and where I see the world
> moving.  It's similar to --with-openssl and --with-lz4 etc.

Here is another set of proposed patches:

v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch
Target: PG 15+
Fix encoding check in initdb when the option --icu-locale is not used:

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported
with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.

As with the previous version of this fix a side effect is that if ICU
locale is missed (or ICU is not supported in this build), the
provider, locales and encoding are reported before the error message:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  en
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

v2-0002-doc-building-without-ICU-is-not-recommended.patch
Target: PG 15+
Fix the documentation that --without-icu is a marginal configuration.

v2-0003-Build-with-ICU-by-default.patch
Target: PG 16+
Build with ICU by default as already done for readline and zlib libraries.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Fix order of checking ICU options in initdb and create database

From
Peter Eisentraut
Date:
On 19.11.22 20:36, Марина Полякова wrote:
> Here is another set of proposed patches:
> 
> v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch
> Target: PG 15+
> Fix encoding check in initdb when the option --icu-locale is not used:

I'm having a hard time figuring out from your examples what you are 
trying to change.  Which one is the "before" example and which one is 
the "after", and which aspect specifically is the issue and what 
specifically is being addressed?  I tried out the examples in the 
current code and didn't find anything obviously wrong in the behavior.

I'm concerned that the initdb.c changes are a bit unprincipled.  They 
just move code around to achieve some behavior without keeping the 
overall structure in mind.  For example, check_icu_locale_encoding() 
intentionally had the same API as check_locale_encoding(), but now 
that's being changed.  And setlocales() did all the locale parameter 
validity checking, but now part of that is being moved around.  I'm 
afraid this makes initdb.c even more spaghetti code than it already is.

What about those test changes?  I can't tell if they are related. 
createdb isn't being changed; is that test change related or separate?

> v2-0002-doc-building-without-ICU-is-not-recommended.patch
> Target: PG 15+
> Fix the documentation that --without-icu is a marginal configuration.
> 
> v2-0003-Build-with-ICU-by-default.patch
> Target: PG 16+
> Build with ICU by default as already done for readline and zlib libraries.

We are not going to make these kinds of changes specifically for ICU. 
I'd say, for example, the same applies to --with-openssl and --with-lz4, 
and probably others.  If this is an issue, then we need a more general 
approach than just ICU.  This should be a separate thread in any case.



Re: Fix order of checking ICU options in initdb and create database

From
"Gregory Stark (as CFM)"
Date:
Is this feedback enough to focus the work on the right things?

I feel like Marina Polyakova pointed out some real confusing behaviour
and perhaps there's a way to solve them by focusing on one at a time
without making large changes in the code.

Perhaps an idea would be to have each module provide two functions, one
which is called early and signals an error if that module's parameters
are provided when it's not compiled in, and a second which verifies
that the parameters are consistent at the point in time where that's
appropriate. (Not entirely unlike what we do for GUCs, though simpler)

If every module did that consistently then it would avoid making the
changes "unprincipled" or "spaghetti" though frankly I find words like
that not very helpful to someone receiving that feedback.

The patch is obviously not ready for commit now but it also seems like
the feedback has not been really sufficient for Marina Polyakova to
make progress either.


--
Gregory Stark
As Commitfest Manager



Re: Fix order of checking ICU options in initdb and create database

From
Daniel Gustafsson
Date:
This patch has now been waiting for author since December, with the thread
stalled.  I am marking this returned with feedback for now, please feel free to
re-submit the patch to a future CF when there is renewed interest in working on
this.

--
Daniel Gustafsson