Thread: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
"ROSESOFT S. Hoffmann"
Date:

Hello,

 

here my problem in details:

 

Software:

Freshly installed Windows Server 2012 R2 (german)

Microsoft Visual C++ 2013 Redistributable (x86) - 12.0.40664.0

Microsoft Visual C++ 2015-2022 Redistributable (x86) - 14.31.31103

PostgreSQL 10.20 32-bit binaries from https://www.enterprisedb.com/download-postgresql-binaries

 

Call (excerpt):

initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german ..

 

Error message:

initdb: ungültiger Locale-Name »de-x-icu«

 

According to  https://www.postgresql.org/download/windows/ PostgreSQL 10 is supported by Windows Server 2012.

The collation 'de-x-icu' is contained in table „pg_collation“.

With default-locale "German_Germany.1252" initdb succeeds, but that locale is not allowed in our environment!

On all newer Windows-OS (Windows Server 2016/2019/2022, Windows 10/11) there is no problem.

 

Is this a PostgreSQL Bug?

Do you know a work-around?

Is a Windows-Patch or a special software necessary to use ICU?

 

Best Regards,

Stefan Hoffmann.

 


ROSESOFT GmbH & Co. KG
Mundenheimer Straße 100
67061 Ludwigshafen am Rhein
Amtsgericht Ludwigshafen am Rhein / HRA 61632
Komplementärin / Sitz Ellerstadt
ROSESOFT Verwaltungs-GmbH
Amtsgericht Ludwigshafen am Rhein / HRB 65870
Geschäftsführer: Torsten Rosebrock
Tel. 0621 / 63 59 11 - 0
Fax 0621 / 63 59 11 - 99
www.rosesoft.de

 

Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
Kyotaro Horiguchi
Date:
At Fri, 1 Jul 2022 13:07:13 +0000, "ROSESOFT S. Hoffmann" <sh@rosesoft.de> wrote in
> initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german ..
>
> Error message:
> initdb: ungültiger Locale-Name »de-x-icu«

> According to https://www.postgresql.org/download/windows/ PostgreSQL
> 10 is supported by Windows Server 2012.

If you are unfortunately using PostgreSQL 10, it doesn't support ICU
collations as the default for databases.  It's a new feature of 15.

https://www.postgresql.org/docs/15/release-15.html
> - Allow ICU collations to be set as the default for clusters and
>   databases (Peter Eisentraut)
>   Previously, ICU collations could only be specified in CREATE
>   COLLATION and used with the COLLATE clause.

If you are using PG15, --icu-locale accompanied by
--locale-provider=icu instead of --locale will work for you.

https://www.postgresql.org/docs/devel/app-initdb.html

> Alternatively, the ICU library can be used to provide locale
> services. (Again, this only sets the default for subsequently created
> databases.) To select this option, specify --locale-provider=icu. To
> choose the specific ICU locale ID to apply, use the option
> --icu-locale. Note that for implementation reasons and to support
> legacy code, initdb will still select and initialize libc locale
> settings when the ICU locale provider is used.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



AW: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
"ROSESOFT S. Hoffmann"
Date:
Hello Kyotaro Horiguchi,

i still think this a bug, because "createdb" also can't use any ICU-collations (same error message).

On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected!
Why does the problem only occurs on Windows Server 2012?

An Upgrade to PostgreSQL 15 is not an option, now.

Best Regards,
Stefan.

-----Ursprüngliche Nachricht-----
Von: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Gesendet: Montag, 4. Juli 2022 02:34
An: ROSESOFT S. Hoffmann <sh@rosesoft.de>
Cc: pgsql-bugs@lists.postgresql.org
Betreff: Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

At Fri, 1 Jul 2022 13:07:13 +0000, "ROSESOFT S. Hoffmann" <sh@rosesoft.de> wrote in
> initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german ..
>
> Error message:
> initdb: ungültiger Locale-Name »de-x-icu«

> According to https://www.postgresql.org/download/windows/ PostgreSQL
> 10 is supported by Windows Server 2012.

If you are unfortunately using PostgreSQL 10, it doesn't support ICU collations as the default for databases.  It's a
newfeature of 15. 

https://www.postgresql.org/docs/15/release-15.html
> - Allow ICU collations to be set as the default for clusters and
>   databases (Peter Eisentraut)
>   Previously, ICU collations could only be specified in CREATE
>   COLLATION and used with the COLLATE clause.

If you are using PG15, --icu-locale accompanied by --locale-provider=icu instead of --locale will work for you.

https://www.postgresql.org/docs/devel/app-initdb.html

> Alternatively, the ICU library can be used to provide locale services.
> (Again, this only sets the default for subsequently created
> databases.) To select this option, specify --locale-provider=icu. To
> choose the specific ICU locale ID to apply, use the option
> --icu-locale. Note that for implementation reasons and to support
> legacy code, initdb will still select and initialize libc locale
> settings when the ICU locale provider is used.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
Thomas Munro
Date:
On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote:
> On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected!
> Why does the problem only occurs on Windows Server 2012?

Windows 10 and later contain a copy of ICU inside them.  That's why
strings like "de", "de-CH" etc work there: PostgreSQL is using the
normal Windows collation API (it's a bit weird that we call it "libc"
mode, when it's not exactly libc, but what we mean by that is the OS's
native collations...).  So I think it is probably Windows' internal
copy of ICU that receives that name "de-..." and ignores the trailing
part that it doesn't understand.

That has nothing to do with PostgreSQL's ICU support!  Somewhat confusingly.

Windows 10 and later still understand those names like
"German_Germany.1252" as well.  It seems to be a very bad idea to use
them in PostgreSQL, and I'm hoping that we'll soon stop choosing those
old style names by default at initdb time in PG16, which will probably
also be the first version that can't run on Windows Server 2012.



Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
Thomas Munro
Date:
On Tue, Jul 5, 2022 at 7:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote:
> > On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected!
> > Why does the problem only occurs on Windows Server 2012?
>
> Windows 10 and later contain a copy of ICU inside them.  That's why
> strings like "de", "de-CH" etc work there: PostgreSQL is using the
> normal Windows collation API (it's a bit weird that we call it "libc"
> mode, when it's not exactly libc, but what we mean by that is the OS's
> native collations...).  So I think it is probably Windows' internal
> copy of ICU that receives that name "de-..." and ignores the trailing
> part that it doesn't understand.

Oh, it might also be that ancient Windows can also understand "de-DE"
or "de" too, but not "de-x-icu"?  That "de-x-icu" name is an internal
PostgreSQL convention for collation identifiers and shouldn't be
passed to initdb.  If that's the case it might just be that the
ICU-based versions of Windows are more tolerant, and interpret
"de-x-icu" as "de", ignoring the rest, but the ancient versions reject
it?

(I have no access to old Windows, I just read that even Vista could
understand de-DE type names, formally known as BCP47, in some
contexts... it just wasn't using ICU yet).



AW: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
"ROSESOFT S. Hoffmann"
Date:
Hello Thomas,

> Windows 10 built in ICU support
I thought all ICU stuff was built in PostgreSQL binaries and DLLs, but your explanation seems to be the cause of this
problem.

> ancient Windows
This sounds like using default-collation "de" for initdb and createdb has the same effect as "de-x-icu" with PostgreSQL
10up to 14?
 
Should we then use default-collation "de" instead of "de-x-icu" or "German_Germany.1252" or could this cause problems
laterwhen upgrading to PostgreSQL 15 or 16 and using "de-x-icu" as default-collation?
 

Regards, Stefan.

-----Ursprüngliche Nachricht-----
Von: Thomas Munro <thomas.munro@gmail.com> 
Gesendet: Dienstag, 5. Juli 2022 10:19
An: ROSESOFT S. Hoffmann <sh@rosesoft.de>
Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>; pgsql-bugs@lists.postgresql.org
Betreff: Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

On Tue, Jul 5, 2022 at 7:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote:
> > On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected!
> > Why does the problem only occurs on Windows Server 2012?
>
> Windows 10 and later contain a copy of ICU inside them.  That's why 
> strings like "de", "de-CH" etc work there: PostgreSQL is using the 
> normal Windows collation API (it's a bit weird that we call it "libc"
> mode, when it's not exactly libc, but what we mean by that is the OS's 
> native collations...).  So I think it is probably Windows' internal 
> copy of ICU that receives that name "de-..." and ignores the trailing 
> part that it doesn't understand.

Oh, it might also be that ancient Windows can also understand "de-DE"
or "de" too, but not "de-x-icu"?  That "de-x-icu" name is an internal PostgreSQL convention for collation identifiers
andshouldn't be passed to initdb.  If that's the case it might just be that the ICU-based versions of Windows are more
tolerant,and interpret "de-x-icu" as "de", ignoring the rest, but the ancient versions reject it?
 

(I have no access to old Windows, I just read that even Vista could understand de-DE type names, formally known as
BCP47,in some contexts... it just wasn't using ICU yet).
 

Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012

From
Thomas Munro
Date:
On Wed, Jul 6, 2022 at 1:29 AM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote:
> This sounds like using default-collation "de" for initdb and createdb has the same effect as "de-x-icu" with
PostgreSQL10 up to 14?
 
> Should we then use default-collation "de" instead of "de-x-icu" or "German_Germany.1252" or could this cause problems
laterwhen upgrading to PostgreSQL 15 or 16 and using "de-x-icu" as default-collation?
 

Yes I think you should use "de" or "de-DE", if that is working for you
on Windows Server 2012 ("de" meaning German, no particular country,
"de-DE" meaning German as used in Germany; I think it's more common to
include the country).  Then when you come to upgrade your OS, it'll
still be understood by modern Windows editions.  However, whenever you
do an OS upgrade, especially one like 2012->2016+, which I think is
switching from Windows' old traditional collation algorithm to ICU,
note that any indexes based on the sort order of text will need to be
REINDEXed, or you might get some weird results.

To summarise:

On PostgreSQL 10-14, you can use initdb --locale=de (or de-DE, or
German_Germany.1252) and this will be passed to the OS collation
system.  It sounds like on Windows <= 8, "de" and "de-DE" are accepted
by the traditional Windows collation algorithm?  And on Windows 10+
that internally uses the ICU library.

On PostgreSQL 15 (which is still in beta), you can do that, or
alternatively you can use initdb --locale=de --locale-provider=icu.
That will use the ICU library directly, without going through Windows.
That option is only available if your copy of PostgreSQL was built
with ICU support.  I think it's likely that in years to come this will
become the preferred way to do it, to get away from operating system
differences and upgrade problems.  (Transition between Windows'
internal ICU and PostgreSQL's internal ICU may be an interesting
research project.)

The above concerns the "database default" locale, or more specifically
the template databases' default, which you can also override at CREATE
DATABASE time when you create more databases.  It's only in 15 that
you could use ICU directly for the default locale  In all versions of
PostgreSQL since 10, you can also use ICU locales in a different way,
by explicitly naming them with COLLATE "de-x-icu" in various contexts
in SQL (for example in a column definition, or in an ORDER BY clause).
That is where the "de-x-icu" style notation comes from.  That's the
convention we use for identifiers for collations that are created in
the pg_collation table to make them available to SQL.  That type of
name is *not* understood by initdb --locale, or if it is partially
understood by Windows 10+ that is only an accident, because its
internal ICU is trying to parse it, and understands the "de-" prefix
and ignore the rest.  Some people prefer to use ICU directly in older
PostgreSQL releases by including COLLATE "de-x-icu" directly in all
their column definitions, since there wasn't a way to make (direct)
ICU the default.