Thread: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012
PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012
Hello,
here my problem in details:
Software:
Freshly installed Windows Server 2012 R2 (german)
Microsoft Visual C++ 2013 Redistributable (x86) - 12.0.40664.0
Microsoft Visual C++ 2015-2022 Redistributable (x86) - 14.31.31103
PostgreSQL 10.20 32-bit binaries from https://www.enterprisedb.com/download-postgresql-binaries
Call (excerpt):
initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german ..
Error message:
initdb: ungültiger Locale-Name »de-x-icu«
According to https://www.postgresql.org/download/windows/ PostgreSQL 10 is supported by Windows Server 2012.
The collation 'de-x-icu' is contained in table „pg_collation“.
With default-locale "German_Germany.1252" initdb succeeds, but that locale is not allowed in our environment!
On all newer Windows-OS (Windows Server 2016/2019/2022, Windows 10/11) there is no problem.
Is this a PostgreSQL Bug?
Do you know a work-around?
Is a Windows-Patch or a special software necessary to use ICU?
Best Regards,
Stefan Hoffmann.
ROSESOFT GmbH & Co. KG
Mundenheimer Straße 100
67061 Ludwigshafen am Rhein
Amtsgericht Ludwigshafen am Rhein / HRA 61632
Komplementärin / Sitz Ellerstadt
ROSESOFT Verwaltungs-GmbH
Amtsgericht Ludwigshafen am Rhein / HRB 65870
Geschäftsführer: Torsten Rosebrock
Tel. 0621 / 63 59 11 - 0
Fax 0621 / 63 59 11 - 99
www.rosesoft.de
Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012
At Fri, 1 Jul 2022 13:07:13 +0000, "ROSESOFT S. Hoffmann" <sh@rosesoft.de> wrote in > initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german .. > > Error message: > initdb: ungültiger Locale-Name »de-x-icu« > According to https://www.postgresql.org/download/windows/ PostgreSQL > 10 is supported by Windows Server 2012. If you are unfortunately using PostgreSQL 10, it doesn't support ICU collations as the default for databases. It's a new feature of 15. https://www.postgresql.org/docs/15/release-15.html > - Allow ICU collations to be set as the default for clusters and > databases (Peter Eisentraut) > Previously, ICU collations could only be specified in CREATE > COLLATION and used with the COLLATE clause. If you are using PG15, --icu-locale accompanied by --locale-provider=icu instead of --locale will work for you. https://www.postgresql.org/docs/devel/app-initdb.html > Alternatively, the ICU library can be used to provide locale > services. (Again, this only sets the default for subsequently created > databases.) To select this option, specify --locale-provider=icu. To > choose the specific ICU locale ID to apply, use the option > --icu-locale. Note that for implementation reasons and to support > legacy code, initdb will still select and initialize libc locale > settings when the ICU locale provider is used. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
AW: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012
Hello Kyotaro Horiguchi, i still think this a bug, because "createdb" also can't use any ICU-collations (same error message). On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected! Why does the problem only occurs on Windows Server 2012? An Upgrade to PostgreSQL 15 is not an option, now. Best Regards, Stefan. -----Ursprüngliche Nachricht----- Von: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Gesendet: Montag, 4. Juli 2022 02:34 An: ROSESOFT S. Hoffmann <sh@rosesoft.de> Cc: pgsql-bugs@lists.postgresql.org Betreff: Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012 At Fri, 1 Jul 2022 13:07:13 +0000, "ROSESOFT S. Hoffmann" <sh@rosesoft.de> wrote in > initdb.exe .. -E UTF8 --locale=de-x-icu --text-search-config=german .. > > Error message: > initdb: ungültiger Locale-Name »de-x-icu« > According to https://www.postgresql.org/download/windows/ PostgreSQL > 10 is supported by Windows Server 2012. If you are unfortunately using PostgreSQL 10, it doesn't support ICU collations as the default for databases. It's a newfeature of 15. https://www.postgresql.org/docs/15/release-15.html > - Allow ICU collations to be set as the default for clusters and > databases (Peter Eisentraut) > Previously, ICU collations could only be specified in CREATE > COLLATION and used with the COLLATE clause. If you are using PG15, --icu-locale accompanied by --locale-provider=icu instead of --locale will work for you. https://www.postgresql.org/docs/devel/app-initdb.html > Alternatively, the ICU library can be used to provide locale services. > (Again, this only sets the default for subsequently created > databases.) To select this option, specify --locale-provider=icu. To > choose the specific ICU locale ID to apply, use the option > --icu-locale. Note that for implementation reasons and to support > legacy code, initdb will still select and initialize libc locale > settings when the ICU locale provider is used. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote: > On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected! > Why does the problem only occurs on Windows Server 2012? Windows 10 and later contain a copy of ICU inside them. That's why strings like "de", "de-CH" etc work there: PostgreSQL is using the normal Windows collation API (it's a bit weird that we call it "libc" mode, when it's not exactly libc, but what we mean by that is the OS's native collations...). So I think it is probably Windows' internal copy of ICU that receives that name "de-..." and ignores the trailing part that it doesn't understand. That has nothing to do with PostgreSQL's ICU support! Somewhat confusingly. Windows 10 and later still understand those names like "German_Germany.1252" as well. It seems to be a very bad idea to use them in PostgreSQL, and I'm hoping that we'll soon stop choosing those old style names by default at initdb time in PG16, which will probably also be the first version that can't run on Windows Server 2012.
On Tue, Jul 5, 2022 at 7:22 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote: > > On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected! > > Why does the problem only occurs on Windows Server 2012? > > Windows 10 and later contain a copy of ICU inside them. That's why > strings like "de", "de-CH" etc work there: PostgreSQL is using the > normal Windows collation API (it's a bit weird that we call it "libc" > mode, when it's not exactly libc, but what we mean by that is the OS's > native collations...). So I think it is probably Windows' internal > copy of ICU that receives that name "de-..." and ignores the trailing > part that it doesn't understand. Oh, it might also be that ancient Windows can also understand "de-DE" or "de" too, but not "de-x-icu"? That "de-x-icu" name is an internal PostgreSQL convention for collation identifiers and shouldn't be passed to initdb. If that's the case it might just be that the ICU-based versions of Windows are more tolerant, and interpret "de-x-icu" as "de", ignoring the rest, but the ancient versions reject it? (I have no access to old Windows, I just read that even Vista could understand de-DE type names, formally known as BCP47, in some contexts... it just wasn't using ICU yet).
AW: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012
Hello Thomas, > Windows 10 built in ICU support I thought all ICU stuff was built in PostgreSQL binaries and DLLs, but your explanation seems to be the cause of this problem. > ancient Windows This sounds like using default-collation "de" for initdb and createdb has the same effect as "de-x-icu" with PostgreSQL 10up to 14? Should we then use default-collation "de" instead of "de-x-icu" or "German_Germany.1252" or could this cause problems laterwhen upgrading to PostgreSQL 15 or 16 and using "de-x-icu" as default-collation? Regards, Stefan. -----Ursprüngliche Nachricht----- Von: Thomas Munro <thomas.munro@gmail.com> Gesendet: Dienstag, 5. Juli 2022 10:19 An: ROSESOFT S. Hoffmann <sh@rosesoft.de> Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>; pgsql-bugs@lists.postgresql.org Betreff: Re: PostgreSQL 10 initdb with locale de-x-icu fails on Windows Server 2012 On Tue, Jul 5, 2022 at 7:22 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Mon, Jul 4, 2022 at 11:53 PM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote: > > On Windows Server 2016/2019/2022 and Windows 10/11 the ICU-Options work as expected! > > Why does the problem only occurs on Windows Server 2012? > > Windows 10 and later contain a copy of ICU inside them. That's why > strings like "de", "de-CH" etc work there: PostgreSQL is using the > normal Windows collation API (it's a bit weird that we call it "libc" > mode, when it's not exactly libc, but what we mean by that is the OS's > native collations...). So I think it is probably Windows' internal > copy of ICU that receives that name "de-..." and ignores the trailing > part that it doesn't understand. Oh, it might also be that ancient Windows can also understand "de-DE" or "de" too, but not "de-x-icu"? That "de-x-icu" name is an internal PostgreSQL convention for collation identifiers andshouldn't be passed to initdb. If that's the case it might just be that the ICU-based versions of Windows are more tolerant,and interpret "de-x-icu" as "de", ignoring the rest, but the ancient versions reject it? (I have no access to old Windows, I just read that even Vista could understand de-DE type names, formally known as BCP47,in some contexts... it just wasn't using ICU yet).
On Wed, Jul 6, 2022 at 1:29 AM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote: > This sounds like using default-collation "de" for initdb and createdb has the same effect as "de-x-icu" with PostgreSQL10 up to 14? > Should we then use default-collation "de" instead of "de-x-icu" or "German_Germany.1252" or could this cause problems laterwhen upgrading to PostgreSQL 15 or 16 and using "de-x-icu" as default-collation? Yes I think you should use "de" or "de-DE", if that is working for you on Windows Server 2012 ("de" meaning German, no particular country, "de-DE" meaning German as used in Germany; I think it's more common to include the country). Then when you come to upgrade your OS, it'll still be understood by modern Windows editions. However, whenever you do an OS upgrade, especially one like 2012->2016+, which I think is switching from Windows' old traditional collation algorithm to ICU, note that any indexes based on the sort order of text will need to be REINDEXed, or you might get some weird results. To summarise: On PostgreSQL 10-14, you can use initdb --locale=de (or de-DE, or German_Germany.1252) and this will be passed to the OS collation system. It sounds like on Windows <= 8, "de" and "de-DE" are accepted by the traditional Windows collation algorithm? And on Windows 10+ that internally uses the ICU library. On PostgreSQL 15 (which is still in beta), you can do that, or alternatively you can use initdb --locale=de --locale-provider=icu. That will use the ICU library directly, without going through Windows. That option is only available if your copy of PostgreSQL was built with ICU support. I think it's likely that in years to come this will become the preferred way to do it, to get away from operating system differences and upgrade problems. (Transition between Windows' internal ICU and PostgreSQL's internal ICU may be an interesting research project.) The above concerns the "database default" locale, or more specifically the template databases' default, which you can also override at CREATE DATABASE time when you create more databases. It's only in 15 that you could use ICU directly for the default locale In all versions of PostgreSQL since 10, you can also use ICU locales in a different way, by explicitly naming them with COLLATE "de-x-icu" in various contexts in SQL (for example in a column definition, or in an ORDER BY clause). That is where the "de-x-icu" style notation comes from. That's the convention we use for identifiers for collations that are created in the pg_collation table to make them available to SQL. That type of name is *not* understood by initdb --locale, or if it is partially understood by Windows 10+ that is only an accident, because its internal ICU is trying to parse it, and understands the "de-" prefix and ignore the rest. Some people prefer to use ICU directly in older PostgreSQL releases by including COLLATE "de-x-icu" directly in all their column definitions, since there wasn't a way to make (direct) ICU the default.