On Wed, Jul 6, 2022 at 1:29 AM ROSESOFT S. Hoffmann <sh@rosesoft.de> wrote:
> This sounds like using default-collation "de" for initdb and createdb has the same effect as "de-x-icu" with
PostgreSQL10 up to 14?
> Should we then use default-collation "de" instead of "de-x-icu" or "German_Germany.1252" or could this cause problems
laterwhen upgrading to PostgreSQL 15 or 16 and using "de-x-icu" as default-collation?
Yes I think you should use "de" or "de-DE", if that is working for you
on Windows Server 2012 ("de" meaning German, no particular country,
"de-DE" meaning German as used in Germany; I think it's more common to
include the country). Then when you come to upgrade your OS, it'll
still be understood by modern Windows editions. However, whenever you
do an OS upgrade, especially one like 2012->2016+, which I think is
switching from Windows' old traditional collation algorithm to ICU,
note that any indexes based on the sort order of text will need to be
REINDEXed, or you might get some weird results.
To summarise:
On PostgreSQL 10-14, you can use initdb --locale=de (or de-DE, or
German_Germany.1252) and this will be passed to the OS collation
system. It sounds like on Windows <= 8, "de" and "de-DE" are accepted
by the traditional Windows collation algorithm? And on Windows 10+
that internally uses the ICU library.
On PostgreSQL 15 (which is still in beta), you can do that, or
alternatively you can use initdb --locale=de --locale-provider=icu.
That will use the ICU library directly, without going through Windows.
That option is only available if your copy of PostgreSQL was built
with ICU support. I think it's likely that in years to come this will
become the preferred way to do it, to get away from operating system
differences and upgrade problems. (Transition between Windows'
internal ICU and PostgreSQL's internal ICU may be an interesting
research project.)
The above concerns the "database default" locale, or more specifically
the template databases' default, which you can also override at CREATE
DATABASE time when you create more databases. It's only in 15 that
you could use ICU directly for the default locale In all versions of
PostgreSQL since 10, you can also use ICU locales in a different way,
by explicitly naming them with COLLATE "de-x-icu" in various contexts
in SQL (for example in a column definition, or in an ORDER BY clause).
That is where the "de-x-icu" style notation comes from. That's the
convention we use for identifiers for collations that are created in
the pg_collation table to make them available to SQL. That type of
name is *not* understood by initdb --locale, or if it is partially
understood by Windows 10+ that is only an accident, because its
internal ICU is trying to parse it, and understands the "de-" prefix
and ignore the rest. Some people prefer to use ICU directly in older
PostgreSQL releases by including COLLATE "de-x-icu" directly in all
their column definitions, since there wasn't a way to make (direct)
ICU the default.