Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Date
Msg-id CA+hUKGKrzGnCW3MoVC7gayHCtJQPEJKgpfQWD6c7eUvFp+ac+A@mail.gmail.com
Whole thread Raw
In response to RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607  ("Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com>)
Responses RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
List pgsql-bugs
On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by
thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? 

First, let me restate the problem:

When you create a database cluster (= a PostgreSQL instance) with
"initdb", unless you request a default locale with --locale, initdb
uses setlocale("") to query the system/user default locale.  It then
records that string in postgresql.conf, and also in the pg_database
catalog.  On POSIX systems, that captures something like "tr-TR.UTF-8"
or similar.  On Windows, that captures something like
"Turkish_Turkey.1254".  Later, PostgreSQL uses newlocale() or
setlocale() functions to access that locale again.  In rare cases
where a country changes its name, a Windows update *renames* the
locale, and then those calls fail, because the old name is not
recognised anymore.

I proposed a partial solution that should help avoid the problem in future:

I think that initdb should instead call GetUserDefaultLocaleName() to
discover the user account's default locale, because it returns BCP47
names like "tr-TR".  It is probably less likely for a country or
language to change its ISO code (but not impossible[1]), than for the
English-language names of them to change.  Even if you reject this
idea because technically they can both change, there are other reasons
why we should not be storing this "display"-style names anywhere,
including that PostgreSQL needs to store them in a place where the
encoding must be ASCII, which Türkiye is not.  And finally, the
Windows manual explicitly warns us about this[2]: "We don't recommend
this form for locale strings embedded in code or serialized to
storage: These strings are more likely to be changed by an operating
system update than the locale name form."

(Note: the BCP47 support in Windows did not exist when that PostgreSQL
code was written, so it did what it had to at the time.)

There are further problems to resolve:

1.  We don't know if we should put encodings (AKA codepages?) on the
end of those strings or not.  There is some confusion about what
exactly it does, and how it interacts with the "ACP".  I'd be worried
that if you don't put the endings on, perhaps it can change under your
feet.  (I suspect that part of the discussion on that other thread
took some wrong turns, based on citext_utf8 results that were actually
probably misleading.)
2.  If we do decide to put the encoding suffixes on, which encoding
should we be suggesting?  It appears from anecdotal reports that most
PostgreSQL-on-Windows users are stuck in the past, using the old
pre-UTF-8 language-specific encodings.  Should we be encouraging UTF-8
use by default, if we can?  Maybe that is a separate question.

Then there is the practical question of what to do with an
already-broken system.  One idea would be to introduce a "locale
remapping" file, pgdata/pg_locale.map where you can write things
like "Turkish_Turkey.1254"="Turkish_Türkiye.1254".

I think there might be a GUI tool that allows you to duplicate,
rename, etc locales in Windows, so you can re-create the old name.  I
believe that is how some people have fixed their broken databases.  I
don't know if there is a good reference/blog/article on that, that we
should be pointing people towards if they show up with broken systems.

Patches, testing, research are welcome!  Even though I put forward that
BCP47 idea, it was based on reading the manual, so the "unresolved"
questions may in fact be very easy to resolve by people who actually
use/know Windows.  Even if I had been gung-ho about committing
that in 16 without feedback from Windows users, it would have been too
late to help Turkish users with existing databases.

[1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
[2]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160



pgsql-bugs by date:

Previous
From: "Haifang Wang (Centific Technologies Inc)"
Date:
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Next
From: "Haifang Wang (Centific Technologies Inc)"
Date:
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607