RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 - Mailing list pgsql-bugs
From | Haifang Wang (Centific Technologies Inc) |
---|---|
Subject | RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 |
Date | |
Msg-id | PH8PR21MB39027F1F4EAFEAE33EB4BA39E5E22@PH8PR21MB3902.namprd21.prod.outlook.com Whole thread Raw |
In response to | Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
|
List | pgsql-bugs |
Hi Thomas, Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again. I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organizationto notify developers when issues have been identified in their applications. The issue I reported in this mailis an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously, itis caused by recent operating system upgrade. The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale API. thisis what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170 If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you havedone for others locale with same issue as well in your code (reference PostgreSQL Source Code: src/port/win32setlocale.cSource File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67). Please let me know if there is any misunderstanding. Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, May 13, 2024 3:12 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? First, let me restate the problem: When you create a database cluster (= a PostgreSQL instance) with "initdb", unless you request a default locale with --locale,initdb uses setlocale("") to query the system/user default locale. It then records that string in postgresql.conf,and also in the pg_database catalog. On POSIX systems, that captures something like "tr-TR.UTF-8" or similar. On Windows, that captures something like "Turkish_Turkey.1254". Later, PostgreSQL uses newlocale() or setlocale() functions to access that locale again. In rare cases where a country changes its name, a Windows update *renames*the locale, and then those calls fail, because the old name is not recognised anymore. I proposed a partial solution that should help avoid the problem in future: I think that initdb should instead call GetUserDefaultLocaleName() to discover the user account's default locale, becauseit returns BCP47 names like "tr-TR". It is probably less likely for a country or language to change its ISO code(but not impossible[1]), than for the English-language names of them to change. Even if you reject this idea becausetechnically they can both change, there are other reasons why we should not be storing this "display"-style namesanywhere, including that PostgreSQL needs to store them in a place where the encoding must be ASCII, which Türkiye isnot. And finally, the Windows manual explicitly warns us about this[2]: "We don't recommend this form for locale stringsembedded in code or serialized to storage: These strings are more likely to be changed by an operating system update than the locale name form." (Note: the BCP47 support in Windows did not exist when that PostgreSQL code was written, so it did what it had to at thetime.) There are further problems to resolve: 1. We don't know if we should put encodings (AKA codepages?) on the end of those strings or not. There is some confusionabout what exactly it does, and how it interacts with the "ACP". I'd be worried that if you don't put the endingson, perhaps it can change under your feet. (I suspect that part of the discussion on that other thread took somewrong turns, based on citext_utf8 results that were actually probably misleading.) 2. If we do decide to put the encodingsuffixes on, which encoding should we be suggesting? It appears from anecdotal reports that most PostgreSQL-on-Windowsusers are stuck in the past, using the old pre-UTF-8 language-specific encodings. Should we be encouraging UTF-8 use by default, if we can? Maybe that is a separatequestion. Then there is the practical question of what to do with an already-broken system. One idea would be to introduce a "localeremapping" file, pgdata/pg_locale.map where you can write things like "Turkish_Turkey.1254"="Turkish_Türkiye.1254". I think there might be a GUI tool that allows you to duplicate, rename, etc locales in Windows, so you can re-create theold name. I believe that is how some people have fixed their broken databases. I don't know if there is a good reference/blog/articleon that, that we should be pointing people towards if they show up with broken systems. Patches, testing, research are welcome! Even though I put forward that BCP47 idea, it was based on reading the manual, so the "unresolved" questions may in fact be very easy to resolve by people who actually use/know Windows. Even if I had been gung-ho aboutcommitting that in 16 without feedback from Windows users, it would have been too late to help Turkish users with existingdatabases. [1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
pgsql-bugs by date: