Thread: Windows Application Issues | PostgreSQL | REF # 48475607
Windows Application Issues | PostgreSQL | REF # 48475607
I’m sorry to bother you if you are not the right person. In that case, could you please help to point me in the right direction for the right contact? Thanks a lot!
Hi Team,
I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. We’re reaching out to notify you of a potential issue in one of your applications.
The issue details are below, please review it when you have a moment. Our goal is to work with you to address this issue and to understand what your expected timeline to address this issue might be. If you have any questions about the details below or have already addressed this issue in a forthcoming update, please let me know.
Account | PostgreSQL Global Development Group |
Product | PostgreSQL |
Reference # | 48475607 |
Issue | Postgres.exe crash observed while installing the application.
Environment: Desktop
Observations: Postgres.exe crash is observed while installing the application. Should not observe any crash while installing the application |
Resource | For any questions on app development (or) submission on windows, contact Windows Dev Center. https://developer.microsoft.com/en-us/windows/support |
Thanks!
Haifang
On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Deploy windows server 2019 Turkey Build. ... > Postgres.exe crash is observed while installing the application. Hi, "Crash" with no details is not a very useful report, but I guess that this is the Turkey -> Türkiye issue caused by a recent operating system upgrade, which is apparently now hitting the server edition of Windows based on the new inflow of bug reports. I think we know approximately how to fix it, and there are several possible workarounds for a system that is already in this state, but interested parties who know and care about the relevant OS need to get involved to make progress. It's too late for Turkish but ideally we'll be able to stop this from happening the next time a country changes its name. I have written everything I know about the issue here: https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org Not being a Windows user, I have not been able to take that proposed fix all the way to the finish line, so all I can do is post what I know and hope that open source will happen.
On Fri, May 3, 2024 at 11:50 AM Thomas Munro <thomas.munro@gmail.com> wrote: > https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org And more here: https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas, Thanks a lot for your previous reply. Still need to double check with you: You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by the changesmade on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Thursday, May 2, 2024 4:51 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 [You don't often get email from thomas.munro@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification] On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Deploy windows server 2019 Turkey Build. ... > Postgres.exe crash is observed while installing the application. Hi, "Crash" with no details is not a very useful report, but I guess that this is the Turkey -> Türkiye issue caused by a recentoperating system upgrade, which is apparently now hitting the server edition of Windows based on the new inflow ofbug reports. I think we know approximately how to fix it, and there are several possible workarounds for a system thatis already in this state, but interested parties who know and care about the relevant OS need to get involved to makeprogress. It's too late for Turkish but ideally we'll be able to stop this from happening the next time a country changesits name. I have written everything I know about the issue here: https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org Not being a Windows user, I have not been able to take that proposed fix all the way to the finish line, so all I can dois post what I know and hope that open source will happen.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and team, Some suggestion from our engineers, please take a look and let me know if you have any question: 1. Could you please use _wsetlocale API in case of wide string locale parameter, this is what the recommended way as parMicrosoft documentation as well. (Reference: setlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170,wsetlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170) 2. Additionally you can work around in case you don’t want to change the API to update their mapping table with Turkish_Turin case of Turkish_Türkiye as you have done for others locale with same issue as well in your code (reference PostgreSQL Source Code: src/port/win32setlocale.c Source File: https://doxygen.postgresql.org/win32setlocale_8c_source.html,Line number 66-67) Please let me know if there is any other questions. Regards! Haifang -----Original Message----- From: Haifang Wang (Centific Technologies Inc) Sent: Monday, May 13, 2024 11:52 AM To: Thomas Munro <thomas.munro@gmail.com> Cc: pgsql-bugs@lists.postgresql.org Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 Hi Thomas, Thanks a lot for your previous reply. Still need to double check with you: You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by the changesmade on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? Thanks! Haifang -----Original Message----- From: Thomas Munro <mailto:thomas.munro@gmail.com> Sent: Thursday, May 2, 2024 4:51 PM To: Haifang Wang (Centific Technologies Inc) <mailto:v-haiwang@microsoft.com> Cc: mailto:pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 [You don't often get email from mailto:thomas.munro@gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification] On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies Inc) <mailto:v-haiwang@microsoft.com> wrote: > Deploy windows server 2019 Turkey Build. ... > Postgres.exe crash is observed while installing the application. Hi, "Crash" with no details is not a very useful report, but I guess that this is the Turkey -> Türkiye issue caused by a recentoperating system upgrade, which is apparently now hitting the server edition of Windows based on the new inflow ofbug reports. I think we know approximately how to fix it, and there are several possible workarounds for a system thatis already in this state, but interested parties who know and care about the relevant OS need to get involved to makeprogress. It's too late for Turkish but ideally we'll be able to stop this from happening the next time a country changesits name. I have written everything I know about the issue here: https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org Not being a Windows user, I have not been able to take that proposed fix all the way to the finish line, so all I can dois post what I know and hope that open source will happen.
On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? First, let me restate the problem: When you create a database cluster (= a PostgreSQL instance) with "initdb", unless you request a default locale with --locale, initdb uses setlocale("") to query the system/user default locale. It then records that string in postgresql.conf, and also in the pg_database catalog. On POSIX systems, that captures something like "tr-TR.UTF-8" or similar. On Windows, that captures something like "Turkish_Turkey.1254". Later, PostgreSQL uses newlocale() or setlocale() functions to access that locale again. In rare cases where a country changes its name, a Windows update *renames* the locale, and then those calls fail, because the old name is not recognised anymore. I proposed a partial solution that should help avoid the problem in future: I think that initdb should instead call GetUserDefaultLocaleName() to discover the user account's default locale, because it returns BCP47 names like "tr-TR". It is probably less likely for a country or language to change its ISO code (but not impossible[1]), than for the English-language names of them to change. Even if you reject this idea because technically they can both change, there are other reasons why we should not be storing this "display"-style names anywhere, including that PostgreSQL needs to store them in a place where the encoding must be ASCII, which Türkiye is not. And finally, the Windows manual explicitly warns us about this[2]: "We don't recommend this form for locale strings embedded in code or serialized to storage: These strings are more likely to be changed by an operating system update than the locale name form." (Note: the BCP47 support in Windows did not exist when that PostgreSQL code was written, so it did what it had to at the time.) There are further problems to resolve: 1. We don't know if we should put encodings (AKA codepages?) on the end of those strings or not. There is some confusion about what exactly it does, and how it interacts with the "ACP". I'd be worried that if you don't put the endings on, perhaps it can change under your feet. (I suspect that part of the discussion on that other thread took some wrong turns, based on citext_utf8 results that were actually probably misleading.) 2. If we do decide to put the encoding suffixes on, which encoding should we be suggesting? It appears from anecdotal reports that most PostgreSQL-on-Windows users are stuck in the past, using the old pre-UTF-8 language-specific encodings. Should we be encouraging UTF-8 use by default, if we can? Maybe that is a separate question. Then there is the practical question of what to do with an already-broken system. One idea would be to introduce a "locale remapping" file, pgdata/pg_locale.map where you can write things like "Turkish_Turkey.1254"="Turkish_Türkiye.1254". I think there might be a GUI tool that allows you to duplicate, rename, etc locales in Windows, so you can re-create the old name. I believe that is how some people have fixed their broken databases. I don't know if there is a good reference/blog/article on that, that we should be pointing people towards if they show up with broken systems. Patches, testing, research are welcome! Even though I put forward that BCP47 idea, it was based on reading the manual, so the "unresolved" questions may in fact be very easy to resolve by people who actually use/know Windows. Even if I had been gung-ho about committing that in 16 without feedback from Windows users, it would have been too late to help Turkish users with existing databases. [1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas, Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again. I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organizationto notify developers when issues have been identified in their applications. The issue I reported in this mailis an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously, itis caused by recent operating system upgrade. The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale API. thisis what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170 If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you havedone for others locale with same issue as well in your code (reference PostgreSQL Source Code: src/port/win32setlocale.cSource File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67). Please let me know if there is any misunderstanding. Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, May 13, 2024 3:12 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? First, let me restate the problem: When you create a database cluster (= a PostgreSQL instance) with "initdb", unless you request a default locale with --locale,initdb uses setlocale("") to query the system/user default locale. It then records that string in postgresql.conf,and also in the pg_database catalog. On POSIX systems, that captures something like "tr-TR.UTF-8" or similar. On Windows, that captures something like "Turkish_Turkey.1254". Later, PostgreSQL uses newlocale() or setlocale() functions to access that locale again. In rare cases where a country changes its name, a Windows update *renames*the locale, and then those calls fail, because the old name is not recognised anymore. I proposed a partial solution that should help avoid the problem in future: I think that initdb should instead call GetUserDefaultLocaleName() to discover the user account's default locale, becauseit returns BCP47 names like "tr-TR". It is probably less likely for a country or language to change its ISO code(but not impossible[1]), than for the English-language names of them to change. Even if you reject this idea becausetechnically they can both change, there are other reasons why we should not be storing this "display"-style namesanywhere, including that PostgreSQL needs to store them in a place where the encoding must be ASCII, which Türkiye isnot. And finally, the Windows manual explicitly warns us about this[2]: "We don't recommend this form for locale stringsembedded in code or serialized to storage: These strings are more likely to be changed by an operating system update than the locale name form." (Note: the BCP47 support in Windows did not exist when that PostgreSQL code was written, so it did what it had to at thetime.) There are further problems to resolve: 1. We don't know if we should put encodings (AKA codepages?) on the end of those strings or not. There is some confusionabout what exactly it does, and how it interacts with the "ACP". I'd be worried that if you don't put the endingson, perhaps it can change under your feet. (I suspect that part of the discussion on that other thread took somewrong turns, based on citext_utf8 results that were actually probably misleading.) 2. If we do decide to put the encodingsuffixes on, which encoding should we be suggesting? It appears from anecdotal reports that most PostgreSQL-on-Windowsusers are stuck in the past, using the old pre-UTF-8 language-specific encodings. Should we be encouraging UTF-8 use by default, if we can? Maybe that is a separatequestion. Then there is the practical question of what to do with an already-broken system. One idea would be to introduce a "localeremapping" file, pgdata/pg_locale.map where you can write things like "Turkish_Turkey.1254"="Turkish_Türkiye.1254". I think there might be a GUI tool that allows you to duplicate, rename, etc locales in Windows, so you can re-create theold name. I believe that is how some people have fixed their broken databases. I don't know if there is a good reference/blog/articleon that, that we should be pointing people towards if they show up with broken systems. Patches, testing, research are welcome! Even though I put forward that BCP47 idea, it was based on reading the manual, so the "unresolved" questions may in fact be very easy to resolve by people who actually use/know Windows. Even if I had been gung-ho aboutcommitting that in 16 without feedback from Windows users, it would have been too late to help Turkish users with existingdatabases. [1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
On Tue, May 14, 2024 at 10:27 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again. (Sorry, it seems our emails crossed.) > I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organizationto notify developers when issues have been identified in their applications. The issue I reported in this mailis an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously, itis caused by recent operating system upgrade. Thanks for doing that, and yes, it affects a lot of users, and this is not the first time. It is still possible for it to be the last... > The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale API.this is what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170 I don't understand. Can you explain why _wsetlocale() is better than setlocale()? They behave identically except one takes wide characters, which doesn't seem to solve any problem we have. > If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you havedone for others locale with same issue as well in your code (reference PostgreSQL Source Code: src/port/win32setlocale.cSource File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67). Yeah, OK we could put more kludges into win32setlocale.c. I don't mind committing a patch like that if it addresses the issue. I am not in a position to confirm that myself... what we need is someone who works with Windows to write the patch and test it across that upgrade. Longer term I'm looking for something better than that though, because it doesn't address the root cause (need for stable identifiers), and will only ever allow us to fix problems with the old unstable names *after* users complain that their database is dead, 3-6 months after in fact due to release cycles. I think a dynamic mapping file might be better? (Maybe win32locale.c should be able to read that kludge table from a file that you can give it with an environment variable, or something like that?)
Thomas Munro <thomas.munro@gmail.com> writes: > Longer term I'm looking for something better than that though, because > it doesn't address the root cause (need for stable identifiers), and > will only ever allow us to fix problems with the old unstable names > *after* users complain that their database is dead, 3-6 months after > in fact due to release cycles. I think a dynamic mapping file might > be better? (Maybe win32locale.c should be able to read that kludge > table from a file that you can give it with an environment variable, > or something like that?) +1 for the long-term solution being more-stable locale identifiers. However, we should try to build something that will let users get out of these situations with the existing identifiers, so I like your idea of a plain-text mapping file for Windows locale names. I don't think an environment variable is necessary; just define a fixed name "$PGDATA/locale_map.txt" or such. If that file exists, just read it and map the pg_database field values with it. Maybe this shouldn't even be Windows-specific? Are there any cases where it'd save people's bacon on other platforms? regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> Longer term I'm looking for something better than that though, because
> it doesn't address the root cause (need for stable identifiers), and
> will only ever allow us to fix problems with the old unstable names
> *after* users complain that their database is dead, 3-6 months after
> in fact due to release cycles. I think a dynamic mapping file might
> be better? (Maybe win32locale.c should be able to read that kludge
> table from a file that you can give it with an environment variable,
> or something like that?)
+1 for the long-term solution being more-stable locale identifiers.
However, we should try to build something that will let users get
out of these situations with the existing identifiers, so I like
your idea of a plain-text mapping file for Windows locale names.
I don't think an environment variable is necessary; just define
a fixed name "$PGDATA/locale_map.txt" or such. If that file
exists, just read it and map the pg_database field values with it.
Maybe this shouldn't even be Windows-specific? Are there any
cases where it'd save people's bacon on other platforms?
regards, tom lane
John McKown <john.archie.mckown@gmail.com> writes: > Just curious. Why a plain text file rather than a system table? Because you'd have no way to update such a table, if you can't start the database or connect to it. So that approach isn't suitable for people whose database has been broken by one of these system updates. (This is largely the same reason why, eg, postgresql.conf isn't a table.) regards, tom lane
On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for the long-term solution being more-stable locale identifiers. > However, we should try to build something that will let users get > out of these situations with the existing identifiers, so I like > your idea of a plain-text mapping file for Windows locale names. > I don't think an environment variable is necessary; just define > a fixed name "$PGDATA/locale_map.txt" or such. If that file > exists, just read it and map the pg_database field values with it. OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes CI and that the code works the way I intended on my local Unix system if extracted into a little harness). With this, you could in theory create a file PGDATA/win32setlocale.map containing: c Turkish_Turkey.1254=Turkish_Türkiye.1254 ... or perhaps more likely: c Turkish_Turkey.1254=tr-TR.1254 I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I invented some wildcards). Some problems came up while wondering how to fit Türkiye into the defaults, and how to back-patch: 1. In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which obviously aren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before Windows 10. And even if you exclude the extremities of our support window somehow (how?), modern systems might not have applied the update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of version skew these days). 2. It's generally a terrible idea to be using "ü" in a locale name. FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide changeable setting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in default table) would need to match that to work, as coded. Perhaps it would be possible to make the mapping file UTF-8 and transform that to ACP! But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal whatever you choose in PostgreSQL due to being accessed from different databases which are using potentially different encodings that are only required to be a superset of ASCII. Avoid. 3. Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string. But what is the oldest Windows release that can understand a BCP47 code like that? 4. Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254" and that it's OK to put ".1254" on the end like that. Is it, and is it? I don't mean just "does it mean Turkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with strcoll_l(), and likewise for the ctype-based functions like towlower() et al". If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a text file as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping hypothetical users of museum-grade systems by leaving real users' systems broken). If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default in PG18 (or even 17). > Maybe this shouldn't even be Windows-specific? Are there any > cases where it'd save people's bacon on other platforms? Good question. Sometimes ISO code go away or countries split etc, so it's no like POSIX locale names are set in stone under all circumstances. But on Unixen it's all just files in practice, you can always just symlink them, move them around, compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in win32*.c.
Attachment
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions. -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, May 13, 2024 11:38 PM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for the long-term solution being more-stable locale identifiers. > However, we should try to build something that will let users get out > of these situations with the existing identifiers, so I like your idea > of a plain-text mapping file for Windows locale names. > I don't think an environment variable is necessary; just define a > fixed name "$PGDATA/locale_map.txt" or such. If that file exists, > just read it and map the pg_database field values with it. OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes CIand that the code works the way I intended on my local Unix system if extracted into a little harness). With this, youcould in theory create a file PGDATA/win32setlocale.map containing: c Turkish_Turkey.1254=Turkish_Türkiye.1254 ... or perhaps more likely: c Turkish_Turkey.1254=tr-TR.1254 I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I inventedsome wildcards). Some problems came up while wondering how to fit Türkiye into the defaults, and how to back-patch: 1. In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which obviouslyaren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before Windows10. And even if you exclude the extremities of our support window somehow (how?), modern systems might not have appliedthe update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of versionskew these days). 2. It's generally a terrible idea to be using "ü" in a locale name. FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide changeablesetting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in defaulttable) would need to match that to work, as coded. Perhaps it would be possible to make the mapping file UTF-8 andtransform that to ACP! But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal whateveryou choose in PostgreSQL due to being accessed from different databases which are using potentially different encodingsthat are only required to be a superset of ASCII. Avoid. 3. Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string. But what is the oldest Windows releasethat can understand a BCP47 code like that? 4. Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254"and that it's OK to put ".1254" on the end like that. Is it, and is it? I don't mean just "does it meanTurkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with strcoll_l(),and likewise for the ctype-based functions like towlower() et al". If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a textfile as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping hypotheticalusers of museum-grade systems by leaving real users' systems broken). If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default inPG18 (or even 17). > Maybe this shouldn't even be Windows-specific? Are there any cases > where it'd save people's bacon on other platforms? Good question. Sometimes ISO code go away or countries split etc, so it's no like POSIX locale names are set in stone underall circumstances. But on Unixen it's all just files in practice, you can always just symlink them, move them around,compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in win32*.c.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Tom, Please let us know your question or concern about the bug we discuss in this mail. Thanks! Haifang -----Original Message----- From: Haifang Wang (Centific Technologies Inc) Sent: Wednesday, May 15, 2024 4:47 PM To: Thomas Munro <thomas.munro@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com> Cc: pgsql-bugs@lists.postgresql.org Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions. -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, May 13, 2024 11:38 PM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for the long-term solution being more-stable locale identifiers. > However, we should try to build something that will let users get out > of these situations with the existing identifiers, so I like your idea > of a plain-text mapping file for Windows locale names. > I don't think an environment variable is necessary; just define a > fixed name "$PGDATA/locale_map.txt" or such. If that file exists, > just read it and map the pg_database field values with it. OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes CIand that the code works the way I intended on my local Unix system if extracted into a little harness). With this, youcould in theory create a file PGDATA/win32setlocale.map containing: c Turkish_Turkey.1254=Turkish_Türkiye.1254 ... or perhaps more likely: c Turkish_Turkey.1254=tr-TR.1254 I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I inventedsome wildcards). Some problems came up while wondering how to fit Türkiye into the defaults, and how to back-patch: 1. In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which obviouslyaren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before Windows10. And even if you exclude the extremities of our support window somehow (how?), modern systems might not have appliedthe update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of versionskew these days). 2. It's generally a terrible idea to be using "ü" in a locale name. FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide changeablesetting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in defaulttable) would need to match that to work, as coded. Perhaps it would be possible to make the mapping file UTF-8 andtransform that to ACP! But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal whateveryou choose in PostgreSQL due to being accessed from different databases which are using potentially different encodingsthat are only required to be a superset of ASCII. Avoid. 3. Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string. But what is the oldest Windows releasethat can understand a BCP47 code like that? 4. Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254"and that it's OK to put ".1254" on the end like that. Is it, and is it? I don't mean just "does it meanTurkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with strcoll_l(),and likewise for the ctype-based functions like towlower() et al". If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a textfile as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping hypotheticalusers of museum-grade systems by leaving real users' systems broken). If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default inPG18 (or even 17). > Maybe this shouldn't even be Windows-specific? Are there any cases > where it'd save people's bacon on other platforms? Good question. Sometimes ISO code go away or countries split etc, so it's no like POSIX locale names are set in stone underall circumstances. But on Unixen it's all just files in practice, you can always just symlink them, move them around,compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in win32*.c.
On Sat, May 18, 2024 at 10:25 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Please let us know your question or concern about the bug we discuss in this mail. Thanks for your help with this! Did I miss an email in between? Here are some unanswered questions that seem to be stopping progress: 1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? 2. If we translate to BCP47 automatically, should we put the ".1452" on the end? What does it mean exactly? What does it mean if you don't put it there? 3. Do the new BCP47 locale names give *exactly* the same results for strcoll() and tolower() etc, as the old "Turkish*" style names? With answers to those questions we might be able to ship some nice built-in translations to get users out of this jam. If there are issues on those points, we might have to face some questions about what the encoding is of the "Turkish_Türkiye.1254" string itself, which is tricky for us for technical reasons, among other problems... not sure.
Thomas Munro <thomas.munro@gmail.com> writes: > With answers to those questions we might be able to ship some nice > built-in translations to get users out of this jam. If there are > issues on those points, we might have to face some questions about > what the encoding is of the "Turkish_Türkiye.1254" string itself, > which is tricky for us for technical reasons, among other problems... > not sure. TBH, my idea of how this should go was to *not* ship any built-in translations, or indeed any translation file at all (so I didn't like your moving some existing hacks into that file). If we approach it like that, then individual users who hit this problem are responsible for creating their own translation file, which will automatically use whatever is the locally-appropriate encoding. Sure, it's less transparent for affected users, but it will work for them which other approaches might not; and evidence so far is that there's not a huge number of affected users. regards, tom lane
On Sat, May 18, 2024 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > With answers to those questions we might be able to ship some nice > > built-in translations to get users out of this jam. If there are > > issues on those points, we might have to face some questions about > > what the encoding is of the "Turkish_Türkiye.1254" string itself, > > which is tricky for us for technical reasons, among other problems... > > not sure. > > TBH, my idea of how this should go was to *not* ship any built-in > translations, or indeed any translation file at all (so I didn't > like your moving some existing hacks into that file). If we > approach it like that, then individual users who hit this problem > are responsible for creating their own translation file, which > will automatically use whatever is the locally-appropriate > encoding. Sure, it's less transparent for affected users, but > it will work for them which other approaches might not; and > evidence so far is that there's not a huge number of affected > users. Hmm, yeah I guess we could just ship a patch like what I posted already, and let users figure out what works. I would still like to know the answer to those questions so we can offer good advice. If the answers are all yes then I think we can say "the file is in encoded in ASCII; use wildcards to deal with any legacy non-ASCII locale names on the left, and use BCP47 names on the right" and begin expunging the bad old names from our universe.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas, What questions do you have? Could you please list them clearly so that Vishwa could help to answer? NOTE: this email is plain text format, not easy to read. And also the previous email content usually is not included in thelatest email, we only could see the most latest one. Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Friday, May 17, 2024 10:57 PM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Sat, May 18, 2024 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > With answers to those questions we might be able to ship some nice > > built-in translations to get users out of this jam. If there are > > issues on those points, we might have to face some questions about > > what the encoding is of the "Turkish_Türkiye.1254" string itself, > > which is tricky for us for technical reasons, among other problems... > > not sure. > > TBH, my idea of how this should go was to *not* ship any built-in > translations, or indeed any translation file at all (so I didn't like > your moving some existing hacks into that file). If we approach it > like that, then individual users who hit this problem are responsible > for creating their own translation file, which will automatically use > whatever is the locally-appropriate encoding. Sure, it's less > transparent for affected users, but it will work for them which other > approaches might not; and evidence so far is that there's not a huge > number of affected users. Hmm, yeah I guess we could just ship a patch like what I posted already, and let users figure out what works. I would stilllike to know the answer to those questions so we can offer good advice. If the answers are all yes then I think wecan say "the file is in encoded in ASCII; use wildcards to deal with any legacy non-ASCII locale names on the left, anduse BCP47 names on the right" and begin expunging the bad old names from our universe.
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > What questions do you have? Could you please list them clearly so that Vishwa could help to answer? I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive: https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com But let me ask the questions again, with some motivation/reason I want to know in parentheses: 1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...]) 2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly? What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.) 3. Do the new BCP47 locale names give *exactly* the same results for strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Vishwa, Could you please help with the questions below? Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, May 20, 2024 1:57 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > What questions do you have? Could you please list them clearly so that Vishwa could help to answer? I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our projectemail archive: https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com But let me ask the questions again, with some motivation/reason I want to know in parentheses: 1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected torun on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go withTom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this questionbecomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the updatethat renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes,because that's crazy, but I'm not allowed to assume...]) 2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does itmean exactly? What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functionsis the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functionsdoes it affect? For example if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with"en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the firstperson to show up offering to investigate, and I personally found the docs a bit light on the topic.) 3. Do the new BCP47 locale names give *exactly* the same results for strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and drivingdata*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but ifnot, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know aboutthat, so we can adjust our advice to users.)
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Vishwa
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Could you please help with the questions below?
Thanks!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?
I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0
But let me ask the questions again, with some motivation/reason I want to know in parentheses:
1. What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail. learn.microsoft.com |
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Vishwa
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Could you please help with the questions below?
Thanks!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?
I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0
But let me ask the questions again, with some motivation/reason I want to know in parentheses:
1. What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail. learn.microsoft.com |
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks Vishwa for all the clarification below.
Hi @Rahul and everyone,
Is there anything else not clear? Is there any solution for the issue?
Thanks!
Haifang
From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Vishwa, for tagging me.
Adding my two cents,
Hello Thomas,
1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Vishwa: Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
Rahul: Locale names based on BCP 47 were first introduced in Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless its XP or earlier).
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
Vishwa: It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags
Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be same and it would try to use the default ANSI Code Page for Turkish (which happens to be 1254). Using any other code page (for example"tr-TR.1252") would use that codepage (1252: English). For the second part (example), I am not sure if I understand the question completely, but mixing the encoding is almost never a good idea and could lead to mojibaked strings in the worst case to no change (if strings only contain ASCII chars) in the best-case scenario.
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Vishwa: Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü
Rahul: I agree with Vishwa. the locale is the same, just the name of the country in English is changed. Rest all data is the same.
Thanks,
Rahul
From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey
Adding my inline comment to best of my knowledge.
Thanks & Regards
Vishwa
From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Vishwa,
Could you please help with the questions below?
Thanks!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?
I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0
But let me ask the questions again, with some motivation/reason I want to know in parentheses:
1. What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail. learn.microsoft.com |
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü
Please do proper due diligence at your end before proceeding with any kind of mapping.
Regards
Vishwa
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Team,
Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?
Thanks!
Haifang
From: Haifang Wang (Centific Technologies Inc)
Sent: Tuesday, May 28, 2024 11:21 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks Vishwa for all the clarification below.
Hi @Rahul and everyone,
Is there anything else not clear? Is there any solution for the issue?
Thanks!
Haifang
From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Vishwa, for tagging me.
Adding my two cents,
Hello Thomas,
1. What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Vishwa: Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
Rahul: Locale names based on BCP 47 were first introduced in Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless its XP or earlier).
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
Vishwa: It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags
Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be same and it would try to use the default ANSI Code Page for Turkish (which happens to be 1254). Using any other code page (for example"tr-TR.1252") would use that codepage (1252: English). For the second part (example), I am not sure if I understand the question completely, but mixing the encoding is almost never a good idea and could lead to mojibaked strings in the worst case to no change (if strings only contain ASCII chars) in the best-case scenario.
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Vishwa: Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü
Rahul: I agree with Vishwa. the locale is the same, just the name of the country in English is changed. Rest all data is the same.
Thanks,
Rahul
From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey
Adding my inline comment to best of my knowledge.
Thanks & Regards
Vishwa
From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Vishwa,
Could you please help with the questions below?
Thanks!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?
I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0
But let me ask the questions again, with some motivation/reason I want to know in parentheses:
1. What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"? (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this. However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye. [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])
Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.
2. If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end? What does it mean exactly?
What does it mean if you don't put it there? (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP". What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect? For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens? I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)
It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail. learn.microsoft.com |
3. Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names? (In other words, is it *exactly the same code and driving data*, just using different labels? Or is it a new locale implementation that could differ arbitrarily in behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü
Please do proper due diligence at your end before proceeding with any kind of mapping.
Regards
Vishwa
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Didyou get chance to do investigations? Any other questions? Hi, Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...) The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said: "Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü" That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translate to "tr-TR.1254". Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October. One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
Thomas Munro <thomas.munro@gmail.com> writes: > One open question is whether we should ship a translation file > ourselves. I have heard two opinions: Tom Lane proposed in this > thread that there should be no file initially, we let the user figure > out what to put in there. Magnus Hagander proposed at the pgconf.dev > conference last week that we should ideally ship a complete > translation file and maintain it over time, and it should be installed > not in pgdata but rather in the install directory. FWIW, I'm kind of down on the latter approach, because I don't think it'll move the needle very far. Based on track record so far, there's no chance that we will be aware of a Microsoft locale renaming before it starts breaking users' databases. Therefore, "edit the translation file" is going to have to be a documented process in any case, because affected users are not going to want to wait around for our next release for a fix. Also, if people do have to do that, it doesn't seem like a great idea to tell them to modify an installed file rather than a cluster-local configuration file. What if they do a minor version update but the minor version doesn't (yet) contain the fix? Admittedly, the installed-file approach could make it more transparent for people who'd done a PG minor update before the relevant Windows update. I'm not sure how large that set of people will be, though. regards, tom lane
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer. Regards! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, June 3, 2024 2:22 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Didyou get chance to do investigations? Any other questions? Hi, Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one ofthe topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic thanthis thread...) The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said: "Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü" That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?',which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254"and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mentionof "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translateto "tr-TR.1254". Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build optionthat can also do locale name renaming on Unix, so that I have something that I could test myself and push for the nextrelease of PostgreSQL which will be in October. One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed inthis thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposedat the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it overtime, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willingto maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a groupsuch as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, mightlike to own that job.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? Seems like all our previous discussionare in different mail thread. It is not easy to track. 😊 Regards! Haifang -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Monday, June 3, 2024 3:55 PM To: Thomas Munro <thomas.munro@gmail.com> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak<Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; MagnusHagander <magnus@hagander.net> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 Thomas Munro <thomas.munro@gmail.com> writes: > One open question is whether we should ship a translation file > ourselves. I have heard two opinions: Tom Lane proposed in this > thread that there should be no file initially, we let the user figure > out what to put in there. Magnus Hagander proposed at the pgconf.dev > conference last week that we should ideally ship a complete > translation file and maintain it over time, and it should be installed > not in pgdata but rather in the install directory. FWIW, I'm kind of down on the latter approach, because I don't think it'll move the needle very far. Based on track recordso far, there's no chance that we will be aware of a Microsoft locale renaming before it starts breaking users' databases. Therefore, "edit the translation file" is going to have to be a documented process in any case, because affectedusers are not going to want to wait around for our next release for a fix. Also, if people do have to do that, itdoesn't seem like a great idea to tell them to modify an installed file rather than a cluster-local configuration file. What if they do a minor version update but the minor version doesn't (yet) contain the fix? Admittedly, the installed-file approach could make it more transparent for people who'd done a PG minor update before therelevant Windows update. I'm not sure how large that set of people will be, though. regards, tom lane
On Tue, Jun 4, 2024 at 12:13 PM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? Seems like all our previous discussionare in different mail thread. It is not easy to track. 😊 FWIW this shows up as one thread in my email client, and in the PostgreSQL archives website[1]. I don't know much about email and which RFCs or conventions are at work here, but apparently different clients are using different techniques to identify threads. I assume it could be done with thread headers (as seen in this thread) or reply-to chains or fuzzy recognition of subject etc. Given that Outlook seems to ignore the "> " inline response quoting convention used by the rest of the internet, it wouldn't surprise me to hear that it also doesn't follow the conventions for recognising threads either, being a sort of related topic. [1] https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Regards!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?
Hi,
Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)
The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said:
"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"
That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translate to "tr-TR.1254".
Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.
One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
On Wed, Jun 5, 2024 at 9:10 PM Rahul Pandey <pandeyrah@microsoft.com> wrote: > "Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?" > You are correct, they all behave in exactly the same way. > I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementationmore future proof. Thanks for confirming. More soon.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Tom,
Just would like to follow up you about the bug. Is there any update? Any other information needed?
Thanks!
Haifang
From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi,
"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
You are correct, they all behave in exactly the same way.
I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.
As for translation files, apologies I neither have a preference nor expertise to comment on this.
From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.
Regards!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?
Hi,
Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)
The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said:
"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"
That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translate to "tr-TR.1254".
Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.
One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi all,
Still need to double check with you about this bug. How is it going? Any other information needed?
Thanks!
Haifang
From: Haifang Wang (Centific Technologies Inc)
Sent: Thursday, June 13, 2024 11:47 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Tom,
Just would like to follow up you about the bug. Is there any update? Any other information needed?
Thanks!
Haifang
From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi,
"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
You are correct, they all behave in exactly the same way.
I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.
As for translation files, apologies I neither have a preference nor expertise to comment on this.
From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.
Regards!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?
Hi,
Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)
The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said:
"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"
That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translate to "tr-TR.1254".
Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.
One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi all,
Still need to double check with you about this bug. How is it going? Any other information needed?
Thanks!
Haifang
From: Haifang Wang (Centific Technologies Inc)
Sent: Monday, June 17, 2024 2:06 PM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi all,
Still need to double check with you about this bug. How is it going? Any other information needed?
Thanks!
Haifang
From: Haifang Wang (Centific Technologies Inc)
Sent: Thursday, June 13, 2024 11:47 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Tom,
Just would like to follow up you about the bug. Is there any update? Any other information needed?
Thanks!
Haifang
From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi,
"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
You are correct, they all behave in exactly the same way.
I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.
As for translation files, apologies I neither have a preference nor expertise to comment on this.
From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.
Regards!
Haifang
-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?
Hi,
Thanks for all your feedback! Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)
The answers are all useful, thank you. There was just one thing I wanted to clarify. Vishwa said:
"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"
That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'. It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for... It's important, because we're proposing to translate to "tr-TR.1254".
Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.
One open question is whether we should ship a translation file ourselves. I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there. Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory. I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
On Tue, Jun 4, 2024 at 9:22 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Since no one has come forward to test the patch I wrote on Windows, I > think my next move will be to try to make a build option that can also > do locale name renaming on Unix, so that I have something that I could > test myself and push for the next release of PostgreSQL which will be > in October. Here is a such a patch. If you go into pg_config_manual.h and uncomment this line: /* #define DEBUG_SETLOCALE_MAP */ ... then Unix systems will also be able to rename locales passed to setlocale(). A map file can be provided either by putting its absolute path into the environment variable PG_SETLOCALE_MAP, or by installing it as $PREFIX/share/postgresql/setlocale.map. I couldn't immediately think of a good way to find it in the data directory. Here's an example of a line that should fix the Turkish problem (though I haven't tested that, I am not a Windows user): Turkish_T*.1254=tr-TR.1254 I added some documentation and showed that example. If you wanted to check it's working on a Unix system, you might try some lines like *.UTF-8=does_not_exist or en_US.UTF-8=fr_FR.UTF-8 and then somehow verify that it's using French. I considered adding win32setlocale.c to the list of files to build for the port libraries even on Unix, and then wrapping the contents in #ifdef, but IIUC macOS squawks if you have an empty .c after preprocessing, so I'd have to add a dummy symbol in there. Or maybe that'd be better than what I did here, namely including win32setlocale.c in chklocale.c in this case. Better ideas welcome. Adding a meson/configure switch to enable it and make the whole .c file optional seemed excessive.
Attachment
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks, Thomas. Will wait for the build option to do further testing. Regards! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Tuesday, June 25, 2024 4:52 AM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, Jun 4, 2024 at 9:22 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Since no one has come forward to test the patch I wrote on Windows, I > think my next move will be to try to make a build option that can also > do locale name renaming on Unix, so that I have something that I could > test myself and push for the next release of PostgreSQL which will be > in October. Here is a such a patch. If you go into pg_config_manual.h and uncomment this line: /* #define DEBUG_SETLOCALE_MAP */ ... then Unix systems will also be able to rename locales passed to setlocale(). A map file can be provided either by puttingits absolute path into the environment variable PG_SETLOCALE_MAP, or by installing it as $PREFIX/share/postgresql/setlocale.map. I couldn't immediately think of a good way to find it in the data directory. Here's an example of a line that should fix the Turkish problem (though I haven't tested that, I am not a Windows user): Turkish_T*.1254=tr-TR.1254 I added some documentation and showed that example. If you wanted to check it's working on a Unix system, you might try some lines like *.UTF-8=does_not_exist or en_US.UTF-8=fr_FR.UTF-8and then somehow verify that it's using French. I considered adding win32setlocale.c to the list of files to build for the port libraries even on Unix, and then wrappingthe contents in #ifdef, but IIUC macOS squawks if you have an empty .c after preprocessing, so I'd have to add adummy symbol in there. Or maybe that'd be better than what I did here, namely including win32setlocale.c in chklocale.cin this case. Better ideas welcome. Adding a meson/configure switch to enable it and make the whole .c file optional seemed excessive.
Windows Application Issues | PostgreSQL | REF # 51961374
Hi Team,
I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. We’re reaching out to notify you of a potential issue in one of your applications.
The issue details are below, please review it when you have a moment. Our goal is to work with you to address this issue and to understand what your expected timeline to address this issue might be. If you have any questions about the details below or have already addressed this issue in a forthcoming update, please let me know.
Account | PostgreSQL Global Development Group |
Product | PostgreSQL |
Reference # | 51961374 |
Issue | Fail to work with Find and Replace options within the application with latest version of app
Observations: Fail to work with Find and Replace options within the application with latest version of app Should be able to work with Find and Replace options successfully without any issue. |
Resource | For any questions on app development (or) submission on windows, contact Windows Dev Center. https://developer.microsoft.com/en-us/windows/support |
Thanks!
Haifang
On Sat, Jul 6, 2024 at 10:00 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Launch the PostgreSQL from the start menu. > Connect to the database. > Go to Tool menu ->click on query tool option. > Write the sql query. > Click edit option ->Try to work with find and replace options. Hi Haifang, It sounds like this might be referring to "pgadmin", a GUI tool for talking to PostgreSQL? PostgreSQL itself doesn't have a GUI, but the associated pgadmin project does, and it has its own place for bug reporting over here: https://www.pgadmin.org/support/issues/
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 51961374
Thanks a lot for pointing this out, Thomas. I will report the issue to pgadmin. Regards! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Sunday, July 7, 2024 12:17 AM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 51961374 On Sat, Jul 6, 2024 at 10:00 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Launch the PostgreSQL from the start menu. > Connect to the database. > Go to Tool menu ->click on query tool option. > Write the sql query. > Click edit option ->Try to work with find and replace options. Hi Haifang, It sounds like this might be referring to "pgadmin", a GUI tool for talking to PostgreSQL? PostgreSQL itself doesn't havea GUI, but the associated pgadmin project does, and it has its own place for bug reporting over here: https://www.pgadmin.org/support/issues/
A minor release deadline is only a couple of weeks away and unfortunately we still have no confirmation from a Windows user that this technique can be used to unbreak a database cluster that actually goes though that exact upgrade sequence. So I spent today figuring out how to write a TAP test to try to convince myself of that in the abstract. No C code change from v2. Does the test show a useful enough pathway that is likely to help people in future cases at least? I learned two new things: 1. With a remapped locale, you can log into your existing databases again, but CREATE DATABASE doesn't work (see test). That must have been the case for the historical transitions that were hard-coded (Macau etc); this is no different, it's just that the mapping table is now user-changeable. In other words, this just gives an administrator a way to log into a cluster again after such a change, but then they really need to update the names in catalogs, at least for any database that might be used as a template, to get a 100% functional system. So I have added a paragraph to the documentation piece to explain that. 2. Including the encoding suffix eg "en-US.1252" doesn't actually work after all (discovered by CI). You have to use just "en-US". (I guess ".UTF-8" must be a special case.) So I have updated the documentation example.
Attachment
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On 2024-07-21 Su 7:54 AM, Thomas Munro wrote: > A minor release deadline is only a couple of weeks away and > unfortunately we still have no confirmation from a Windows user that > this technique can be used to unbreak a database cluster that actually > goes though that exact upgrade sequence. So I spent today figuring > out how to write a TAP test to try to convince myself of that in the > abstract. No C code change from v2. Does the test show a useful > enough pathway that is likely to help people in future cases at least? > > I learned two new things: > > 1. With a remapped locale, you can log into your existing databases > again, but CREATE DATABASE doesn't work (see test). That must have > been the case for the historical transitions that were hard-coded > (Macau etc); this is no different, it's just that the mapping table is > now user-changeable. In other words, this just gives an administrator > a way to log into a cluster again after such a change, but then they > really need to update the names in catalogs, at least for any database > that might be used as a template, to get a 100% functional system. So > I have added a paragraph to the documentation piece to explain that. > > 2. Including the encoding suffix eg "en-US.1252" doesn't actually > work after all (discovered by CI). You have to use just "en-US". (I > guess ".UTF-8" must be a special case.) So I have updated the > documentation example. I don't have a general comment, at least yet, but a couple, mostly stylistic, on your TAP test. change if ($^O eq 'MSWin32') to if ($windows_os) Also your use of "is()" is unidiomatic. The expected value should come after the received value, not before. While your use won't change the success or failure of the result, the message on failure will be at best confusing. (I ran into this the other day, so it's fresh in my memory). See Test::More documentation. Using all upper case for variables is not our usual perl style (or anybody else's that I am aware of). Mostly we just use lower case with underscores. If you use the "constant" module upper case letters are usually used, but then the things defined are bare words, and come with some caveats, and I don't think you want to use it here. The substance of the test looks good. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jul 22, 2024 at 1:28 AM Andrew Dunstan <andrew@dunslane.net> wrote: > change > > if ($^O eq 'MSWin32') > > to > > if ($windows_os) Done. > Also your use of "is()" is unidiomatic. The expected value should come > after the received value, not before. While your use won't change the > success or failure of the result, the message on failure will be at best > confusing. (I ran into this the other day, so it's fresh in my memory). > See Test::More documentation. Fixed. > Using all upper case for variables is not our usual perl style (or > anybody else's that I am aware of). Mostly we just use lower case with > underscores. If you use the "constant" module upper case letters are > usually used, but then the things defined are bare words, and come with > some caveats, and I don't think you want to use it here. Fixed. > The substance of the test looks good. Thanks for looking! One new-to-me fact from the -general mailing list this morning[1] is that the EDB installer is explicitly feeding these unstable locale names into the world's database clusters, so that's something that's beyond the power of us upstream PostgreSQL hackers to fix even if we pulled the trigger on the initdb BCP47-by-default patch[2]. Something for someone at EDB to look into? [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364 [2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com
Attachment
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
One new-to-me fact from the -general mailing list this morning[1] is that the EDB installer is explicitly feeding these unstable locale names into the world's database clusters, so that's something that's beyond the power of us upstream PostgreSQL hackers to fix even if we pulled the trigger on the initdb BCP47-by-default patch[2]. Something for someone at EDB to look into? [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364 [2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com
Sandeep, can you comment?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On 2024-07-21 Su 6:00 PM, Thomas Munro wrote:One new-to-me fact from the -general mailing list this morning[1] is that the EDB installer is explicitly feeding these unstable locale names into the world's database clusters, so that's something that's beyond the power of us upstream PostgreSQL hackers to fix even if we pulled the trigger on the initdb BCP47-by-default patch[2]. Something for someone at EDB to look into? [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364 [2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com
Sandeep, can you comment?
Executing C:\Windows\System32\cscript //NoLogo "C:\Program Files\PostgreSQL\16/installer/server/initcluster.vbs" "NT AUTHORITY\NetworkService" "postgres" "****" "C:\Users\User1\AppData\Local\Temp/postgresql_installer_cd79fad8b7" "C:\Program Files\PostgreSQL\16" "C:\DATA_PG16" 5432 "Turkish,Türkiye" 0
On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote: > Thomas, I can generate the test installer with your patch and share here so this issue can be tested before the minor release? That might be useful, thanks! If someone wants to try it out on a real Windows system. . o O { I wonder if it could ever be possible to have that automated via CI. Imagine if I/anyone could clone the EDB installer github repo, and push a small patch into a private branch that tells it where to pull PostgreSQL sources from, including any public git branch including commitfest ones, and some options like with/without assertions, and have CI spit out a new installer as an artifact? Hopefully non-REL_XXX_STABLE branches would have a big warning on the GUI so that users would never accidentally think it's a real release. }
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> Thomas, I can generate the test installer with your patch and share here so this issue can be tested before the minor release?
That might be useful, thanks! If someone wants to try it out on a
real Windows system.
Sure, I can give it a try. Is your patch ready?
. o O { I wonder if it could ever be possible to have that automated
via CI. Imagine if I/anyone could clone the EDB installer github
repo, and push a small patch into a private branch that tells it where
to pull PostgreSQL sources from, including any public git branch
including commitfest ones, and some options like with/without
assertions, and have CI spit out a new installer as an artifact?
Hopefully non-REL_XXX_STABLE branches would have a big warning on the
GUI so that users would never accidentally think it's a real release.
}
I am going to withdraw this proposal. In another email thread, we learned two very important things: 1. The problem of a locale names changing for an *existing* database can apparently be fixed using "Locale Builder"[1]. Users in Turkey have presumably already addressed the situation that way and there are reports of articles about that in Turkish. We don't really need the "setlocale.map" rename-on-the-fly patch in this thread, and I don't want to maintain it if we don't need it. 2. The Windows runtime library is crashing when initdb's subprocess postgres.exe calls it with a name encoded in an encoding it doesn't like[2]. (IMHO it should return NULL instead! The C standard does not have a demons-may-fly-out-of-your-nose clause.) We need to tell users not to use those locale names. That string is actually coming from the EDB installer, and we'll follow up on that. Perhaps PostgreSQL should also consider stripping non-ASCII characters to force it to fail nicely. [1] https://www.postgresql.org/message-id/CA%2BhUKGLyfou2xa2fxGOs0WwkOrcKjwqn-bRNR01Hq5RAD16ueg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BhUKG%2BFxeRLURZ%3Dn8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg%40mail.gmail.com
On Wed, Jul 31, 2024 at 9:53 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote: > On Thu, Jul 25, 2024 at 11:39 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar >> <sandeep.thakkar@enterprisedb.com> wrote: >> . o O { I wonder if it could ever be possible to have that automated >> via CI. Imagine if I/anyone could clone the EDB installer github >> repo, and push a small patch into a private branch that tells it where >> to pull PostgreSQL sources from, including any public git branch >> including commitfest ones, and some options like with/without >> assertions, and have CI spit out a new installer as an artifact? >> Hopefully non-REL_XXX_STABLE branches would have a big warning on the >> GUI so that users would never accidentally think it's a real release. >> } > > yeah, that'll be helpful but unfortunately it's not possible atm. Hopefully, we will > have that available in future through the GitHub actions. Sounds great. FWIW it's easy[1][2] to make the PostgreSQL CI spit out a zip file with a snapshot build of PostgreSQL. I expect it doesn't actually run on a real Windows system unless you also install DLLs it needs like OpenSSL, ICU etc, and I'm not sure if that would be best done by stuffing them too into the .zip file, or if the scripts[3] that the CI images are prepared with should be used by someone who wants to test it. But thinking about that leads me back to the thought that it'd be better if people interested in trying pre-releases could get an EDB installer for a snapshot build from any branch anywhere, so it all works the way they will actually use it in real life. And of course you guys already did all the hard work to deal with dependencies. [1] https://github.com/macdice/postgres/commit/56e967f9cd35ef418a653d191a4d9ea85055d94b [2] https://cirrus-ci.com/task/4831619775725568 [3] https://github.com/anarazel/pg-vm-images/tree/main/scripts
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and Andrew, Just would like to follow up with you about this bug. How is it going with the investigation? Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Wednesday, August 7, 2024 7:35 PM To: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> Cc: Andrew Dunstan <andrew@dunslane.net>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Rahul Pandey<pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 I am going to withdraw this proposal. In another email thread, we learned two very important things: 1. The problem of a locale names changing for an *existing* database can apparently be fixed using "Locale Builder"[1]. Users in Turkey have presumably already addressed the situation that way and there are reports of articles aboutthat in Turkish. We don't really need the "setlocale.map" rename-on-the-fly patch in this thread, and I don't wantto maintain it if we don't need it. 2. The Windows runtime library is crashing when initdb's subprocess postgres.exe calls it with a name encoded in an encodingit doesn't like[2]. (IMHO it should return NULL instead! The C standard does not have a demons-may-fly-out-of-your-noseclause.) We need to tell users not to use those locale names. That string is actually comingfrom the EDB installer, and we'll follow up on that. Perhaps PostgreSQL should also consider stripping non-ASCII charactersto force it to fail nicely. [1] https://www.postgresql.org/message-id/CA%2BhUKGLyfou2xa2fxGOs0WwkOrcKjwqn-bRNR01Hq5RAD16ueg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BhUKG%2BFxeRLURZ%3Dn8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg%40mail.gmail.com
+Peter Constable, FYI -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Wednesday, August 28, 2024 3:16 AM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Andrew Dunstan <andrew@dunslane.net>; Rahul Pandey <pandeyrah@microsoft.com>;Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky<amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Just would like to follow up with you about this bug. How is it going with the investigation? Well, we figured out that Windows is aborting inside setlocale(). The C standard says "The pointer to string returned bythe setlocale function is such that a subsequent call with that string value and its associated category will restore thatpart of the program’s locale." Window does not conform, and crashes itself. One practical change we could make would be to reject non-ASCII locale names in initdb, as in the attached, and back-patchto 16. Then the EDB installer (a separate project that is calling initdb and providing the non-ASCII string containing"ü") would at least get a graceful failure instead of a crash. It hardly makes a difference to the user, but I'vefinally understood that these crashes in the Windows runtime library might be generating a stream of crash reports toMicrosoft, which might be bringing you here? For example: $ initdb -D asdf --locale="liberté" The files belonging to this database system will be owned by user "tmunro". This user must also own the server process. initdb: error: locale name "liberté" contains non-ASCII characters
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On 2024-08-28 We 6:16 AM, Thomas Munro wrote: > On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies > Inc) <v-haiwang@microsoft.com> wrote: >> Just would like to follow up with you about this bug. How is it going with the investigation? > Well, we figured out that Windows is aborting inside setlocale(). The > C standard says "The pointer to string returned by the setlocale > function is such that a subsequent call with that string value and its > associated category will restore that part of the program’s locale." > Window does not conform, and crashes itself. > > One practical change we could make would be to reject non-ASCII locale > names in initdb, as in the attached, and back-patch to 16. Then the > EDB installer (a separate project that is calling initdb and providing > the non-ASCII string containing "ü") would at least get a graceful > failure instead of a crash. It hardly makes a difference to the user, > but I've finally understood that these crashes in the Windows runtime > library might be generating a stream of crash reports to Microsoft, > which might be bringing you here? > > For example: > > $ initdb -D asdf --locale="liberté" > The files belonging to this database system will be owned by user "tmunro". > This user must also own the server process. > > initdb: error: locale name "liberté" contains non-ASCII characters +1 for doing this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Thanks for the reply, Andrew and Thomas. The issue is originally from user feedback. And our team verified and reproducedit too. And it is high priority from our end is because: 1. It is regression, it did not see it in the previousbuild. 2. After the crash happened, even the app could be installed, but cluster initialization failed. And due towhich we cannot connect to the database and proceed further. @Shawn @Rahul, do you have any comment or suggestions? Regards! Haifang -----Original Message----- From: Andrew Dunstan <andrew@dunslane.net> Sent: Thursday, August 29, 2024 8:17 AM To: Thomas Munro <thomas.munro@gmail.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 [You don't often get email from andrew@dunslane.net. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification] On 2024-08-28 We 6:16 AM, Thomas Munro wrote: > On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies > Inc) <v-haiwang@microsoft.com> wrote: >> Just would like to follow up with you about this bug. How is it going with the investigation? > Well, we figured out that Windows is aborting inside setlocale(). The > C standard says "The pointer to string returned by the setlocale > function is such that a subsequent call with that string value and its > associated category will restore that part of the program’s locale." > Window does not conform, and crashes itself. > > One practical change we could make would be to reject non-ASCII locale > names in initdb, as in the attached, and back-patch to 16. Then the > EDB installer (a separate project that is calling initdb and providing > the non-ASCII string containing "ü") would at least get a graceful > failure instead of a crash. It hardly makes a difference to the user, > but I've finally understood that these crashes in the Windows runtime > library might be generating a stream of crash reports to Microsoft, > which might be bringing you here? > > For example: > > $ initdb -D asdf --locale="liberté" > The files belonging to this database system will be owned by user "tmunro". > This user must also own the server process. > > initdb: error: locale name "liberté" contains non-ASCII characters +1 for doing this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com/
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On 2024-08-29 Th 1:53 PM, Haifang Wang (Centific Technologies Inc) wrote: > Thanks for the reply, Andrew and Thomas. The issue is originally from user feedback. And our team verified and reproducedit too. And it is high priority from our end is because: 1. It is regression, it did not see it in the previousbuild. 2. After the crash happened, even the app could be installed, but cluster initialization failed. And due towhich we cannot connect to the database and proceed further. Please don't top-post on PostgreSQL lists. If I read Thomas correctly, the solution is to create a locale with an acceptable (pure Ascii) name using Locale Builder, and then call initdb using that name. It might be that you need to do some of this manually if the installer fails. IIRC is it possible to run the installer without having it create a cluster automatically. You can then set up the cluster and create a windows service manually using initdb / pg_ctl. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew@dunslane.net> wrote: > On 2024-08-28 We 6:16 AM, Thomas Munro wrote: > > initdb: error: locale name "liberté" contains non-ASCII characters > > +1 for doing this. Thanks for looking! OK, let's go with this approach. That two-line demo patch doesn't work properly for the system default, and of course it should be in sync with the backend function, and needs comments to explain this mess, so I'll work on a serious patch.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas, How is it going after all the analysis below? Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Thursday, August 29, 2024 8:22 PM To: Andrew Dunstan <andrew@dunslane.net> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; ShawnSteele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Fri, Aug 30, 2024 at 10:55 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew@dunslane.net> wrote: > > On 2024-08-28 We 6:16 AM, Thomas Munro wrote: > > > initdb: error: locale name "liberté" contains non-ASCII characters > > > > +1 for doing this. > > Thanks for looking! OK, let's go with this approach. That two-line > demo patch doesn't work properly for the system default, and of course > it should be in sync with the backend function, and needs comments to > explain this mess, so I'll work on a serious patch. Done. To develop this patch, I set up a cursed locale name on my FreeBSD box like so: $ sudo cp -r /usr/share/locale/fr_FR.UTF-8 /usr/share/locale/fraternité.UTF-8 Then I could exercise the setlocale("") path, approximating what I expect a Turkish Windows system to do without explicit--locale: $ LC_TIME="fraternité.UTF-8" initdb -D pdata ... initdb: error: locale name "fraternité.UTF-8" contains non-ASCII characters The explicit path is easier to test, like before: $ initdb -D pgdata --locale "égalité.UTF-9" ... initdb: error: locale name "égalité.UTF-9" contains non-ASCII characters That's what the EDB installer (if not changed to provide ASCII-clean names somehow or other) will start failing with, ifit tries to feed umlauts to initdb. Working through that, I understood another problem: even if you provide an acceptable string like --locale "tr-TR", we stillneed to be able to save-and-restore the value that initdb inherited from the environment. Using my simulation, whatI mean is if you do: $ export LANG="liberté.UTF-8" ... then ... $ initdb -D pgdata --locale "fr_FR.UTF-8" ... it still needs to be able to do a safe save-and-restore back to the environment's value without crashing, even thoughit's not really using that value. I wondered if initdb really needs a locale other than "C" for itself, but I'm tryingto minimise the changes here so I don't break something else... That seems to mean that we have to use _wsetlocale() for that particular save/restore. This is essentially what Haifang's colleagues were suggesting upthread. Done. Once a server is up and running, we shouldn't be exposed to the server environment (right?), but there are several more placesthat accept user-supplied strings, and only CREATE COLLATION fails to run check_locale(): postgres=# create collation foo (locale = 'fraternité.UTF-8'); CREATE COLLATION The other places I am aware of do, so putting the equivalent code into check_locale() does the right thing: postgres=# create database db locale = 'fraternité.UTF-8'; WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters ERROR: invalid LC_COLLATE locale name: "fraternité.UTF-8" HINT: If the locale name is specific to ICU, use ICU_LOCALE. postgres=# set lc_time = 'fraternité.UTF-8'; WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters ERROR: invalid value for parameter "lc_time": "fraternité.UTF-8" postgres=# set lc_messages = 'fraternité.UTF-8'; WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters ERROR: invalid value for parameter "lc_messages": "fraternité.UTF-8" postgres=# set lc_monetary = 'fraternité.UTF-8'; WARNING: locale name "fraternité.UTF-8" contains non-ASCII characters ERROR: invalid value for parameter "lc_monetary": "fraternité.UTF-8" Those last SET lc_* changes aren't strictly necessary, because we never restore to those, we only switch to them and thenrestore to the database default, but since they share code with CREATE DATABASE the new check kicks in anyway. I tried adding check_locale() checks into CREATE COLLATION, but that made some tests fail (huh, I didn't know that the setof strings acceptable to _newlocale() and setlocale() are different -- the former seems to accept "POSIX" on that OS,which is exercised by some tests, or maybe it's some path of ours, I didn't check). But, on reflection, I don't believethat COLLATION objects actually have this problem anyway, because we never call setlocale() to them, we just use themodern thread-safe POSIX-2008-oid APIs. So I took that back out to minimise the patch. Please see attached, including a restatement of the problem and solution in the commit message. I suppose we could add some tests for this, but I was a bit unsure about the rules for putting non-ASCII into perl scriptsand command lines, and would need to research that... (Note: a lot of this old stuff is getting ripped out in master by the ongoing multi-threading effort...)
On Thu, Sep 5, 2024 at 9:28 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > How is it going after all the analysis below? I proposed a patch. I am waiting for someone to try it out, review it and write the magic words "looks good to me", before I commit it and back-patch it. PostgreSQL will next make a release in early November. Shortly after that, the EDB Windows installer team would begin publishing a new Windows installer package, and then you should stop hearing from Windows users whose runtime library exploded. Hopefully they will also adjust their GUI that offers a list of locale names to users, so that it doesn't even offer the ones that don't work, and does offer some better ones, like (probably) "tr-TR.UTF-8".
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Cool. is anyone going to try it out? Looking forward to the back-patch and release. Regards! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Wednesday, September 4, 2024 2:41 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Andrew Dunstan <andrew@dunslane.net>; Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Rahul Pandey <pandeyrah@microsoft.com>;Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky<amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Thu, Sep 5, 2024 at 9:28 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > How is it going after all the analysis below? I proposed a patch. I am waiting for someone to try it out, review it and write the magic words "looks good to me", beforeI commit it and back-patch it. PostgreSQL will next make a release in early November. Shortly after that, the EDB Windows installer team would begin publishing a new Windows installer package, and then you shouldstop hearing from Windows users whose runtime library exploded. Hopefully they will also adjust their GUI that offersa list of locale names to users, so that it doesn't even offer the ones that don't work, and does offer some betterones, like (probably) "tr-TR.UTF-8".
"Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes: > Cool. is anyone going to try it out? Looking forward to the back-patch and release. I thought the implication was pretty clear that YOU ought to try it out. regards, tom lane
On Thu, Sep 5, 2024 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes: > > Cool. is anyone going to try it out? Looking forward to the back-patch and release. > > I thought the implication was pretty clear that YOU ought to try it out. It would also be good to hear from Sandeep and crew, if possible, since they will be dealing with the consequences of this change. But here is one thing Haifang might like to consider: the documentation[1] for setlocale() could really use a note in the remarks section to highlight this danger. Nothing in there seems to alert the programmer to the fact that save = setlocale(NULL), setlocale("something else"), setlocale(save) might fail to restore the previous locale, and might even abort. Perhaps that's a weird, old, non-thread-safe thing to be doing (and we're working on removing that), but it is quite explicitly described in the standard. C99 7.11.1.1 paragraph 8: "The pointer to string returned by the setlocale function is such that a subsequent call with that string value and its associated category will restore that part of the program’s locale". Aborting also fails paragraph 6: "If the selection cannot be honored, the setlocale function returns a null pointer and the program’s locale is not changed" (perhaps this has to do with the crt report mode stuff, I don't know as I'm not a Windows programmer, I'm just trying to maintain a C program; or maybe it's actually a bug?). Another thing is that setlocale(NULL) continues to return the "old style" names while everything else in the OS is using the modern BCP 47 system since Vista, and the locale documentation recommends that applications prefer those[2]. It might be worth a note about that too? (I guess stability and backwards compatibility are behind that choice, but ... it's not really stable anyway.) [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170 [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Wednesday, September 4, 2024 3:35 PM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Andrew Dunstan <andrew@dunslane.net>; Sandeep Thakkar<sandeep.thakkar@enterprisedb.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Thu, Sep 5, 2024 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes: > > Cool. is anyone going to try it out? Looking forward to the back-patch and release. > > I thought the implication was pretty clear that YOU ought to try it out. It would also be good to hear from Sandeep and crew, if possible, since they will be dealing with the consequences of thischange. But here is one thing Haifang might like to consider: the documentation[1] for setlocale() could really use a note in theremarks section to highlight this danger. Nothing in there seems to alert the programmer to the fact that save = setlocale(NULL),setlocale("something else"), setlocale(save) might fail to restore the previous locale, and might even abort. Perhaps that's a weird, old, non-thread-safe thing to be doing (and we're working on removing that), but it is quiteexplicitly described in the standard. C99 7.11.1.1 paragraph 8: "The pointer to string returned by the setlocale function is such that a subsequent call with thatstring value and its associated category will restore that part of the program’s locale". Aborting also fails paragraph 6: "If the selection cannot be honored, the setlocale function returns a null pointer and theprogram’s locale is not changed" (perhaps this has to do with the crt report mode stuff, I don't know as I'm not a Windowsprogrammer, I'm just trying to maintain a C program; or maybe it's actually a bug?). Another thing is that setlocale(NULL)continues to return the "old style" names while everything else in the OS is using the modern BCP 47 system since Vista, and the locale documentation recommendsthat applications prefer those[2]. It might be worth a note about that too? (I guess stability and backwardscompatibility are behind that choice, but ... it's not really stable anyway.) [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170 [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170
Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane
On Thu, Sep 5, 2024 at 11:24 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane I demonstrated it working on my computer in the email: https://www.postgresql.org/message-id/CA%2BhUKG%2B28zRyk8U%3DeFLn8O%2B17VrbpKykWPVrSzf57Ru32dQo8A%40mail.gmail.com But I can't test on Windows. For context, I have a "UNIX -- Live Free or Die" license plate attached to the side of my workstation.
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Good to know that, Thomas. I'm asking our tester to try it. Will let you know. Regards! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Wednesday, September 4, 2024 4:39 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>; Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; ShawnSteele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati<gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Thu, Sep 5, 2024 at 11:24 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Okay, I will ask our tester to try the patch. But have you tried it > from your end? @Thomas Munro @Tom Lane I demonstrated it working on my computer in the email: https://www.postgresql.org/message-id/CA%2BhUKG%2B28zRyk8U%3DeFLn8O%2B17VrbpKykWPVrSzf57Ru32dQo8A%40mail.gmail.com But I can't test on Windows. For context, I have a "UNIX -- Live Free or Die" license plate attached to the side of my workstation.
On Thu, Sep 5, 2024 at 12:07 PM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Good to know that, Thomas. I'm asking our tester to try it. Will let you know. To give a more serious response, I don't really expect anything "simple" about this patch not to work on Windows, as I know that it it rejects non-ASCII names as I showed (on all OSes), and I know that compiles and passes our tests on all our 4 CI operating systems: https://cirrus-ci.com/build/4941642090676224 Really what I'm looking for is (1) feedback on the approach, code and comments, and thoughts about more complex scenarios I may have failed to think about, including say, pg_dump, pg_upgrade etc operational issues, which probably involves lots of previous experience with PostgreSQL, (2) opinions on whether we should add a test for these cases and how to put the UTF-8 into a script (I'm confused about the encoding of command line arguments), and (3) a nod from the EDB people involved in distributing this software on Windows.
Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Thu, Sep 5, 2024 at 12:07 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Good to know that, Thomas. I'm asking our tester to try it. Will let you know.
To give a more serious response, I don't really expect anything
"simple" about this patch not to work on Windows, as I know that it it
rejects non-ASCII names as I showed (on all OSes), and I know that
compiles and passes our tests on all our 4 CI operating systems:
https://cirrus-ci.com/build/4941642090676224
Really what I'm looking for is (1) feedback on the approach, code and
comments, and thoughts about more complex scenarios I may have failed
to think about, including say, pg_dump, pg_upgrade etc operational
issues, which probably involves lots of previous experience with
PostgreSQL, (2) opinions on whether we should add a test for these
cases and how to put the UTF-8 into a script (I'm confused about the
encoding of command line arguments), and (3) a nod from the EDB people
involved in distributing this software on Windows.
We can help with producing the builds with the patches provided. You had also
mentioned about the changes required in the installer script, will it still be required?
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
@Sandeep Thakkar, I believe your question is to Thomas, is that right?
@Thomas Munro regarding the patch you shared previously, could you give some instruction on how to use it?
Thanks!
Haifang
From: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Sent: Thursday, September 5, 2024 4:47 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
On Thu, Sep 5, 2024 at 5:46 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Sep 5, 2024 at 12:07 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Good to know that, Thomas. I'm asking our tester to try it. Will let you know.
To give a more serious response, I don't really expect anything
"simple" about this patch not to work on Windows, as I know that it it
rejects non-ASCII names as I showed (on all OSes), and I know that
compiles and passes our tests on all our 4 CI operating systems:
https://cirrus-ci.com/build/4941642090676224
Really what I'm looking for is (1) feedback on the approach, code and
comments, and thoughts about more complex scenarios I may have failed
to think about, including say, pg_dump, pg_upgrade etc operational
issues, which probably involves lots of previous experience with
PostgreSQL, (2) opinions on whether we should add a test for these
cases and how to put the UTF-8 into a script (I'm confused about the
encoding of command line arguments), and (3) a nod from the EDB people
involved in distributing this software on Windows.
We can help with producing the builds with the patches provided. You had also
mentioned about the changes required in the installer script, will it still be required?
--
Sandeep Thakkar
On Thu, Sep 5, 2024 at 11:46 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote: > On Thu, Sep 5, 2024 at 5:46 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> Really what I'm looking for is (1) feedback on the approach, code and >> comments, and thoughts about more complex scenarios I may have failed >> to think about, including say, pg_dump, pg_upgrade etc operational >> issues, which probably involves lots of previous experience with >> PostgreSQL, (2) opinions on whether we should add a test for these >> cases and how to put the UTF-8 into a script (I'm confused about the >> encoding of command line arguments), and (3) a nod from the EDB people >> involved in distributing this software on Windows. If I don't hear any objections to this plan soon, I'm going to commit this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the upcoming code freeze for the PostgreSQL 17 release ends. So it'll probably be in 16.5 and 17.1. > We can help with producing the builds with the patches provided. You had also > mentioned about the changes required in the installer script, will it still be required? If you don't change the installer script, then it will still fail if someone selects "Türkiye" in your GUI, but now it will fail with an ERROR rejecting non-ASCII characters, instead of crashing. So people in Türkiye, Côte d'Ivoire, Curaçao etc will still have no way to initialise a cluster with your GUI in PostgreSQL 16.5 and 17.1 unless they follow the instructions on the web to create a "Turkey" (or whatever non-ASCII string they want). Of course they could always use initdb.exe directly from the command line with a BCP47 name. Maybe that's OK, but I think you should consider changing the installer. A conservative way to do it would be to show all the existing options that you have now (so that someone who is happy using the old style names when they don't contain non-ASCII can keep doing so), but also have a second entry for each country that shows "Turkish, Türkiye (tr-TR)" and/or perhaps "Turkish, Türkiye (tr-TR.UTF-8)" or perhaps both, and passes just that part in parentheses to initdb, to give users all the options. Or perhaps you could have a checkbox "BCP 47 locales" that changes the list to show them. No one has really reported any real world experience choosing between the tr-TR vs tr-TR.UTF-8 alternatives, and you might like to experiment with that. The second option makes Windows' system libraries use UTF-8 encoding instead of the traditional encoding associated with the language. As far as I can tell, it doesn't make any difference at all to PostgreSQL yet, because your installer always uses --encoding="UTF8" and, on Windows only, that makes PostgreSQL ignore the locale's encoding and do a whole lot of internal conversation to wchar_t because PostgreSQL doesn't yet know that Windows 10+ can work with UTF-8 directly. The reason that I am interested in this .UTF-8-or-not question is that I'd like to consider *disallowing* non-matching encodings (see commitfest entry #3772, reviewers wanted!), and teaching PostgreSQL that Windows does in fact have UTF-8, just so we can delete a lot of slow special case code, harmonise with Unix, and generally catch up with reality. So I figure we might as well start encouraging the "xx-XX.UTF-8" names when using --encoding="UTF8" if we can't find any downside, because under that plan it would eventually become illegal to use --locale="tr-TR" (no .UTF-8) with --encoding="UTF-8" if that eventually goes in, so it seems sensible to stop creating new clusters that way ASAP so that users have a better time upgrading in the future. For example, a pg_upgrade from a PostgreSQL 17 cluster initialised with --locale="tr-TR" --encoding="UTF8" to PostgreSQL 18 would proabbly require some extra step to rename "tr-TR" to "tr-TR.UTF8" at some point (not sure exactly where), if PostgreSQL 18 starts rejecting the non-matching combination. I don't know where that'll go, though -- it's not high priority work, it's just incremental cleanup and modernisation that practically suggests itself whenever looking at rejigging locale code for thread-safety and reading all those comments about wchar_t that are not true.
On Fri, Sep 20, 2024 at 11:37 AM Thomas Munro <thomas.munro@gmail.com> wrote: > If I don't hear any objections to this plan soon, I'm going to commit > this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the > upcoming code freeze for the PostgreSQL 17 release ends. So it'll > probably be in 16.5 and 17.1. Done, after another round of polishing and testing. The main change since v2 was: I realised that initdb's locale_date_order() also contains a save-and-restore, which should share code at least for consistency (I don't personally know which categories really affect process-wide encoding assumptions, so this might not technically be necessary). I moved the relevant code into new helper functions, which I think came out tidier. The wchar_t-based save-restore in initdb.exe is based on a theoretical observation, as the report was about postgres.exe crashing and that should now be unreachable with these new name checks, but we can see the general phenomenon from a two-line standalone repro[1] and so I suspect there must be environment/input combinations that could crash initdb.exe itself without it. We don't have any concrete reports of that, but I preferred to close the theoretical risk using the recommended technique. Sorry for taking so long to untangle this mess; I assumed for much of the thread that we were talking about the failure of existing clusters to come up (as complained about in numerous other threads), not this new kind of crash, and was trying to fix that before the Locale Builder solution was widely known. Fixing Windows bugs as a non-Windows guy by reading the manual and testing patches on CI can sometimes feel a little like trying to patch a Mars rover... Thanks to David R for help with a crucial observation using a martian debugger. CREATE COLLATION should probably also start using check_locale() eventually, and I may look into harmonising that and figure out how to test it in nearby future work on locales vs threads. [1] https://www.postgresql.org/message-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Hi Thomas and team, Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build 17.0-1.However the issue repro on that build too. Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Friday, October 4, 2024 6:17 PM To: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Fri, Sep 20, 2024 at 11:37 AM Thomas Munro <thomas.munro@gmail.com> wrote: > If I don't hear any objections to this plan soon, I'm going to commit > this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the > upcoming code freeze for the PostgreSQL 17 release ends. So it'll > probably be in 16.5 and 17.1. Done, after another round of polishing and testing. The main change since v2 was: I realised that initdb's locale_date_order() also contains a save-and-restore, which should share code at least for consistency (I don't personallyknow which categories really affect process-wide encoding assumptions, so this might not technically be necessary). I moved the relevant code into new helper functions, which I think came out tidier. The wchar_t-based save-restore in initdb.exe is based on a theoretical observation, as the report was about postgres.execrashing and that should now be unreachable with these new name checks, but we can see the general phenomenonfrom a two-line standalone repro[1] and so I suspect there must be environment/input combinations that could crashinitdb.exe itself without it. We don't have any concrete reports of that, but I preferred to close the theoreticalrisk using the recommended technique. Sorry for taking so long to untangle this mess; I assumed for much of the thread that we were talking about the failure ofexisting clusters to come up (as complained about in numerous other threads), not this new kind of crash, and was tryingto fix that before the Locale Builder solution was widely known. Fixing Windows bugs as a non-Windows guy by readingthe manual and testing patches on CI can sometimes feel a little like trying to patch a Mars rover... Thanks to DavidR for help with a crucial observation using a martian debugger. CREATE COLLATION should probably also start using check_locale() eventually, and I may look into harmonising that and figureout how to test it in nearby future work on locales vs threads. [1] https://www.postgresql.org/message-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com
On Tue, Nov 12, 2024 at 12:12 PM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build 17.0-1.However the issue repro on that build too. Hi Haifang, The change is in v17.1 and v16.5, but the release process for those is in progress right now, getting them out to the packaging teams. Hopefully new Windows installer packages will appear in a few days or so?
RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Great to hear that, Thomas. Looking forward to hearing from your once the new release is available. Thanks! Haifang -----Original Message----- From: Thomas Munro <thomas.munro@gmail.com> Sent: Monday, November 11, 2024 4:43 PM To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com> Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607 On Tue, Nov 12, 2024 at 12:12 PM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote: > Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build 17.0-1.However the issue repro on that build too. Hi Haifang, The change is in v17.1 and v16.5, but the release process for those is in progress right now, getting them out to the packagingteams. Hopefully new Windows installer packages will appear in a few days or so?