Thread: Windows Application Issues | PostgreSQL | REF # 48475607

Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

I’m sorry to bother you if you are not the right person. In that case, could you please help to point me in the right direction for the right contact? Thanks a lot!

Hi Team,

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. We’re reaching out to notify you of a potential issue in one of your applications.

The issue details are below, please review it when you have a moment. Our goal is to work with you to address this issue and to understand what your expected timeline to address this issue might be. If you have any questions about the details below or have already addressed this issue in a forthcoming update, please let me know.

Account

PostgreSQL Global Development Group

Product

PostgreSQL

Reference #

48475607

Issue

Postgres.exe crash observed while installing the application.

 

Environment: Desktop
OS: Windows 11
App Version: 16.1

Repro Steps:

  1. Deploy windows server 2019 Turkey Build.
  2. Patch the machine till WU and enable roles & feature in server manager.
  3. Restart the machine to configure updates & observe winver 17763.5329
  4. Login as Administrator.
  5. Download and install the application: https://www.postgresql.org/download/
  6. Click on Setup and in setup launched click on next.
  7. Click next on installation directory window.
  8. In Select components window make sure all are selected .and click on next .
  9. In data directory window click on next.
  10. In Password window provide the password and click on next.
  11. In port window click on next .
  12. Click on next default in advanced settings ,summary & ready to install window.
  13. While the application is installing ,
  14. Observe..

 

Observations:

Postgres.exe crash is observed while installing the application.

Expected Results:

Should not observe any crash while installing the application

Resource

For any questions on app development (or) submission on windows, contact Windows Dev Center. https://developer.microsoft.com/en-us/windows/support

 

Thanks!

Haifang

 

 

 

Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Deploy windows server 2019 Turkey Build.
...
> Postgres.exe crash is observed while installing the application.

Hi,

"Crash" with no details is not a very useful report, but I guess that
this is the Turkey -> Türkiye issue caused by a recent operating
system upgrade, which is apparently now hitting the server edition of
Windows based on the new inflow of bug reports.  I think we know
approximately how to fix it, and there are several possible
workarounds for a system that is already in this state, but interested
parties who know and care about the relevant OS need to get involved
to make progress.  It's too late for Turkish but ideally we'll be able
to stop this from happening the next time a country changes its name.
I have written everything I know about the issue here:

https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org

Not being a Windows user, I have not been able to take that proposed
fix all the way to the finish line, so all I can do is post what I
know and hope that open source will happen.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

Thanks a lot for your previous reply. Still need to double check with you:

You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by the
changesmade on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future?
 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Thursday, May 2, 2024 4:51 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

[You don't often get email from thomas.munro@gmail.com. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification]
 

On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Deploy windows server 2019 Turkey Build.
...
> Postgres.exe crash is observed while installing the application.

Hi,

"Crash" with no details is not a very useful report, but I guess that this is the Turkey -> Türkiye issue caused by a
recentoperating system upgrade, which is apparently now hitting the server edition of Windows based on the new inflow
ofbug reports.  I think we know approximately how to fix it, and there are several possible workarounds for a system
thatis already in this state, but interested parties who know and care about the relevant OS need to get involved to
makeprogress.  It's too late for Turkish but ideally we'll be able to stop this from happening the next time a country
changesits name.
 
I have written everything I know about the issue here:

https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org

Not being a Windows user, I have not been able to take that proposed fix all the way to the finish line, so all I can
dois post what I know and hope that open source will happen.
 

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and team,

Some suggestion from our engineers, please take a look and let me know if you have any question:

1. Could you please use _wsetlocale API in case of wide string locale parameter, this is what the recommended way as
parMicrosoft documentation as well. (Reference: setlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170,wsetlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170)
2. Additionally you can work around in case you don’t want to change the API to update their mapping table with
Turkish_Turin case of Turkish_Türkiye as you have done for others locale with same issue as well in your code
 
(reference PostgreSQL Source Code: src/port/win32setlocale.c Source File:
https://doxygen.postgresql.org/win32setlocale_8c_source.html,Line number 66-67)
 

Please let me know if there is any other questions.

Regards!
Haifang

-----Original Message-----
From: Haifang Wang (Centific Technologies Inc)
Sent: Monday, May 13, 2024 11:52 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Hi Thomas,

Thanks a lot for your previous reply. Still need to double check with you:

You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by the
changesmade on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future?
 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <mailto:thomas.munro@gmail.com>
Sent: Thursday, May 2, 2024 4:51 PM
To: Haifang Wang (Centific Technologies Inc) <mailto:v-haiwang@microsoft.com>
Cc: mailto:pgsql-bugs@lists.postgresql.org
Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

[You don't often get email from mailto:thomas.munro@gmail.com. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification]
 

On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies
Inc) <mailto:v-haiwang@microsoft.com> wrote:
> Deploy windows server 2019 Turkey Build.
...
> Postgres.exe crash is observed while installing the application.

Hi,

"Crash" with no details is not a very useful report, but I guess that this is the Turkey -> Türkiye issue caused by a
recentoperating system upgrade, which is apparently now hitting the server edition of Windows based on the new inflow
ofbug reports.  I think we know approximately how to fix it, and there are several possible workarounds for a system
thatis already in this state, but interested parties who know and care about the relevant OS need to get involved to
makeprogress.  It's too late for Turkish but ideally we'll be able to stop this from happening the next time a country
changesits name.
 
I have written everything I know about the issue here:

https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org

Not being a Windows user, I have not been able to take that proposed fix all the way to the finish line, so all I can
dois post what I know and hope that open source will happen.
 

On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by
thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future? 

First, let me restate the problem:

When you create a database cluster (= a PostgreSQL instance) with
"initdb", unless you request a default locale with --locale, initdb
uses setlocale("") to query the system/user default locale.  It then
records that string in postgresql.conf, and also in the pg_database
catalog.  On POSIX systems, that captures something like "tr-TR.UTF-8"
or similar.  On Windows, that captures something like
"Turkish_Turkey.1254".  Later, PostgreSQL uses newlocale() or
setlocale() functions to access that locale again.  In rare cases
where a country changes its name, a Windows update *renames* the
locale, and then those calls fail, because the old name is not
recognised anymore.

I proposed a partial solution that should help avoid the problem in future:

I think that initdb should instead call GetUserDefaultLocaleName() to
discover the user account's default locale, because it returns BCP47
names like "tr-TR".  It is probably less likely for a country or
language to change its ISO code (but not impossible[1]), than for the
English-language names of them to change.  Even if you reject this
idea because technically they can both change, there are other reasons
why we should not be storing this "display"-style names anywhere,
including that PostgreSQL needs to store them in a place where the
encoding must be ASCII, which Türkiye is not.  And finally, the
Windows manual explicitly warns us about this[2]: "We don't recommend
this form for locale strings embedded in code or serialized to
storage: These strings are more likely to be changed by an operating
system update than the locale name form."

(Note: the BCP47 support in Windows did not exist when that PostgreSQL
code was written, so it did what it had to at the time.)

There are further problems to resolve:

1.  We don't know if we should put encodings (AKA codepages?) on the
end of those strings or not.  There is some confusion about what
exactly it does, and how it interacts with the "ACP".  I'd be worried
that if you don't put the endings on, perhaps it can change under your
feet.  (I suspect that part of the discussion on that other thread
took some wrong turns, based on citext_utf8 results that were actually
probably misleading.)
2.  If we do decide to put the encoding suffixes on, which encoding
should we be suggesting?  It appears from anecdotal reports that most
PostgreSQL-on-Windows users are stuck in the past, using the old
pre-UTF-8 language-specific encodings.  Should we be encouraging UTF-8
use by default, if we can?  Maybe that is a separate question.

Then there is the practical question of what to do with an
already-broken system.  One idea would be to introduce a "locale
remapping" file, pgdata/pg_locale.map where you can write things
like "Turkish_Turkey.1254"="Turkish_Türkiye.1254".

I think there might be a GUI tool that allows you to duplicate,
rename, etc locales in Windows, so you can re-create the old name.  I
believe that is how some people have fixed their broken databases.  I
don't know if there is a good reference/blog/article on that, that we
should be pointing people towards if they show up with broken systems.

Patches, testing, research are welcome!  Even though I put forward that
BCP47 idea, it was based on reading the manual, so the "unresolved"
questions may in fact be very easy to resolve by people who actually
use/know Windows.  Even if I had been gung-ho about committing
that in 16 without feedback from Windows users, it would have been too
late to help Turkish users with existing databases.

[1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
[2]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again.

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test
organizationto notify developers when issues have been identified in their applications. The issue I reported in this
mailis an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously,
itis caused by recent operating system upgrade.
 

The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale API.
thisis what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you
havedone for others locale with same issue as well in your code (reference PostgreSQL Source Code:
src/port/win32setlocale.cSource File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number
66-67).

Please let me know if there is any misunderstanding.

Thanks!
Haifang


-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 13, 2024 3:12 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by
thechanges made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future?
 

First, let me restate the problem:

When you create a database cluster (= a PostgreSQL instance) with "initdb", unless you request a default locale with
--locale,initdb uses setlocale("") to query the system/user default locale.  It then records that string in
postgresql.conf,and also in the pg_database catalog.  On POSIX systems, that captures something like "tr-TR.UTF-8"
 
or similar.  On Windows, that captures something like "Turkish_Turkey.1254".  Later, PostgreSQL uses newlocale() or
setlocale() functions to access that locale again.  In rare cases where a country changes its name, a Windows update
*renames*the locale, and then those calls fail, because the old name is not recognised anymore.
 

I proposed a partial solution that should help avoid the problem in future:

I think that initdb should instead call GetUserDefaultLocaleName() to discover the user account's default locale,
becauseit returns BCP47 names like "tr-TR".  It is probably less likely for a country or language to change its ISO
code(but not impossible[1]), than for the English-language names of them to change.  Even if you reject this idea
becausetechnically they can both change, there are other reasons why we should not be storing this "display"-style
namesanywhere, including that PostgreSQL needs to store them in a place where the encoding must be ASCII, which Türkiye
isnot.  And finally, the Windows manual explicitly warns us about this[2]: "We don't recommend this form for locale
stringsembedded in code or serialized to
 
storage: These strings are more likely to be changed by an operating system update than the locale name form."

(Note: the BCP47 support in Windows did not exist when that PostgreSQL code was written, so it did what it had to at
thetime.)
 

There are further problems to resolve:

1.  We don't know if we should put encodings (AKA codepages?) on the end of those strings or not.  There is some
confusionabout what exactly it does, and how it interacts with the "ACP".  I'd be worried that if you don't put the
endingson, perhaps it can change under your feet.  (I suspect that part of the discussion on that other thread took
somewrong turns, based on citext_utf8 results that were actually probably misleading.) 2.  If we do decide to put the
encodingsuffixes on, which encoding should we be suggesting?  It appears from anecdotal reports that most
PostgreSQL-on-Windowsusers are stuck in the past, using the old
 
pre-UTF-8 language-specific encodings.  Should we be encouraging UTF-8 use by default, if we can?  Maybe that is a
separatequestion.
 

Then there is the practical question of what to do with an already-broken system.  One idea would be to introduce a
"localeremapping" file, pgdata/pg_locale.map where you can write things like
"Turkish_Turkey.1254"="Turkish_Türkiye.1254".

I think there might be a GUI tool that allows you to duplicate, rename, etc locales in Windows, so you can re-create
theold name.  I believe that is how some people have fixed their broken databases.  I don't know if there is a good
reference/blog/articleon that, that we should be pointing people towards if they show up with broken systems.
 

Patches, testing, research are welcome!  Even though I put forward that
BCP47 idea, it was based on reading the manual, so the "unresolved"
questions may in fact be very easy to resolve by people who actually use/know Windows.  Even if I had been gung-ho
aboutcommitting that in 16 without feedback from Windows users, it would have been too late to help Turkish users with
existingdatabases.
 

[1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
[2]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160

On Tue, May 14, 2024 at 10:27 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again.

(Sorry, it seems our emails crossed.)

> I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test
organizationto notify developers when issues have been identified in their applications. The issue I reported in this
mailis an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously,
itis caused by recent operating system upgrade. 

Thanks for doing that, and yes, it affects a lot of users, and this is
not the first time.  It is still possible for it to be the last...

> The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale
API.this is what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

I don't understand.  Can you explain why _wsetlocale() is better than
setlocale()?  They behave identically except one takes wide
characters, which doesn't seem to solve any problem we have.

> If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you
havedone for others locale with same issue as well in your code (reference PostgreSQL Source Code:
src/port/win32setlocale.cSource File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67). 

Yeah, OK we could put more kludges into win32setlocale.c.  I don't
mind committing a patch like that if it addresses the issue.  I am not
in a position to confirm that myself... what we need is someone who
works with Windows to write the patch and test it across that upgrade.
Longer term I'm looking for something better than that though, because
it doesn't address the root cause (need for stable identifiers), and
will only ever allow us to fix problems with the old unstable names
*after* users complain that their database is dead, 3-6 months after
in fact due to release cycles.  I think a dynamic mapping file might
be better?  (Maybe win32locale.c should be able to read that kludge
table from a file that you can give it with an environment variable,
or something like that?)



Thomas Munro <thomas.munro@gmail.com> writes:
> Longer term I'm looking for something better than that though, because
> it doesn't address the root cause (need for stable identifiers), and
> will only ever allow us to fix problems with the old unstable names
> *after* users complain that their database is dead, 3-6 months after
> in fact due to release cycles.  I think a dynamic mapping file might
> be better?  (Maybe win32locale.c should be able to read that kludge
> table from a file that you can give it with an environment variable,
> or something like that?)

+1 for the long-term solution being more-stable locale identifiers.
However, we should try to build something that will let users get
out of these situations with the existing identifiers, so I like
your idea of a plain-text mapping file for Windows locale names.
I don't think an environment variable is necessary; just define
a fixed name "$PGDATA/locale_map.txt" or such.  If that file
exists, just read it and map the pg_database field values with it.

Maybe this shouldn't even be Windows-specific?  Are there any
cases where it'd save people's bacon on other platforms?

            regards, tom lane



Just curious. Why a plain text file rather than a system table? 

On Mon, May 13, 2024, 18:07 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
> Longer term I'm looking for something better than that though, because
> it doesn't address the root cause (need for stable identifiers), and
> will only ever allow us to fix problems with the old unstable names
> *after* users complain that their database is dead, 3-6 months after
> in fact due to release cycles.  I think a dynamic mapping file might
> be better?  (Maybe win32locale.c should be able to read that kludge
> table from a file that you can give it with an environment variable,
> or something like that?)

+1 for the long-term solution being more-stable locale identifiers.
However, we should try to build something that will let users get
out of these situations with the existing identifiers, so I like
your idea of a plain-text mapping file for Windows locale names.
I don't think an environment variable is necessary; just define
a fixed name "$PGDATA/locale_map.txt" or such.  If that file
exists, just read it and map the pg_database field values with it.

Maybe this shouldn't even be Windows-specific?  Are there any
cases where it'd save people's bacon on other platforms?

                        regards, tom lane


John McKown <john.archie.mckown@gmail.com> writes:
> Just curious. Why a plain text file rather than a system table?

Because you'd have no way to update such a table, if you can't
start the database or connect to it.  So that approach isn't
suitable for people whose database has been broken by one of
these system updates.  (This is largely the same reason why,
eg, postgresql.conf isn't a table.)

            regards, tom lane



On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get
> out of these situations with the existing identifiers, so I like
> your idea of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define
> a fixed name "$PGDATA/locale_map.txt" or such.  If that file
> exists, just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that
I don't do Windows, I just know that this passes CI and that the code
works the way I intended on my local Unix system if extracted into a
little harness).  With this, you could in theory create a file
PGDATA/win32setlocale.map containing:

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by
default (though they got a bit shorter 'cause I invented some
wildcards).  Some problems came up while wondering how to fit Türkiye
into the defaults, and how to back-patch:

1.  In the back-branches, we claim to support ancient Windows releases
as far back as "Windows 2000 SP4" (!), which obviously aren't getting
the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there
and generally before Windows 10.  And even if you exclude the
extremities of our support window somehow (how?), modern systems might
not have applied the update yet (IIUC they *have* to at some point
under the new world order, so there is a defined window of version
skew these days).

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded
in the current ACP ("active codepage", system-wide changeable setting
that controls char↔wchar_t conversion in system APIs), so the encoding
of that file (and the built-in default table) would need to match that
to work, as coded.  Perhaps it would be possible to make the mapping
file UTF-8 and transform that to ACP!  But it feels a bit too loopy
for me, and on the PostgreSQL side it is undefined/illegal whatever
you choose in PostgreSQL due to being accessed from different
databases which are using potentially different encodings that are
only required to be a superset of ASCII.  Avoid.

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the
replacement string.  But what is the oldest Windows release that can
understand a BCP47 code like that?

4.  Conversely, on modern systems, I'm still not entirely sure that
"tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254" and that
it's OK to put ".1254" on the end like that.  Is it, and is it?  I
don't mean just "does it mean Turkish?", I mean "does it give exactly
the same answer for every conceivable pair of strings when compared
with strcoll_l(), and likewise for the ctype-based functions like
towlower() et al".

If the answers are not in our favour, I guess we could leave the
default behaviour unchanged, and let people set up a text file as
shown above to fix their database if they want, but that's also not
very nice and kinda weird (helping hypothetical users of museum-grade
systems by leaving real users' systems broken).

If the answer to 4 is yes, yes then we could also push ahead with the
plan to make initdb pick BCP47 names by default in PG18 (or even 17).

> Maybe this shouldn't even be Windows-specific?  Are there any
> cases where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO code go away or countries split etc, so
it's no like POSIX locale names are set in stone under all
circumstances.  But on Unixen it's all just files in practice, you can
always just symlink them, move them around, compile them yourself from
sources, etc, if you really have to, so I think I'd rather contain the
crazy in win32*.c.

Attachment

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions.

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Monday, May 13, 2024 11:38 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get out 
> of these situations with the existing identifiers, so I like your idea 
> of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define a 
> fixed name "$PGDATA/locale_map.txt" or such.  If that file exists, 
> just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes
CIand that the code works the way I intended on my local Unix system if extracted into a little harness).  With this,
youcould in theory create a file PGDATA/win32setlocale.map containing:
 

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I
inventedsome wildcards).  Some problems came up while wondering how to fit Türkiye into the defaults, and how to
back-patch:

1.  In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which
obviouslyaren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before
Windows10.  And even if you exclude the extremities of our support window somehow (how?), modern systems might not have
appliedthe update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of
versionskew these days).
 

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide
changeablesetting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in
defaulttable) would need to match that to work, as coded.  Perhaps it would be possible to make the mapping file UTF-8
andtransform that to ACP!  But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal
whateveryou choose in PostgreSQL due to being accessed from different databases which are using potentially different
encodingsthat are only required to be a superset of ASCII.  Avoid.
 

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string.  But what is the oldest Windows
releasethat can understand a BCP47 code like that?
 

4.  Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as
"Turkish-XXX.1254"and that it's OK to put ".1254" on the end like that.  Is it, and is it?  I don't mean just "does it
meanTurkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with
strcoll_l(),and likewise for the ctype-based functions like
 
towlower() et al".

If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a
textfile as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping
hypotheticalusers of museum-grade systems by leaving real users' systems broken).
 

If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default
inPG18 (or even 17).
 

> Maybe this shouldn't even be Windows-specific?  Are there any cases 
> where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO code go away or countries split etc, so it's no like POSIX locale names are set in stone
underall circumstances.  But on Unixen it's all just files in practice, you can always just symlink them, move them
around,compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in
win32*.c.

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and Tom,

Please let us know your question or concern about the bug we discuss in this mail.

Thanks!
Haifang 

-----Original Message-----
From: Haifang Wang (Centific Technologies Inc) 
Sent: Wednesday, May 15, 2024 4:47 PM
To: Thomas Munro <thomas.munro@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions.

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 13, 2024 11:38 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get out 
> of these situations with the existing identifiers, so I like your idea 
> of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define a 
> fixed name "$PGDATA/locale_map.txt" or such.  If that file exists, 
> just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes
CIand that the code works the way I intended on my local Unix system if extracted into a little harness).  With this,
youcould in theory create a file PGDATA/win32setlocale.map containing:
 

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by default (though they got a bit shorter 'cause I
inventedsome wildcards).  Some problems came up while wondering how to fit Türkiye into the defaults, and how to
back-patch:

1.  In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which
obviouslyaren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before
Windows10.  And even if you exclude the extremities of our support window somehow (how?), modern systems might not have
appliedthe update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of
versionskew these days).
 

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide
changeablesetting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in
defaulttable) would need to match that to work, as coded.  Perhaps it would be possible to make the mapping file UTF-8
andtransform that to ACP!  But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal
whateveryou choose in PostgreSQL due to being accessed from different databases which are using potentially different
encodingsthat are only required to be a superset of ASCII.  Avoid.
 

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string.  But what is the oldest Windows
releasethat can understand a BCP47 code like that?
 

4.  Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as
"Turkish-XXX.1254"and that it's OK to put ".1254" on the end like that.  Is it, and is it?  I don't mean just "does it
meanTurkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with
strcoll_l(),and likewise for the ctype-based functions like
 
towlower() et al".

If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a
textfile as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping
hypotheticalusers of museum-grade systems by leaving real users' systems broken).
 

If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default
inPG18 (or even 17).
 

> Maybe this shouldn't even be Windows-specific?  Are there any cases 
> where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO code go away or countries split etc, so it's no like POSIX locale names are set in stone
underall circumstances.  But on Unixen it's all just files in practice, you can always just symlink them, move them
around,compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in
win32*.c.

On Sat, May 18, 2024 at 10:25 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Please let us know your question or concern about the bug we discuss in this mail.

Thanks for your help with this!  Did I miss an email in between?  Here
are some unanswered questions that seem to be stopping progress:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?
2.  If we translate to BCP47 automatically, should we put the ".1452"
on the end?  What does it mean exactly?  What does it mean if you
don't put it there?
3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?

With answers to those questions we might be able to ship some nice
built-in translations to get users out of this jam.  If there are
issues on those points, we might have to face some questions about
what the encoding is of the "Turkish_Türkiye.1254" string itself,
which is tricky for us for technical reasons, among other problems...
not sure.



Thomas Munro <thomas.munro@gmail.com> writes:
> With answers to those questions we might be able to ship some nice
> built-in translations to get users out of this jam.  If there are
> issues on those points, we might have to face some questions about
> what the encoding is of the "Turkish_Türkiye.1254" string itself,
> which is tricky for us for technical reasons, among other problems...
> not sure.

TBH, my idea of how this should go was to *not* ship any built-in
translations, or indeed any translation file at all (so I didn't
like your moving some existing hacks into that file).  If we
approach it like that, then individual users who hit this problem
are responsible for creating their own translation file, which
will automatically use whatever is the locally-appropriate
encoding.  Sure, it's less transparent for affected users, but
it will work for them which other approaches might not; and
evidence so far is that there's not a huge number of affected
users.

            regards, tom lane



On Sat, May 18, 2024 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > With answers to those questions we might be able to ship some nice
> > built-in translations to get users out of this jam.  If there are
> > issues on those points, we might have to face some questions about
> > what the encoding is of the "Turkish_Türkiye.1254" string itself,
> > which is tricky for us for technical reasons, among other problems...
> > not sure.
>
> TBH, my idea of how this should go was to *not* ship any built-in
> translations, or indeed any translation file at all (so I didn't
> like your moving some existing hacks into that file).  If we
> approach it like that, then individual users who hit this problem
> are responsible for creating their own translation file, which
> will automatically use whatever is the locally-appropriate
> encoding.  Sure, it's less transparent for affected users, but
> it will work for them which other approaches might not; and
> evidence so far is that there's not a huge number of affected
> users.

Hmm, yeah I guess we could just ship a patch like what I posted
already, and let users figure out what works.  I would still like to
know the answer to those questions so we can offer good advice.  If
the answers are all yes then I think we can say "the file is in
encoded in ASCII; use wildcards to deal with any legacy non-ASCII
locale names on the left, and use BCP47 names on the right" and begin
expunging the bad old names from our universe.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

NOTE: this email is plain text format, not easy to read. And also the previous email content usually is not included in
thelatest email, we only could see the most latest one. 
 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Friday, May 17, 2024 10:57 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;
pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Sat, May 18, 2024 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > With answers to those questions we might be able to ship some nice 
> > built-in translations to get users out of this jam.  If there are 
> > issues on those points, we might have to face some questions about 
> > what the encoding is of the "Turkish_Türkiye.1254" string itself, 
> > which is tricky for us for technical reasons, among other problems...
> > not sure.
>
> TBH, my idea of how this should go was to *not* ship any built-in 
> translations, or indeed any translation file at all (so I didn't like 
> your moving some existing hacks into that file).  If we approach it 
> like that, then individual users who hit this problem are responsible 
> for creating their own translation file, which will automatically use 
> whatever is the locally-appropriate encoding.  Sure, it's less 
> transparent for affected users, but it will work for them which other 
> approaches might not; and evidence so far is that there's not a huge 
> number of affected users.

Hmm, yeah I guess we could just ship a patch like what I posted already, and let users figure out what works.  I would
stilllike to know the answer to those questions so we can offer good advice.  If the answers are all yes then I think
wecan say "the file is in encoded in ASCII; use wildcards to deal with any legacy non-ASCII locale names on the left,
anduse BCP47 names on the right" and begin expunging the bad old names from our universe.
 

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole
thread, so here is this whole thread in our project email archive:


https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com

But let me ask the questions again, with some motivation/reason I want
to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL
versions, for example PostgreSQL 12, are expected to run on old
versions of Windows from long before Windows 10, so we might have to
consider this.  However, if we go with Tom's idea that we do nothing
by default but just allow users to supply their own optional mapping
file, then this question becomes unimportant, users can figure out for
themselves whether it works, and presumably only 10+ got the update
that renamed Turkey to Türkiye.  [And in reality, I hope/expect that
no one really does run old out-of-support OSes, because that's crazy,
but I'm not allowed to assume...])

2.  If we translate to BCP47 locale names like "tr-TR" automatically,
should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if
you don't put it on, the encoding in "char"-based functions is the
"ACP".  What I really want to know is, can it be different from the
"ACP", and if it is, which functions does it affect?  For example if
the ACP is 1521 and I call _tolower_l() giving it a locale_t that I
opened with "en-US.UTF-8", what happens?  I am sure this is a simple
question but we are not Windows programmers, you are the first person
to show up offering to investigate, and I personally found the docs a
bit light on the topic.)

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In
other words, is it *exactly the same code and driving data*, just
using different labels?  Or is it a new locale implementation that
could differ arbitrarily in behaviour?  If the answer is yes, it's
just a new naming scheme, then life will be much much simpler for our
users, but if not, then indexes might be corrupted if we tell people
to switch to the new BCP47 names, and so we'd better know about that,
so we can adjust our advice to users.)



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our
projectemail archive:
 


https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected
torun on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go
withTom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this
questionbecomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the
updatethat renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support
OSes,because that's crazy, but I'm not allowed to assume...])
 

2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does
itmean exactly?
 
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based
functionsis the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which
functionsdoes it affect?  For example if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened
with"en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the
firstperson to show up offering to investigate, and I personally found the docs a bit light on the topic.)
 

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and
drivingdata*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in
behaviour? If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but
ifnot, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know
aboutthat, so we can adjust our advice to users.)
 


I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey
Adding my inline comment to best of my knowledge.


Thanks & Regards
Vishwa


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
 
Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail.
learn.microsoft.com



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  



Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     
Vishwa


Thanks, Vishwa, for tagging me. 
Adding my two cents,

Hello Thomas,

1.  What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.

Rahul: Locale names based on BCP 47 were first introduced in Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless its XP or earlier). 


2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

Vishwa: It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be same and it would try to use the default ANSI Code Page for Turkish (which happens to be 1254). Using any other code page (for example"tr-TR.1252") would use that codepage (1252: English). For the second part (example), I am not sure if I understand the question completely, but mixing the encoding is almost never a good idea and could lead to mojibaked strings in the worst case to no change (if strings only contain ASCII chars) in the best-case scenario.

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Vishwa: Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  

Rahul: I agree with Vishwa. the locale is the same, just the name of the country in English is changed. Rest all data is the same.

Thanks,
Rahul




From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
 

I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey
Adding my inline comment to best of my knowledge.


Thanks & Regards
Vishwa


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
 
Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail.
learn.microsoft.com



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  



Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     
Vishwa


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Thanks Vishwa for all the clarification below.

 

Hi @Rahul and everyone,

 

Is there anything else not clear? Is there any solution for the issue?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Vishwa, for tagging me. 

Adding my two cents,

 

Hello Thomas,

 

1.  What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.

May be added expert from feature team can validate this assumption.

 

Rahul: Locale names based on BCP 47 were first introduced in Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless its XP or earlier). 

 

 

2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

Vishwa: It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details

https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

 

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be same and it would try to use the default ANSI Code Page for Turkish (which happens to be 1254). Using any other code page (for example"tr-TR.1252") would use that codepage (1252: English). For the second part (example), I am not sure if I understand the question completely, but mixing the encoding is almost never a good idea and could lead to mojibaked strings in the worst case to no change (if strings only contain ASCII chars) in the best-case scenario.

 

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

 

Vishwa: Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  

 

Rahul: I agree with Vishwa. the locale is the same, just the name of the country in English is changed. Rest all data is the same.

 

Thanks,

Rahul

 

 


From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

 

I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey

Adding my inline comment to best of my knowledge.

 

 

Thanks & Regards
Vishwa

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17
AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details

The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail.

learn.microsoft.com

 



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  



Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     

Vishwa

 

 

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi Thomas and Team,

 

Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

 

Thanks!
Haifang

 

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Tuesday, May 28, 2024 11:21 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks Vishwa for all the clarification below.

 

Hi @Rahul and everyone,

 

Is there anything else not clear? Is there any solution for the issue?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Vishwa, for tagging me. 

Adding my two cents,

 

Hello Thomas,

 

1.  What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.

May be added expert from feature team can validate this assumption.

 

Rahul: Locale names based on BCP 47 were first introduced in Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless its XP or earlier). 

 

 

2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

Vishwa: It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details

https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

 

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be same and it would try to use the default ANSI Code Page for Turkish (which happens to be 1254). Using any other code page (for example"tr-TR.1252") would use that codepage (1252: English). For the second part (example), I am not sure if I understand the question completely, but mixing the encoding is almost never a good idea and could lead to mojibaked strings in the worst case to no change (if strings only contain ASCII chars) in the best-case scenario.

 

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

 

Vishwa: Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  

 

Rahul: I agree with Vishwa. the locale is the same, just the name of the country in English is changed. Rest all data is the same.

 

Thanks,

Rahul

 

 


From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

 

I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey

Adding my inline comment to best of my knowledge.

 

 

Thanks & Regards
Vishwa

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17
AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql.org%2Fmessage-id%2Fflat%2FPH8PR21MB3902F334A3174C54058F792CE5182%2540PH8PR21MB3902.namprd21.prod.outlook.com&data=05%7C02%7CVishwa.Deepak%40microsoft.com%7Cd5972d3ecb084308e18d08dc79252cea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638518447836539330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ELfVuQ62kphneTt0ES5v2iPOhlXH2OtxU0EGBre6BG8%3D&reserved=0

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Its difficult for me to answer this question with accuracy. I can see BCP47 related code in win8, so my assumption is that window 8 and above will support it.
May be added expert from feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

It tries to figure out the best match for the given input locale. Below link explain in more detail. As far as codepage part is concerned, may be expert add the details

The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail.

learn.microsoft.com

 



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü  

 

Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     

Vishwa

 

 

On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below.
Didyou get chance to do investigations? Any other questions? 

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was
travelling and at a conference last week (at which one of the topics
was "what are we going to do about the sorry state of PostgreSQL on
Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I
wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is
changed which included replcement of u with ü"

That sounds like it is answering the question 'Do
"Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?',
which I had already assumed to be true, but I was actually asking 'Do
""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all
behave identically?'.  It sounds like the answer is probably yes, but
the mention of "u" vs "ü" implied a narrower answer than I was looking
for...  It's important, because we're proposing to translate to
"tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I
think my next move will be to try to make a build option that can also
do locale name renaming on Unix, so that I have something that I could
test myself and push for the next release of PostgreSQL which will be
in October.

One open question is whether we should ship a translation file
ourselves.  I have heard two opinions: Tom Lane proposed in this
thread that there should be no file initially, we let the user figure
out what to put in there.  Magnus Hagander proposed at the pgconf.dev
conference last week that we should ideally ship a complete
translation file and maintain it over time, and it should be installed
not in pgdata but rather in the install directory.  I am not
personally able or willing to maintain such a file, I'm only offering
to supply the C code to read it and do what it says, but perhaps a
group such as the EDB Installer maintainer group, or at least someone
with a vested interest in PostgreSQL on Windows, might like to own
that job.



Thomas Munro <thomas.munro@gmail.com> writes:
> One open question is whether we should ship a translation file
> ourselves.  I have heard two opinions: Tom Lane proposed in this
> thread that there should be no file initially, we let the user figure
> out what to put in there.  Magnus Hagander proposed at the pgconf.dev
> conference last week that we should ideally ship a complete
> translation file and maintain it over time, and it should be installed
> not in pgdata but rather in the install directory.

FWIW, I'm kind of down on the latter approach, because I don't think
it'll move the needle very far.  Based on track record so far, there's
no chance that we will be aware of a Microsoft locale renaming before
it starts breaking users' databases.  Therefore, "edit the translation
file" is going to have to be a documented process in any case, because
affected users are not going to want to wait around for our next
release for a fix.  Also, if people do have to do that, it doesn't
seem like a great idea to tell them to modify an installed file rather
than a cluster-local configuration file.  What if they do a minor
version update but the minor version doesn't (yet) contain the fix?

Admittedly, the installed-file approach could make it more transparent
for people who'd done a PG minor update before the relevant Windows
update.  I'm not sure how large that set of people will be, though.

            regards, tom lane



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele
<Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below.
Didyou get chance to do investigations? Any other questions?
 

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one
ofthe topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic
thanthis thread...)
 

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave
identically?',which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254",
"Turkish_Türkiye.1254"and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the
mentionof "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to
translateto "tr-TR.1254".
 

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build
optionthat can also do locale name renaming on Unix, so that I have something that I could test myself and push for the
nextrelease of PostgreSQL which will be in October.
 

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed
inthis thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander
proposedat the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it
overtime, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or
willingto maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a
groupsuch as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows,
mightlike to own that job.
 

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? Seems like all our previous
discussionare in different mail thread. It is not easy to track. 😊
 

Regards!
Haifang

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Monday, June 3, 2024 3:55 PM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa
Deepak<Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
MagnusHagander <magnus@hagander.net>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Thomas Munro <thomas.munro@gmail.com> writes:
> One open question is whether we should ship a translation file 
> ourselves.  I have heard two opinions: Tom Lane proposed in this 
> thread that there should be no file initially, we let the user figure 
> out what to put in there.  Magnus Hagander proposed at the pgconf.dev 
> conference last week that we should ideally ship a complete 
> translation file and maintain it over time, and it should be installed 
> not in pgdata but rather in the install directory.

FWIW, I'm kind of down on the latter approach, because I don't think it'll move the needle very far.  Based on track
recordso far, there's no chance that we will be aware of a Microsoft locale renaming before it starts breaking users'
databases. Therefore, "edit the translation file" is going to have to be a documented process in any case, because
affectedusers are not going to want to wait around for our next release for a fix.  Also, if people do have to do that,
itdoesn't seem like a great idea to tell them to modify an installed file rather than a cluster-local configuration
file. What if they do a minor version update but the minor version doesn't (yet) contain the fix?
 

Admittedly, the installed-file approach could make it more transparent for people who'd done a PG minor update before
therelevant Windows update.  I'm not sure how large that set of people will be, though.
 

            regards, tom lane

On Tue, Jun 4, 2024 at 12:13 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? Seems like all our previous
discussionare in different mail thread. It is not easy to track. 😊 

FWIW this shows up as one thread in my email client, and in the
PostgreSQL archives website[1].  I don't know much about email and
which RFCs or conventions are at work here, but apparently different
clients are using different techniques to identify threads.  I assume
it could be done with thread headers (as seen in this thread) or
reply-to chains or fuzzy recognition of subject etc.  Given that
Outlook seems to ignore the "> " inline response quoting convention
used by the rest of the internet, it wouldn't surprise me to hear that
it also doesn't follow the conventions for recognising threads either,
being a sort of related topic.

[1]
https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com



Hi,

"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
You are correct, they all behave in exactly the same way. 
I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.

As for translation files, apologies I neither have a preference nor expertise to comment on this.


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
 
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to translate to "tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.
On Wed, Jun 5, 2024 at 9:10 PM Rahul Pandey <pandeyrah@microsoft.com> wrote:
> "Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
> You are correct, they all behave in exactly the same way.
> I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the
implementationmore future proof. 

Thanks for confirming.  More soon.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi Thomas and Tom,

 

Just would like to follow up you about the bug. Is there any update? Any other information needed?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi,

 

"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"

You are correct, they all behave in exactly the same way. 

I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.

 

As for translation files, apologies I neither have a preference nor expertise to comment on this.

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00
AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to translate to "tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi all,

 

Still need to double check with you about this bug. How is it going? Any other information needed?

 

Thanks!
Haifang

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Thursday, June 13, 2024 11:47 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi Thomas and Tom,

 

Just would like to follow up you about the bug. Is there any update? Any other information needed?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi,

 

"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"

You are correct, they all behave in exactly the same way. 

I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.

 

As for translation files, apologies I neither have a preference nor expertise to comment on this.

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00
AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to translate to "tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

 

Hi all,

 

Still need to double check with you about this bug. How is it going? Any other information needed?

 

Thanks!
Haifang

 

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Monday, June 17, 2024 2:06 PM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi all,

 

Still need to double check with you about this bug. How is it going? Any other information needed?

 

Thanks!
Haifang

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Thursday, June 13, 2024 11:47 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi Thomas and Tom,

 

Just would like to follow up you about the bug. Is there any update? Any other information needed?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, June 5, 2024 2:11 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>;
🎯dev <targetdev@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi,

 

"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"

You are correct, they all behave in exactly the same way. 

I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the implementation more future proof.

 

As for translation files, apologies I neither have a preference nor expertise to comment on this.

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00
AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do ""Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to translate to "tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.

On Tue, Jun 4, 2024 at 9:22 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Since no one has come forward to test the patch I wrote on Windows, I
> think my next move will be to try to make a build option that can also
> do locale name renaming on Unix, so that I have something that I could
> test myself and push for the next release of PostgreSQL which will be
> in October.

Here is a such a patch.  If you go into pg_config_manual.h and
uncomment this line:

/* #define DEBUG_SETLOCALE_MAP */

... then Unix systems will also be able to rename locales passed to
setlocale().  A map file can be provided either by putting its
absolute path into the environment variable PG_SETLOCALE_MAP, or by
installing it as $PREFIX/share/postgresql/setlocale.map.  I couldn't
immediately think of a good way to find it in the data directory.

Here's an example of a line that should fix the Turkish problem
(though I haven't tested that, I am not a Windows user):

Turkish_T*.1254=tr-TR.1254

I added some documentation and showed that example.

If you wanted to check it's working on a Unix system, you might try
some lines like *.UTF-8=does_not_exist or en_US.UTF-8=fr_FR.UTF-8 and
then somehow verify that it's using French.

I considered adding win32setlocale.c to the list of files to build for
the port libraries even on Unix, and then wrapping the contents in
#ifdef, but IIUC macOS squawks if you have an empty .c after
preprocessing, so I'd have to add a dummy symbol in there.  Or maybe
that'd be better than what I did here, namely including
win32setlocale.c in chklocale.c in this case.  Better ideas welcome.
Adding a meson/configure switch to enable it and make the whole .c
file optional seemed excessive.

Attachment

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Thomas. Will wait for the build option to do further testing. 

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Tuesday, June 25, 2024 4:52 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele
<Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 9:22 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Since no one has come forward to test the patch I wrote on Windows, I 
> think my next move will be to try to make a build option that can also 
> do locale name renaming on Unix, so that I have something that I could 
> test myself and push for the next release of PostgreSQL which will be 
> in October.

Here is a such a patch.  If you go into pg_config_manual.h and uncomment this line:

/* #define DEBUG_SETLOCALE_MAP */

... then Unix systems will also be able to rename locales passed to setlocale().  A map file can be provided either by
puttingits absolute path into the environment variable PG_SETLOCALE_MAP, or by installing it as
$PREFIX/share/postgresql/setlocale.map. I couldn't immediately think of a good way to find it in the data directory.
 

Here's an example of a line that should fix the Turkish problem (though I haven't tested that, I am not a Windows
user):

Turkish_T*.1254=tr-TR.1254

I added some documentation and showed that example.

If you wanted to check it's working on a Unix system, you might try some lines like *.UTF-8=does_not_exist or
en_US.UTF-8=fr_FR.UTF-8and then somehow verify that it's using French.
 

I considered adding win32setlocale.c to the list of files to build for the port libraries even on Unix, and then
wrappingthe contents in #ifdef, but IIUC macOS squawks if you have an empty .c after preprocessing, so I'd have to add
adummy symbol in there.  Or maybe that'd be better than what I did here, namely including win32setlocale.c in
chklocale.cin this case.  Better ideas welcome.
 
Adding a meson/configure switch to enable it and make the whole .c file optional seemed excessive.

Windows Application Issues | PostgreSQL | REF # 51961374

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi Team,

 

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. We’re reaching out to notify you of a potential issue in one of your applications.

 

The issue details are below, please review it when you have a moment. Our goal is to work with you to address this issue and to understand what your expected timeline to address this issue might be. If you have any questions about the details below or have already addressed this issue in a forthcoming update, please let me know.

 

Account

PostgreSQL Global Development Group

Product

PostgreSQL

Reference #

51961374

Issue

Fail to work with Find and Replace options within the application with latest version of app


Environment: Desktop
OS: Windows 11
Bamboo Paper: 16.3-2

Repro Steps:

  1. Deploy OS.
  2. Perform WU, enable roles & features.
  3. Download the application: PostgreSQL from vendor link .
  4. Run the setup file.
  5. After the installation, Restart the machine.
  6. Launch the PostgreSQL from the start menu.
  7. Connect to the database.
  8. Go to Tool menu ->click on query tool option.
  9. Write the sql query.
  10. Click edit option ->Try to work with find and replace options.
  11. Observe.

 

Observations:

Fail to work with Find and Replace options within the application with latest version of app

Expected Results:

Should be able to work with Find and Replace options successfully without any issue.

Resource

For any questions on app development (or) submission on windows, contact Windows Dev Center. https://developer.microsoft.com/en-us/windows/support

 

Thanks!

Haifang

 

 

 

Re: Windows Application Issues | PostgreSQL | REF # 51961374

From
Thomas Munro
Date:
On Sat, Jul 6, 2024 at 10:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Launch the PostgreSQL from the start menu.
> Connect to the database.
> Go to Tool menu ->click on query tool option.
> Write the sql query.
> Click edit option ->Try to work with find and replace options.

Hi Haifang,

It sounds like this might be referring to "pgadmin", a GUI tool for
talking to PostgreSQL?  PostgreSQL itself doesn't have a GUI, but the
associated pgadmin project does, and it has its own place for bug
reporting over here:

https://www.pgadmin.org/support/issues/



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 51961374

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks a lot for pointing this out, Thomas. I will report the issue to pgadmin.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Sunday, July 7, 2024 12:17 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 51961374

On Sat, Jul 6, 2024 at 10:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Launch the PostgreSQL from the start menu.
> Connect to the database.
> Go to Tool menu ->click on query tool option.
> Write the sql query.
> Click edit option ->Try to work with find and replace options.

Hi Haifang,

It sounds like this might be referring to "pgadmin", a GUI tool for talking to PostgreSQL?  PostgreSQL itself doesn't
havea GUI, but the associated pgadmin project does, and it has its own place for bug reporting over here:
 

https://www.pgadmin.org/support/issues/

A minor release deadline is only a couple of weeks away and
unfortunately we still have no confirmation from a Windows user that
this technique can be used to unbreak a database cluster that actually
goes though that exact upgrade sequence.  So I spent today figuring
out how to write a TAP test to try to convince myself of that in the
abstract.  No C code change from v2.  Does the test show a useful
enough pathway that is likely to help people in future cases at least?

I learned two new things:

1.  With a remapped locale, you can log into your existing databases
again, but CREATE DATABASE doesn't work (see test).  That must have
been the case for the historical transitions that were hard-coded
(Macau etc); this is no different, it's just that the mapping table is
now user-changeable.  In other words, this just gives an administrator
a way to log into a cluster again after such a change, but then they
really need to update the names in catalogs, at least for any database
that might be used as a template, to get a 100% functional system.  So
I have added a paragraph to the documentation piece to explain that.

2.  Including the encoding suffix eg "en-US.1252" doesn't actually
work after all (discovered by CI).  You have to use just "en-US".  (I
guess ".UTF-8" must be a special case.)  So I have updated the
documentation example.

Attachment
On 2024-07-21 Su 7:54 AM, Thomas Munro wrote:
> A minor release deadline is only a couple of weeks away and
> unfortunately we still have no confirmation from a Windows user that
> this technique can be used to unbreak a database cluster that actually
> goes though that exact upgrade sequence.  So I spent today figuring
> out how to write a TAP test to try to convince myself of that in the
> abstract.  No C code change from v2.  Does the test show a useful
> enough pathway that is likely to help people in future cases at least?
>
> I learned two new things:
>
> 1.  With a remapped locale, you can log into your existing databases
> again, but CREATE DATABASE doesn't work (see test).  That must have
> been the case for the historical transitions that were hard-coded
> (Macau etc); this is no different, it's just that the mapping table is
> now user-changeable.  In other words, this just gives an administrator
> a way to log into a cluster again after such a change, but then they
> really need to update the names in catalogs, at least for any database
> that might be used as a template, to get a 100% functional system.  So
> I have added a paragraph to the documentation piece to explain that.
>
> 2.  Including the encoding suffix eg "en-US.1252" doesn't actually
> work after all (discovered by CI).  You have to use just "en-US".  (I
> guess ".UTF-8" must be a special case.)  So I have updated the
> documentation example.


I don't have a general comment, at least yet, but a couple, mostly 
stylistic, on your TAP test.


change

     if ($^O eq 'MSWin32')

to

     if ($windows_os)


Also your use of "is()" is unidiomatic. The expected value should come 
after the received value, not before. While your use won't change the 
success or failure of the result, the message on failure will be at best 
confusing. (I ran into this the other day, so it's fresh in my memory). 
See Test::More documentation.

Using all upper case for variables is not our usual perl style (or 
anybody else's that I am aware of). Mostly we just use lower case with 
underscores. If you use the "constant" module upper case letters are 
usually used, but then the things defined are bare words, and come with 
some caveats, and I don't think you want to use it here.

The substance of the test looks good.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On Mon, Jul 22, 2024 at 1:28 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> change
>
>      if ($^O eq 'MSWin32')
>
> to
>
>      if ($windows_os)

Done.

> Also your use of "is()" is unidiomatic. The expected value should come
> after the received value, not before. While your use won't change the
> success or failure of the result, the message on failure will be at best
> confusing. (I ran into this the other day, so it's fresh in my memory).
> See Test::More documentation.

Fixed.

> Using all upper case for variables is not our usual perl style (or
> anybody else's that I am aware of). Mostly we just use lower case with
> underscores. If you use the "constant" module upper case letters are
> usually used, but then the things defined are bare words, and come with
> some caveats, and I don't think you want to use it here.

Fixed.

> The substance of the test looks good.

Thanks for looking!

One new-to-me fact from the -general mailing list this morning[1] is
that the EDB installer is explicitly feeding these unstable locale
names into the world's database clusters, so that's something that's
beyond the power of us upstream PostgreSQL hackers to fix even if we
pulled the trigger on the initdb BCP47-by-default patch[2].  Something
for someone at EDB to look into?

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364
[2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com

Attachment


On 2024-07-21 Su 6:00 PM, Thomas Munro wrote:
One new-to-me fact from the -general mailing list this morning[1] is
that the EDB installer is explicitly feeding these unstable locale
names into the world's database clusters, so that's something that's
beyond the power of us upstream PostgreSQL hackers to fix even if we
pulled the trigger on the initdb BCP47-by-default patch[2].  Something
for someone at EDB to look into?

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364
[2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com


Sandeep, can you comment?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi

On Mon, Jul 22, 2024 at 4:54 PM Andrew Dunstan <andrew@dunslane.net> wrote:


On 2024-07-21 Su 6:00 PM, Thomas Munro wrote:
One new-to-me fact from the -general mailing list this morning[1] is
that the EDB installer is explicitly feeding these unstable locale
names into the world's database clusters, so that's something that's
beyond the power of us upstream PostgreSQL hackers to fix even if we
pulled the trigger on the initdb BCP47-by-default patch[2].  Something
for someone at EDB to look into?

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJA4pGEeZtfUUEuQmNNO7x16wZ%2BRWOeXP%2Btq3Hf4iu8oQ%40mail.gmail.com#6426ff9fd44fbcd9a7f343a3f94ae364
[2] https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com


Sandeep, can you comment?


When I checked the installation log shared by Ertan at https://github.com/EnterpriseDB/edb-installers/issues/127, I see installer is using the correct locale name while doing initdb.

Executing C:\Windows\System32\cscript //NoLogo "C:\Program Files\PostgreSQL\16/installer/server/initcluster.vbs" "NT AUTHORITY\NetworkService" "postgres" "****" "C:\Users\User1\AppData\Local\Temp/postgresql_installer_cd79fad8b7" "C:\Program Files\PostgreSQL\16" "C:\DATA_PG16" 5432 "Turkish,Türkiye" 0

Thomas, I can generate the test installer with your patch and share here so this issue can be tested before the minor release?

cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


--
Sandeep Thakkar


On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> Thomas, I can generate the test installer with your patch and share here so this issue can be tested before the minor
release?

That might be useful, thanks!  If someone wants to try it out on a
real Windows system.

. o O { I wonder if it could ever be possible to have that automated
via CI.  Imagine if I/anyone could clone the EDB installer github
repo, and push a small patch into a private branch that tells it where
to pull PostgreSQL sources from, including any public git branch
including commitfest ones, and some options like with/without
assertions, and have CI spit out a new installer as an artifact?
Hopefully non-REL_XXX_STABLE branches would have a big warning on the
GUI so that users would never accidentally think it's a real release.
}





On Thu, Jul 25, 2024 at 11:39 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> Thomas, I can generate the test installer with your patch and share here so this issue can be tested before the minor release?

That might be useful, thanks!  If someone wants to try it out on a
real Windows system.

Sure, I can give it a try. Is your patch ready? 

. o O { I wonder if it could ever be possible to have that automated
via CI.  Imagine if I/anyone could clone the EDB installer github
repo, and push a small patch into a private branch that tells it where
to pull PostgreSQL sources from, including any public git branch
including commitfest ones, and some options like with/without
assertions, and have CI spit out a new installer as an artifact?
Hopefully non-REL_XXX_STABLE branches would have a big warning on the
GUI so that users would never accidentally think it's a real release.
}

yeah, that'll be helpful but unfortunately it's not possible atm. Hopefully, we will
have that available in future through the GitHub actions.


--
Sandeep Thakkar


I am going to withdraw this proposal.  In another email thread, we
learned two very important things:

1.  The problem of a locale names changing for an *existing* database
can apparently be fixed using "Locale Builder"[1].  Users in Turkey
have presumably already addressed the situation that way and there are
reports of articles about that in Turkish.  We don't really need the
"setlocale.map" rename-on-the-fly patch in this thread, and I don't
want to maintain it if we don't need it.

2.  The Windows runtime library is crashing when initdb's subprocess
postgres.exe calls it with a name encoded in an encoding it doesn't
like[2].  (IMHO it should return NULL instead!  The C standard does
not have a demons-may-fly-out-of-your-nose clause.)  We need to tell
users not to use those locale names.  That string is actually coming
from the EDB installer, and we'll follow up on that.  Perhaps
PostgreSQL should also consider stripping non-ASCII characters to
force it to fail nicely.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLyfou2xa2fxGOs0WwkOrcKjwqn-bRNR01Hq5RAD16ueg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2BFxeRLURZ%3Dn8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg%40mail.gmail.com



On Wed, Jul 31, 2024 at 9:53 PM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> On Thu, Jul 25, 2024 at 11:39 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Tue, Jul 23, 2024 at 12:12 AM Sandeep Thakkar
>> <sandeep.thakkar@enterprisedb.com> wrote:
>> . o O { I wonder if it could ever be possible to have that automated
>> via CI.  Imagine if I/anyone could clone the EDB installer github
>> repo, and push a small patch into a private branch that tells it where
>> to pull PostgreSQL sources from, including any public git branch
>> including commitfest ones, and some options like with/without
>> assertions, and have CI spit out a new installer as an artifact?
>> Hopefully non-REL_XXX_STABLE branches would have a big warning on the
>> GUI so that users would never accidentally think it's a real release.
>> }
>
> yeah, that'll be helpful but unfortunately it's not possible atm. Hopefully, we will
> have that available in future through the GitHub actions.

Sounds great.

FWIW it's easy[1][2] to make the PostgreSQL CI spit out a zip file
with a snapshot build of PostgreSQL.  I expect it doesn't actually run
on a real Windows system unless you also install DLLs it needs like
OpenSSL, ICU etc, and I'm not sure if that would be best done by
stuffing them too into the .zip file, or if the scripts[3] that the CI
images are prepared with should be used by someone who wants to test
it.  But thinking about that leads me back to the thought that it'd be
better if people interested in trying pre-releases could get an EDB
installer for a snapshot build from any branch anywhere, so it all
works the way they will actually use it in real life.  And of course
you guys already did all the hard work to deal with dependencies.

[1] https://github.com/macdice/postgres/commit/56e967f9cd35ef418a653d191a4d9ea85055d94b
[2] https://cirrus-ci.com/task/4831619775725568
[3] https://github.com/anarazel/pg-vm-images/tree/main/scripts



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and Andrew,

Just would like to follow up with you about this bug. How is it going with the investigation?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Wednesday, August 7, 2024 7:35 PM
To: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Cc: Andrew Dunstan <andrew@dunslane.net>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Rahul
Pandey<pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele
<Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com> 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

I am going to withdraw this proposal.  In another email thread, we learned two very important things:

1.  The problem of a locale names changing for an *existing* database can apparently be fixed using "Locale
Builder"[1]. Users in Turkey have presumably already addressed the situation that way and there are reports of articles
aboutthat in Turkish.  We don't really need the "setlocale.map" rename-on-the-fly patch in this thread, and I don't
wantto maintain it if we don't need it. 

2.  The Windows runtime library is crashing when initdb's subprocess postgres.exe calls it with a name encoded in an
encodingit doesn't like[2].  (IMHO it should return NULL instead!  The C standard does not have a
demons-may-fly-out-of-your-noseclause.)  We need to tell users not to use those locale names.  That string is actually
comingfrom the EDB installer, and we'll follow up on that.  Perhaps PostgreSQL should also consider stripping non-ASCII
charactersto force it to fail nicely. 

[1] https://www.postgresql.org/message-id/CA%2BhUKGLyfou2xa2fxGOs0WwkOrcKjwqn-bRNR01Hq5RAD16ueg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2BFxeRLURZ%3Dn8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg%40mail.gmail.com



+Peter Constable, FYI

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Wednesday, August 28, 2024 3:16 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Andrew Dunstan <andrew@dunslane.net>; Rahul Pandey
<pandeyrah@microsoft.com>;Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy
Wishnousky<amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati
<gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. How is it going with the investigation?

Well, we figured out that Windows is aborting inside setlocale().  The C standard says "The pointer to string returned
bythe setlocale function is such that a subsequent call with that string value and its associated category will restore
thatpart of the program’s locale."
 
Window does not conform, and crashes itself.

One practical change we could make would be to reject non-ASCII locale names in initdb, as in the attached, and
back-patchto 16.  Then the EDB installer (a separate project that is calling initdb and providing the non-ASCII string
containing"ü") would at least get a graceful failure instead of a crash.  It hardly makes a difference to the user, but
I'vefinally understood that these crashes in the Windows runtime library might be generating a stream of crash reports
toMicrosoft, which might be bringing you here?
 

For example:

$ initdb -D asdf --locale="liberté"
The files belonging to this database system will be owned by user "tmunro".
This user must also own the server process.

initdb: error: locale name "liberté" contains non-ASCII characters

On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies
> Inc) <v-haiwang@microsoft.com> wrote:
>> Just would like to follow up with you about this bug. How is it going with the investigation?
> Well, we figured out that Windows is aborting inside setlocale().  The
> C standard says "The pointer to string returned by the setlocale
> function is such that a subsequent call with that string value and its
> associated category will restore that part of the program’s locale."
> Window does not conform, and crashes itself.
>
> One practical change we could make would be to reject non-ASCII locale
> names in initdb, as in the attached, and back-patch to 16.  Then the
> EDB installer (a separate project that is calling initdb and providing
> the non-ASCII string containing "ü") would at least get a graceful
> failure instead of a crash.  It hardly makes a difference to the user,
> but I've finally understood that these crashes in the Windows runtime
> library might be generating a stream of crash reports to Microsoft,
> which might be bringing you here?
>
> For example:
>
> $ initdb -D asdf --locale="liberté"
> The files belonging to this database system will be owned by user "tmunro".
> This user must also own the server process.
>
> initdb: error: locale name "liberté" contains non-ASCII characters


+1 for doing this.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks for the reply, Andrew and Thomas. The issue is originally from user feedback. And our team verified and
reproducedit too. And it is high priority from our end is because: 1. It is regression, it did not see it in the
previousbuild. 2. After the crash happened, even the app could be installed, but cluster initialization failed. And due
towhich we cannot connect to the database and proceed further.
 

@Shawn @Rahul, do you have any comment or suggestions?

Regards!
Haifang

-----Original Message-----
From: Andrew Dunstan <andrew@dunslane.net>
Sent: Thursday, August 29, 2024 8:17 AM
To: Thomas Munro <thomas.munro@gmail.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak
<Vishwa.Deepak@microsoft.com>;Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane
<tgl@sss.pgh.pa.us>;pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal
<nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

[You don't often get email from andrew@dunslane.net. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification]
 

On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> On Tue, Aug 27, 2024 at 9:29 AM Haifang Wang (Centific Technologies
> Inc) <v-haiwang@microsoft.com> wrote:
>> Just would like to follow up with you about this bug. How is it going with the investigation?
> Well, we figured out that Windows is aborting inside setlocale().  The
> C standard says "The pointer to string returned by the setlocale
> function is such that a subsequent call with that string value and its
> associated category will restore that part of the program’s locale."
> Window does not conform, and crashes itself.
>
> One practical change we could make would be to reject non-ASCII locale
> names in initdb, as in the attached, and back-patch to 16.  Then the
> EDB installer (a separate project that is calling initdb and providing
> the non-ASCII string containing "ü") would at least get a graceful
> failure instead of a crash.  It hardly makes a difference to the user,
> but I've finally understood that these crashes in the Windows runtime
> library might be generating a stream of crash reports to Microsoft,
> which might be bringing you here?
>
> For example:
>
> $ initdb -D asdf --locale="liberté"
> The files belonging to this database system will be owned by user "tmunro".
> This user must also own the server process.
>
> initdb: error: locale name "liberté" contains non-ASCII characters


+1 for doing this.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com/


On 2024-08-29 Th 1:53 PM, Haifang Wang (Centific Technologies Inc) wrote:
> Thanks for the reply, Andrew and Thomas. The issue is originally from user feedback. And our team verified and
reproducedit too. And it is high priority from our end is because: 1. It is regression, it did not see it in the
previousbuild. 2. After the crash happened, even the app could be installed, but cluster initialization failed. And due
towhich we cannot connect to the database and proceed further.
 


Please don't top-post on PostgreSQL lists.


If I read Thomas correctly, the solution is to create a locale with an 
acceptable (pure Ascii) name using Locale Builder, and then call initdb 
using that name. It might be that you need to do some of this manually 
if the installer fails. IIRC is it possible to run the installer without 
having it create a cluster automatically. You can then set up the 
cluster and create a windows service manually using initdb / pg_ctl.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> > initdb: error: locale name "liberté" contains non-ASCII characters
>
> +1 for doing this.

Thanks for looking!  OK, let's go with this approach.  That two-line
demo patch doesn't work properly for the system default, and of course
it should be in sync with the backend function, and needs comments to
explain this mess, so I'll work on a serious patch.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

How is it going after all the analysis below? 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Thursday, August 29, 2024 8:22 PM
To: Andrew Dunstan <andrew@dunslane.net>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;
ShawnSteele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Fri, Aug 30, 2024 at 10:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Aug 30, 2024 at 3:16 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 2024-08-28 We 6:16 AM, Thomas Munro wrote:
> > > initdb: error: locale name "liberté" contains non-ASCII characters
> >
> > +1 for doing this.
>
> Thanks for looking!  OK, let's go with this approach.  That two-line 
> demo patch doesn't work properly for the system default, and of course 
> it should be in sync with the backend function, and needs comments to 
> explain this mess, so I'll work on a serious patch.

Done.  To develop this patch, I set up a cursed locale name on my FreeBSD box like so:

$ sudo cp -r /usr/share/locale/fr_FR.UTF-8 /usr/share/locale/fraternité.UTF-8

Then I could exercise the setlocale("") path, approximating what I expect a Turkish Windows system to do without
explicit--locale:
 

$ LC_TIME="fraternité.UTF-8" initdb -D pdata ...
initdb: error: locale name "fraternité.UTF-8" contains non-ASCII characters

The explicit path is easier to test, like before:

$ initdb -D pgdata --locale "égalité.UTF-9"
...
initdb: error: locale name "égalité.UTF-9" contains non-ASCII characters

That's what the EDB installer (if not changed to provide ASCII-clean names somehow or other) will start failing with,
ifit tries to feed umlauts to initdb.
 

Working through that, I understood another problem: even if you provide an acceptable string like --locale "tr-TR", we
stillneed to be able to save-and-restore the value that initdb inherited from the environment.  Using my simulation,
whatI mean is if you do:
 

$ export LANG="liberté.UTF-8"

... then ...

$ initdb -D pgdata --locale "fr_FR.UTF-8"

... it still needs to be able to do a safe save-and-restore back to the environment's value without crashing, even
thoughit's not really using that value.  I wondered if initdb really needs a locale other than "C" for itself, but I'm
tryingto minimise the changes here so I
 
don't break something else...   That seems to mean that we have to use
_wsetlocale() for that particular save/restore.  This is essentially what Haifang's colleagues were suggesting
upthread. Done.
 

Once a server is up and running, we shouldn't be exposed to the server environment (right?), but there are several more
placesthat accept user-supplied strings, and only CREATE COLLATION fails to run
 
check_locale():

postgres=# create collation foo (locale = 'fraternité.UTF-8'); CREATE COLLATION

The other places I am aware of do, so putting the equivalent code into
check_locale() does the right thing:

postgres=# create database db locale = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid LC_COLLATE locale name: "fraternité.UTF-8"
HINT:  If the locale name is specific to ICU, use ICU_LOCALE.

postgres=# set lc_time = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_time": "fraternité.UTF-8"

postgres=# set lc_messages = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_messages": "fraternité.UTF-8"

postgres=# set lc_monetary = 'fraternité.UTF-8';
WARNING:  locale name "fraternité.UTF-8" contains non-ASCII characters
ERROR:  invalid value for parameter "lc_monetary": "fraternité.UTF-8"

Those last SET lc_* changes aren't strictly necessary, because we never restore to those, we only switch to them and
thenrestore to the database default, but since they share code with CREATE DATABASE the new check kicks in anyway.
 

I tried adding check_locale() checks into CREATE COLLATION, but that made some tests fail (huh, I didn't know that the
setof strings acceptable to _newlocale() and setlocale() are different -- the former seems to accept "POSIX" on that
OS,which is exercised by some tests, or maybe it's some path of ours, I didn't check).  But, on reflection, I don't
believethat COLLATION objects actually have this problem anyway, because we never call setlocale() to them, we just use
themodern thread-safe POSIX-2008-oid APIs.  So I took that back out to minimise the patch.
 

Please see attached, including a restatement of the problem and solution in the commit message.

I suppose we could add some tests for this, but I was a bit unsure about the rules for putting non-ASCII into perl
scriptsand command lines, and would need to research that...
 

(Note: a lot of this old stuff is getting ripped out in master by the ongoing multi-threading effort...)

On Thu, Sep 5, 2024 at 9:28 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> How is it going after all the analysis below?

I proposed a patch.  I am waiting for someone to try it out, review it
and write the magic words "looks good to me", before I commit it and
back-patch it.  PostgreSQL will next make a release in early November.

Shortly after that, the EDB Windows installer team would begin
publishing a new Windows installer package, and then you should stop
hearing from Windows users whose runtime library exploded.  Hopefully
they will also adjust their GUI that offers a list of locale names to
users, so that it doesn't even offer the ones that don't work, and
does offer some better ones, like (probably) "tr-TR.UTF-8".



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Cool. is anyone going to try it out? Looking forward to the back-patch and release.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Wednesday, September 4, 2024 2:41 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Andrew Dunstan <andrew@dunslane.net>; Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Rahul Pandey
<pandeyrah@microsoft.com>;Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy
Wishnousky<amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati
<gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Thu, Sep 5, 2024 at 9:28 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> How is it going after all the analysis below?

I proposed a patch.  I am waiting for someone to try it out, review it and write the magic words "looks good to me",
beforeI commit it and back-patch it.  PostgreSQL will next make a release in early November.
 

Shortly after that, the EDB Windows installer team would begin publishing a new Windows installer package, and then you
shouldstop hearing from Windows users whose runtime library exploded.  Hopefully they will also adjust their GUI that
offersa list of locale names to users, so that it doesn't even offer the ones that don't work, and does offer some
betterones, like (probably) "tr-TR.UTF-8".
 

"Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes:
> Cool. is anyone going to try it out? Looking forward to the back-patch and release.

I thought the implication was pretty clear that YOU ought to try it out.

            regards, tom lane



On Thu, Sep 5, 2024 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes:
> > Cool. is anyone going to try it out? Looking forward to the back-patch and release.
>
> I thought the implication was pretty clear that YOU ought to try it out.

It would also be good to hear from Sandeep and crew, if possible,
since they will be dealing with the consequences of this change.

But here is one thing Haifang might like to consider:  the
documentation[1] for setlocale() could really use a note in the
remarks section to highlight this danger.  Nothing in there seems to
alert the programmer to the fact that save = setlocale(NULL),
setlocale("something else"), setlocale(save) might fail to restore the
previous locale, and might even abort.  Perhaps that's a weird, old,
non-thread-safe thing to be doing (and we're working on removing
that), but it is quite explicitly described in the standard.  C99
7.11.1.1 paragraph 8: "The pointer to string returned by the setlocale
function is such that a subsequent call with that string value and its
associated category will restore that part of the program’s locale".
Aborting also fails paragraph 6: "If the selection cannot be honored,
the setlocale function returns a null pointer and the program’s locale
is not changed" (perhaps this has to do with the crt report mode
stuff, I don't know as I'm not a Windows programmer, I'm just trying
to maintain a C program; or maybe it's actually a bug?).  Another
thing is that setlocale(NULL) continues to return the "old style"
names while everything else in the OS is using the modern BCP 47
system since Vista, and the locale documentation recommends that
applications prefer those[2].  It might be worth a note about that
too?  (I guess stability and backwards compatibility are behind that
choice, but ... it's not really stable anyway.)

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170
[2]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Wednesday, September 4, 2024 3:35 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Andrew Dunstan <andrew@dunslane.net>; Sandeep
Thakkar<sandeep.thakkar@enterprisedb.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak
<Vishwa.Deepak@microsoft.com>;Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>;
pgsql-bugs@lists.postgresql.org;Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Thu, Sep 5, 2024 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Haifang Wang (Centific Technologies Inc)" <v-haiwang@microsoft.com> writes:
> > Cool. is anyone going to try it out? Looking forward to the back-patch and release.
>
> I thought the implication was pretty clear that YOU ought to try it out.

It would also be good to hear from Sandeep and crew, if possible, since they will be dealing with the consequences of
thischange.
 

But here is one thing Haifang might like to consider:  the documentation[1] for setlocale() could really use a note in
theremarks section to highlight this danger.  Nothing in there seems to alert the programmer to the fact that save =
setlocale(NULL),setlocale("something else"), setlocale(save) might fail to restore the previous locale, and might even
abort. Perhaps that's a weird, old, non-thread-safe thing to be doing (and we're working on removing that), but it is
quiteexplicitly described in the standard.  C99
 
7.11.1.1 paragraph 8: "The pointer to string returned by the setlocale function is such that a subsequent call with
thatstring value and its associated category will restore that part of the program’s locale".
 
Aborting also fails paragraph 6: "If the selection cannot be honored, the setlocale function returns a null pointer and
theprogram’s locale is not changed" (perhaps this has to do with the crt report mode stuff, I don't know as I'm not a
Windowsprogrammer, I'm just trying to maintain a C program; or maybe it's actually a bug?).  Another thing is that
setlocale(NULL)continues to return the "old style"
 
names while everything else in the OS is using the modern BCP 47 system since Vista, and the locale documentation
recommendsthat applications prefer those[2].  It might be worth a note about that too?  (I guess stability and
backwardscompatibility are behind that choice, but ... it's not really stable anyway.)
 

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170
[2]
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170

Windows Application Issues | PostgreSQL | REF # 48475607

From
"David G. Johnston"
Date:
On Wednesday, September 4, 2024, Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote:
Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane

Thomas wrote that he authored it so you can safely assume it was tested by him.  Tom Lane is not on the hook to be a reviewer here - especially given his total involvement to date was only a single reply a while ago.  Tom as a committer could pick up the patch and commit it as a neutral party committer once the others involved on this thread provide some reviews; though given this is Windows-based I wouldn’t expect Tom to do that here.

David J.
On Thu, Sep 5, 2024 at 11:24 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Okay, I will ask our tester to try the patch. But have you tried it from your end? @Thomas Munro @Tom Lane

I demonstrated it working on my computer in the email:

https://www.postgresql.org/message-id/CA%2BhUKG%2B28zRyk8U%3DeFLn8O%2B17VrbpKykWPVrSzf57Ru32dQo8A%40mail.gmail.com

But I can't test on Windows.  For context, I have a "UNIX -- Live Free
or Die" license plate attached to the side of my workstation.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Good to know that, Thomas. I'm asking our tester to try it. Will let you know.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Wednesday, September 4, 2024 4:39 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>; Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>;
ShawnSteele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta
Gulati<gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Thu, Sep 5, 2024 at 11:24 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Okay, I will ask our tester to try the patch. But have you tried it
> from your end? @Thomas Munro @Tom Lane

I demonstrated it working on my computer in the email:

https://www.postgresql.org/message-id/CA%2BhUKG%2B28zRyk8U%3DeFLn8O%2B17VrbpKykWPVrSzf57Ru32dQo8A%40mail.gmail.com

But I can't test on Windows.  For context, I have a "UNIX -- Live Free or Die" license plate attached to the side of my
workstation.

On Thu, Sep 5, 2024 at 12:07 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Good to know that, Thomas. I'm asking our tester to try it. Will let you know.

To give a more serious response, I don't really expect anything
"simple" about this patch not to work on Windows, as I know that it it
rejects non-ASCII names as I showed (on all OSes), and I know that
compiles and passes our tests on all our 4 CI operating systems:

https://cirrus-ci.com/build/4941642090676224

Really what I'm looking for is (1) feedback on the approach, code and
comments, and thoughts about more complex scenarios I may have failed
to think about, including say, pg_dump, pg_upgrade etc operational
issues, which probably involves lots of previous experience with
PostgreSQL, (2) opinions on whether we should add a test for these
cases and how to put the UTF-8 into a script (I'm confused about the
encoding of command line arguments), and (3) a nod from the EDB people
involved in distributing this software on Windows.



Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Sandeep Thakkar
Date:


On Thu, Sep 5, 2024 at 5:46 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Sep 5, 2024 at 12:07 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Good to know that, Thomas. I'm asking our tester to try it. Will let you know.

To give a more serious response, I don't really expect anything
"simple" about this patch not to work on Windows, as I know that it it
rejects non-ASCII names as I showed (on all OSes), and I know that
compiles and passes our tests on all our 4 CI operating systems:

https://cirrus-ci.com/build/4941642090676224

Really what I'm looking for is (1) feedback on the approach, code and
comments, and thoughts about more complex scenarios I may have failed
to think about, including say, pg_dump, pg_upgrade etc operational
issues, which probably involves lots of previous experience with
PostgreSQL, (2) opinions on whether we should add a test for these
cases and how to put the UTF-8 into a script (I'm confused about the
encoding of command line arguments), and (3) a nod from the EDB people
involved in distributing this software on Windows.

We can help with producing the builds with the patches provided. You had also
mentioned about the changes required in the installer script, will it still be required? 


--
Sandeep Thakkar


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

@Sandeep Thakkar, I believe your question is to Thomas, is that right?

 

@Thomas Munro regarding the patch you shared previously, could you give some instruction on how to use it?

 

Thanks!
Haifang

 

From: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Sent: Thursday, September 5, 2024 4:47 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan <andrew@dunslane.net>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

 

 

On Thu, Sep 5, 2024 at 5:46AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Sep 5, 2024 at 12:07PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Good to know that, Thomas. I'm asking our tester to try it. Will let you know.

To give a more serious response, I don't really expect anything
"simple" about this patch not to work on Windows, as I know that it it
rejects non-ASCII names as I showed (on all OSes), and I know that
compiles and passes our tests on all our 4 CI operating systems:

https://cirrus-ci.com/build/4941642090676224

Really what I'm looking for is (1) feedback on the approach, code and
comments, and thoughts about more complex scenarios I may have failed
to think about, including say, pg_dump, pg_upgrade etc operational
issues, which probably involves lots of previous experience with
PostgreSQL, (2) opinions on whether we should add a test for these
cases and how to put the UTF-8 into a script (I'm confused about the
encoding of command line arguments), and (3) a nod from the EDB people
involved in distributing this software on Windows.


We can help with producing the builds with the patches provided. You had also
mentioned about the changes required in the installer script, will it still be required?
 


 

--

Sandeep Thakkar

 

On Thu, Sep 5, 2024 at 11:46 PM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> On Thu, Sep 5, 2024 at 5:46 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Really what I'm looking for is (1) feedback on the approach, code and
>> comments, and thoughts about more complex scenarios I may have failed
>> to think about, including say, pg_dump, pg_upgrade etc operational
>> issues, which probably involves lots of previous experience with
>> PostgreSQL, (2) opinions on whether we should add a test for these
>> cases and how to put the UTF-8 into a script (I'm confused about the
>> encoding of command line arguments), and (3) a nod from the EDB people
>> involved in distributing this software on Windows.

If I don't hear any objections to this plan soon, I'm going to commit
this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the
upcoming code freeze for the PostgreSQL 17 release ends.  So it'll
probably be in 16.5 and 17.1.

> We can help with producing the builds with the patches provided. You had also
> mentioned about the changes required in the installer script, will it still be required?

If you don't change the installer script, then it will still fail if
someone selects "Türkiye" in your GUI, but now it will fail with an
ERROR rejecting non-ASCII characters, instead of crashing.  So people
in Türkiye, Côte d'Ivoire, Curaçao etc will still have no way to
initialise a cluster with your GUI in PostgreSQL 16.5 and 17.1 unless
they follow the instructions on the web to create a "Turkey" (or
whatever non-ASCII string they want).  Of course they could always use
initdb.exe directly from the command line with a BCP47 name.  Maybe
that's OK, but I think you should consider changing the installer.  A
conservative way to do it would be to show all the existing options
that you have now (so that someone who is happy using the old style
names when they don't contain non-ASCII can keep doing so), but also
have a second entry for each country that shows "Turkish, Türkiye
(tr-TR)" and/or perhaps "Turkish, Türkiye (tr-TR.UTF-8)" or perhaps
both, and passes just that part in parentheses to initdb, to give
users all the options.  Or perhaps you could have a checkbox "BCP 47
locales" that changes the list to show them.

No one has really reported any real world experience choosing between
the tr-TR vs tr-TR.UTF-8 alternatives, and you might like to
experiment with that.  The second option makes Windows' system
libraries use UTF-8 encoding instead of the traditional encoding
associated with the language.  As far as I can tell, it doesn't make
any difference at all to PostgreSQL yet, because your installer always
uses --encoding="UTF8" and, on Windows only, that makes PostgreSQL
ignore the locale's encoding and do a whole lot of internal
conversation to wchar_t because PostgreSQL doesn't yet know that
Windows 10+ can work with UTF-8 directly.

The reason that I am interested in this .UTF-8-or-not question is that
I'd like to consider *disallowing* non-matching encodings (see
commitfest entry #3772, reviewers wanted!), and teaching PostgreSQL
that Windows does in fact have UTF-8, just so we can delete a lot of
slow special case code, harmonise with Unix, and generally catch up
with reality.  So I figure we might as well start encouraging the
"xx-XX.UTF-8" names when using --encoding="UTF8" if we can't find any
downside, because under that plan it would eventually become illegal
to use --locale="tr-TR" (no .UTF-8) with --encoding="UTF-8" if that
eventually goes in, so it seems sensible to stop creating new clusters
that way ASAP so that users have a better time upgrading in the
future.  For example, a pg_upgrade from a PostgreSQL 17 cluster
initialised with --locale="tr-TR" --encoding="UTF8" to PostgreSQL 18
would proabbly require some extra step to rename "tr-TR" to
"tr-TR.UTF8" at some point (not sure exactly where), if PostgreSQL 18
starts rejecting the non-matching combination.  I don't know where
that'll go, though -- it's not high priority work, it's just
incremental cleanup and modernisation that practically suggests itself
whenever looking at rejigging locale code for thread-safety and
reading all those comments about wchar_t that are not true.



On Fri, Sep 20, 2024 at 11:37 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> If I don't hear any objections to this plan soon, I'm going to commit
> this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the
> upcoming code freeze for the PostgreSQL 17 release ends.  So it'll
> probably be in 16.5 and 17.1.

Done, after another round of polishing and testing.

The main change since v2 was: I realised that initdb's
locale_date_order() also contains a save-and-restore, which should
share code at least for consistency (I don't personally know which
categories really affect process-wide encoding assumptions, so this
might not technically be necessary).  I moved the relevant code into
new helper functions, which I think came out tidier.

The wchar_t-based save-restore in initdb.exe is based on a theoretical
observation, as the report was about postgres.exe crashing and that
should now be unreachable with these new name checks, but we can see
the general phenomenon from a two-line standalone repro[1] and so I
suspect there must be environment/input combinations that could crash
initdb.exe itself without it.  We don't have any concrete reports of
that, but I preferred to close the theoretical risk using the
recommended technique.

Sorry for taking so long to untangle this mess; I assumed for much of
the thread that we were talking about the failure of existing clusters
to come up (as complained about in numerous other threads), not this
new kind of crash, and was trying to fix that before the Locale
Builder solution was widely known.  Fixing Windows bugs as a
non-Windows guy by reading the manual and testing patches on CI can
sometimes feel a little like trying to patch a Mars rover...  Thanks
to David R for help with a crucial observation using a martian
debugger.

CREATE COLLATION should probably also start using check_locale()
eventually, and I may look into harmonising that and figure out how to
test it in nearby future work on locales vs threads.

[1] https://www.postgresql.org/message-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and team,

Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build
17.0-1.However the issue repro on that build too.
 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Friday, October 4, 2024 6:17 PM
To: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan
<andrew@dunslane.net>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele
<Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati
<gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Fri, Sep 20, 2024 at 11:37 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> If I don't hear any objections to this plan soon, I'm going to commit
> this and back-patch it into PostgreSQL 16 and PostgreSQL 17 after the
> upcoming code freeze for the PostgreSQL 17 release ends.  So it'll
> probably be in 16.5 and 17.1.

Done, after another round of polishing and testing.

The main change since v2 was: I realised that initdb's
locale_date_order() also contains a save-and-restore, which should share code at least for consistency (I don't
personallyknow which categories really affect process-wide encoding assumptions, so this might not technically be
necessary). I moved the relevant code into new helper functions, which I think came out tidier.
 

The wchar_t-based save-restore in initdb.exe is based on a theoretical observation, as the report was about
postgres.execrashing and that should now be unreachable with these new name checks, but we can see the general
phenomenonfrom a two-line standalone repro[1] and so I suspect there must be environment/input combinations that could
crashinitdb.exe itself without it.  We don't have any concrete reports of that, but I preferred to close the
theoreticalrisk using the recommended technique.
 

Sorry for taking so long to untangle this mess; I assumed for much of the thread that we were talking about the failure
ofexisting clusters to come up (as complained about in numerous other threads), not this new kind of crash, and was
tryingto fix that before the Locale Builder solution was widely known.  Fixing Windows bugs as a non-Windows guy by
readingthe manual and testing patches on CI can sometimes feel a little like trying to patch a Mars rover...  Thanks to
DavidR for help with a crucial observation using a martian debugger.
 

CREATE COLLATION should probably also start using check_locale() eventually, and I may look into harmonising that and
figureout how to test it in nearby future work on locales vs threads.
 

[1] https://www.postgresql.org/message-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com

On Tue, Nov 12, 2024 at 12:12 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build
17.0-1.However the issue repro on that build too. 

Hi Haifang,
The change is in v17.1 and v16.5, but the release process for those is
in progress right now, getting them out to the packaging teams.
Hopefully new Windows installer packages will appear in a few days or
so?



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Great to hear that, Thomas. Looking forward to hearing from your once the new release is available.

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Monday, November 11, 2024 4:43 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Sandeep Thakkar <sandeep.thakkar@enterprisedb.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andrew Dunstan
<andrew@dunslane.net>;Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele
<Shawn.Steele@microsoft.com>;Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati
<gulatishweta@microsoft.com>;Ashish Nawal <nawalashish@microsoft.com>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Nov 12, 2024 at 12:12 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Is the bug fix available? We noticed that there is a new build
17.0-1.However the issue repro on that build too.
 

Hi Haifang,
The change is in v17.1 and v16.5, but the release process for those is in progress right now, getting them out to the
packagingteams.
 
Hopefully new Windows installer packages will appear in a few days or so?