Thread: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows

Hello!

We have an environment with numerous client Windows machines. The machines have PSQL v9.6/v13.13/v15.6 installed.
We've been working on slowly upgrading the 9/13 machines to use PSQL 15, which has been going smoothly until recently.

We've been made aware of an issue with 15.6 which prompted us to change our latest version to 15.12 instead (we can't go past 15 due to many of our machines being Windows Server 2016, which from our understanding is not supported in v16 and onwards).

Upgrading our 15.6 machines to 15.12 has been going smoothly.
However, running PG_UPGRADE on PSQL v9.6/13.13 to v15.12 has been failing.

This is the error message:
lc_collate values for database "template1" do not match:  old "English_United States.1252", new "en-US"

Upon investigation, we came to understand that the PSQL v9.6/v13.13 config contains the following configuration by default:
lc_messages = 'English_United States.1252'
lc_monetary = 'English_United States.1252'
lc_numeric = 'English_United States.1252'
lc_time = 'English_United States.1252'

But when installing PSQL 15.12 it's changed to:
lc_messages = 'en-US'
lc_monetary = 'en-US'
lc_numeric = 'en-US'
lc_time = 'en-US'

Is a failure in upgrading to this version intended?
We can workaround this issue by upgrading to 15.6 and then to 15.12, but I'd like to avoid this solution as it lengthens potential downtime for our clients.

Any help would be appreciated.
Thanks!

--

photo

Ben Caspi
DevOps Engineer

www.aidoc.com  |  benc@aidoc.com

linkedin

twitter

App Banner Image

 
__tpx__

On Fri, Mar 21, 2025 at 4:43 PM Ben Caspi <benc@aidoc.com> wrote:
Hello!

We have an environment with numerous client Windows machines. The machines have PSQL v9.6/v13.13/v15.6 installed.
We've been working on slowly upgrading the 9/13 machines to use PSQL 15, which has been going smoothly until recently.

We've been made aware of an issue with 15.6 which prompted us to change our latest version to 15.12 instead (we can't go past 15 due to many of our machines being Windows Server 2016, which from our understanding is not supported in v16 and onwards).

Upgrading our 15.6 machines to 15.12 has been going smoothly.
However, running PG_UPGRADE on PSQL v9.6/13.13 to v15.12 has been failing.

This is the error message:
lc_collate values for database "template1" do not match:  old "English_United States.1252", new "en-US"

Upon investigation, we came to understand that the PSQL v9.6/v13.13 config contains the following configuration by default:
lc_messages = 'English_United States.1252'
lc_monetary = 'English_United States.1252'
lc_numeric = 'English_United States.1252'
lc_time = 'English_United States.1252'

But when installing PSQL 15.12 it's changed to:
lc_messages = 'en-US'
lc_monetary = 'en-US'
lc_numeric = 'en-US'
lc_time = 'en-US'

Is a failure in upgrading to this version intended?
We can workaround this issue by upgrading to 15.6 and then to 15.12, but I'd like to avoid this solution as it lengthens potential downtime for our clients.

Any help would be appreciated.
Thanks!

--

photo

Ben Caspi
DevOps Engineer

www.aidoc.com  |  benc@aidoc.com

linkedin

twitter

App Banner Image

 
__tpx__


Thanks for the report!

Copying Thomas Munro and Tom Lane as they were involved in the discussions
referenced in this email, and their inputs will be useful as well. Please copy other
contributors too, if needed.

This change in the behaviour of the installer is the result of this commit [1].

 As a part of this change, the installer would convert the chosen locale to its corresponding BCP-47
[2] code name before passing it on to initdb.exe. This was helpful for users where the locale name
contained non-ascii characters and initdb would fail. We received a significant number of tickets
from users after Microsoft made that change (to add non-ascii characters) in their updates.

Reading the thread [3], it seems it's probably not recommended to update the pg_database.datcollate
or datctype. I am thinking if it might help if installer converted the chosen locale name to BCP-47 only
when it contains non-ascii characters, otherwise, it should use the name as is during initdb run.
Will this help?

[1]https://github.com/EnterpriseDB/edb-installers/commit/e6404b5194051e20cfc0e7f268a69091e6445a73 [2]https://www.postgresql.org/message-id/CA%2BhUKGL5mBN3JQuebAPbX0yxDNtpui04J%2BKSy2F7KBbhLGaJig%40mail.gmail.com 

Thanks!
Manika Singhal
EDB India 

Hi Manika,

Thank you for the detailed explanation.

After reviewing the installer commit, I believe there might be a larger issue that might affect many users.
I noticed that this commit was merged into REL-13, REL-14, REL-15, and REL-16.

I anticipate that users could face an upgrade issue if they installed a cluster on a Windows machine using the default locale before this change and then attempt to upgrade to a release that includes this commit.

For example:

  1. Install a cluster on version v13.1 on a Windows machine with the en-US locale (default). The cluster locale will be 'English_United States.1252'.

  2. Install a cluster on version v15.12 on a Windows machine with the en-US locale (default). The cluster locale will be 'en-US'.

If you then try to migrate from the old cluster to the new one using the pg_upgrade command, you'll encounter the following error:

lc_collate values for database "template1" do not match: old "English_United States.1252", new "en-US"

I’m curious if other users have reported this issue, as it seems to be a critical one.

As for your suggestion, it will work for my case since my locale name doesn’t contain non-ASCII characters.

Thanks,

Avi





On Fri, Apr 4, 2025 at 3:30 PM Manika Singhal <manika.singhal@enterprisedb.com> wrote:

On Fri, Mar 21, 2025 at 4:43 PM Ben Caspi <benc@aidoc.com> wrote:
Hello!

We have an environment with numerous client Windows machines. The machines have PSQL v9.6/v13.13/v15.6 installed.
We've been working on slowly upgrading the 9/13 machines to use PSQL 15, which has been going smoothly until recently.

We've been made aware of an issue with 15.6 which prompted us to change our latest version to 15.12 instead (we can't go past 15 due to many of our machines being Windows Server 2016, which from our understanding is not supported in v16 and onwards).

Upgrading our 15.6 machines to 15.12 has been going smoothly.
However, running PG_UPGRADE on PSQL v9.6/13.13 to v15.12 has been failing.

This is the error message:
lc_collate values for database "template1" do not match:  old "English_United States.1252", new "en-US"

Upon investigation, we came to understand that the PSQL v9.6/v13.13 config contains the following configuration by default:
lc_messages = 'English_United States.1252'
lc_monetary = 'English_United States.1252'
lc_numeric = 'English_United States.1252'
lc_time = 'English_United States.1252'

But when installing PSQL 15.12 it's changed to:
lc_messages = 'en-US'
lc_monetary = 'en-US'
lc_numeric = 'en-US'
lc_time = 'en-US'

Is a failure in upgrading to this version intended?
We can workaround this issue by upgrading to 15.6 and then to 15.12, but I'd like to avoid this solution as it lengthens potential downtime for our clients.

Any help would be appreciated.
Thanks!

--

photo

Ben Caspi
DevOps Engineer

www.aidoc.com  |  benc@aidoc.com

linkedin

twitter

App Banner Image

 
__tpx__


Thanks for the report!

Copying Thomas Munro and Tom Lane as they were involved in the discussions
referenced in this email, and their inputs will be useful as well. Please copy other
contributors too, if needed.

This change in the behaviour of the installer is the result of this commit [1].

 As a part of this change, the installer would convert the chosen locale to its corresponding BCP-47
[2] code name before passing it on to initdb.exe. This was helpful for users where the locale name
contained non-ascii characters and initdb would fail. We received a significant number of tickets
from users after Microsoft made that change (to add non-ascii characters) in their updates.

Reading the thread [3], it seems it's probably not recommended to update the pg_database.datcollate
or datctype. I am thinking if it might help if installer converted the chosen locale name to BCP-47 only
when it contains non-ascii characters, otherwise, it should use the name as is during initdb run.
Will this help?

[1]https://github.com/EnterpriseDB/edb-installers/commit/e6404b5194051e20cfc0e7f268a69091e6445a73 [2]https://www.postgresql.org/message-id/CA%2BhUKGL5mBN3JQuebAPbX0yxDNtpui04J%2BKSy2F7KBbhLGaJig%40mail.gmail.com 

Thanks!
Manika Singhal
EDB India 
On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote:
> On Fri, Apr 4, 2025 at 3:30 PM Manika Singhal <manika.singhal@enterprisedb.com> wrote:
> > On Fri, Mar 21, 2025 at 4:43 PM Ben Caspi <benc@aidoc.com> wrote:
> > > We have an environment with numerous client Windows machines. The machines have PSQL v9.6/v13.13/v15.6 installed.
> > >
> > > However, running PG_UPGRADE on PSQL v9.6/13.13 to v15.12 has been failing.
> > >
> > > This is the error message:
> > > lc_collate values for database "template1" do not match:  old "English_United States.1252", new "en-US"
> > >
> > > Upon investigation, we came to understand that the PSQL v9.6/v13.13 config contains the following configuration
bydefault: 
> > > lc_messages = 'English_United States.1252'
> > > lc_monetary = 'English_United States.1252'
> > > lc_numeric = 'English_United States.1252'
> > > lc_time = 'English_United States.1252'
> > >
> > > But when installing PSQL 15.12 it's changed to:
> > > lc_messages = 'en-US'
> > > lc_monetary = 'en-US'
> > > lc_numeric = 'en-US'
> > > lc_time = 'en-US'
> >
> > This change in the behaviour of the installer is the result of this commit [1].
> >
> >  As a part of this change, the installer would convert the chosen locale to its corresponding BCP-47
> > [2] code name before passing it on to initdb.exe. This was helpful for users where the locale name
> > contained non-ascii characters and initdb would fail. We received a significant number of tickets
> > from users after Microsoft made that change (to add non-ascii characters) in their updates.
> >
> > Reading the thread [3], it seems it's probably not recommended to update the pg_database.datcollate
> > or datctype. I am thinking if it might help if installer converted the chosen locale name to BCP-47 only
> > when it contains non-ascii characters, otherwise, it should use the name as is during initdb run.
> > Will this help?
> >
> > [1]https://github.com/EnterpriseDB/edb-installers/commit/e6404b5194051e20cfc0e7f268a69091e6445a73
> > [2]https://www.postgresql.org/message-id/CA%2BhUKGL5mBN3JQuebAPbX0yxDNtpui04J%2BKSy2F7KBbhLGaJig%40mail.gmail.com
> > [3]https://www.postgresql.org/message-id/2694195.1700072765%40sss.pgh.pa.us
>
> I anticipate that users could face an upgrade issue if they installed a cluster on a Windows
> machine using the default locale before this change and then attempt to upgrade to a release
> that includes this commit.

Yes, I think this will cause more trouble.

I think that we should stick with BCP-47 locale names as much as possible.  The problem with
the long locale names is not only non-ASCII characters, but that Microsoft keeps changing these
names, and PostgreSQL persists them in the catalog, which causes trouble if Windows is upgraded.

I can see three potential ways to deal with that:

1. Only when creating a cluster for upgrade, use the locale names that the old cluster uses.

   That is difficult, because it requires to connect to that cluster, as the information is
   only in catalog tables.  Also, if there are several clusters, which one to use?

2. Before upgrading a cluster, update the catalog tables of the old cluster with the
   corresponding BCP-47 name.

   That would be a good way into a world with only BCP-47 locale names.

   On the one hand, it would be convenient to have pg_upgrade do this automatically, but if
   the upgrade fails for whatever reason, your original cluster got modified, which doesn't
   feel right.

   Perhaps the Windows binaries could come with an extra tool for such a change, and perhaps
   the installer could suggest running it before an upgrade.

3. A minimal solution would be to equip the Windows installer with some guidance for the user
   at the point where the locale is chosen.  The BCP-47 locale name would be suggested, and
   a text could point out "If you plan to upgrade a cluster that was created with long locale
   names, please select the same locale names for the new cluster".

I would like #2, but I cannot tell how easy or feasible it would be.

Yours,
Laurenz Albe



On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote:
> > I anticipate that users could face an upgrade issue if they installed a cluster on a Windows
> > machine using the default locale before this change and then attempt to upgrade to a release
> > that includes this commit.
>
> Yes, I think this will cause more trouble.
>
> I think that we should stick with BCP-47 locale names as much as possible.  The problem with
> the long locale names is not only non-ASCII characters, but that Microsoft keeps changing these
> names, and PostgreSQL persists them in the catalog, which causes trouble if Windows is upgraded.

+100

I've been saying for a while[1] that initdb should default to BCP-47
too.  But it seems likely that the entire PostgreSQL-on-Windows
community uses the popular EDB Installer, not command line tools?  It
doesn't use initdb's default anyway, so changes to initdb were never
going to start the migration of the world's PostgreSQL-on-Windows
fleet to BCP-47.  It took President Erdoğan and the EDB installer team
to kick it off.  Despite this upgrade wrinkle, which I'm sure we can
smooth out, I'm really happy to see it finally happening!

Historically in our commit log, tree and mailing lists we've kvetched
about unstable locale names on that OS, but the
please-don't-store-the-old-style-names-anywhere-do-this-other-thing-instead
warning in the documentation presumably started with Vista a couple of
decades ago.  Unfortunately that shipped around the same time as the
PostgreSQL-on-Windows port, and of course it targeted older releases
for some time.  Better late than never.

(As an aside: I think our kvetching wasn't unwarranted, it just didn't
tackle the root problem.  Perhaps unsurprisingly, because they changed
that in all the Windows native APIs, but setlocale() continued and
continues to expose the pre-Vista names, creating an asymmetry and
incidentally also exposing non-C standard conforming behaviour due to
char/whcar_t schizophrenia.  I could guess that setlocale() and
friends only exist for minimal/partial conformance anyway, and some
kind of no-changes-there-unless-we-have-to policy was took priority?
After all, few are interested in non-MT-safe code on that platform.
That's all OK once you understand it and code around it, and it's a
solid clue that French_France probably accesses the same underlying
locale definition as fr-FR, but I've been hesitant to write any code
myself that relies on that without a written contract.  FWIW I think
our locale name situation is finally pretty close to where it should
be now on Windows, partially as a byproduct of our own renewed
interest in MT-safe code.  I also hope we can do native UTF-8 for v19
(perhaps creating a future upgrade hump for fr-FR -> fr-FR.UTF-8,
which it would have been nice to fold into this one and maybe we
should consider that some time soon).  And also tackle some related
encoding problems in other places, namely argv, environ, path names,
shared database objects and early-stage pq protocol.  Some of that
requires slightly different thinking on Windows due to its wchar_t
model.  I've written long emails about my learnings there eg some
library calls expect setlocale() encoded C strings, others the ACP,
some we flipped from one camp to the other by making our own
wrappers/emulations of libc calls, and a few corners can only be
sanely approached by resorting to wchar_t APIs.  Basically there was
zero hope of a non-Windows hacker pulling that sort of thing off
before we had good CI, and it helps a lot to view it as part of larger
project to nail down the semantics of multi-encoding clusters on all
OSes, with only a few special concerns for Windows... anyway, many POC
patches exist, hacking continues...)

> I can see three potential ways to deal with that:
>
> 1. Only when creating a cluster for upgrade, use the locale names that the old cluster uses.
>
>    That is difficult, because it requires to connect to that cluster, as the information is
>    only in catalog tables.  Also, if there are several clusters, which one to use?
>
> 2. Before upgrading a cluster, update the catalog tables of the old cluster with the
>    corresponding BCP-47 name.
>
>    That would be a good way into a world with only BCP-47 locale names.
>
>    On the one hand, it would be convenient to have pg_upgrade do this automatically, but if
>    the upgrade fails for whatever reason, your original cluster got modified, which doesn't
>    feel right.
>
>    Perhaps the Windows binaries could come with an extra tool for such a change, and perhaps
>    the installer could suggest running it before an upgrade.

I was hesitant to suggest this approach in the many threads about this
mess even though I can see the appeal of a silent upgrade path.  If
the EDB tooling or any potential core PostgreSQL tooling had a
built-in assumption that the pre-Vista locales are 100% compatible
with the Vista locales without any official documentation to point to,
that'd be based on a guess or unsubstantiated claim AFAIK and could
turn out wrong in some subtle way.  Even though you can sort of
deduce/guess that the old-style names are really just the display
names (I forget the term) that appear in the structs that GetLocales()
(or whatever it's called) spits out, I would prefer to leave
catalogue-clobbering shenanigans up to brave end users, even though
it'd probably work out OK in practice.  Call me a chicken, but if
existing clusters aren't broken, why fix them?  Any human or script
that does such things should probably also run REINDEX without proof
positive that it's not needed (amcheck, or documented evidence that
it's not necessary), or take that risk themselves, and technically it
is not only indexes that can depend on locale definitions (also CHECK,
and more things I've forgotten since the failed attempt to model those
dependencies a few releases back...).  It's a nice goal, but my gut
feeling was that it should not be tangled up with upgrades.

The only users who *can't* use the old style names without an extra
step are those actually affected by country renames like Türkiye.  The
path of least resistance for them seems to be to follow the advice on
creating a "Turkish_Turkey" locale using the Locale Builder tool, and
then just keep using that name across the upgrade.  Affected users
presumably already had to do that to keep their old cluster running
before they decided to upgrade anyway, so why force them to tackle a
transition to BCP-47 *during* an upgrade?

> 3. A minimal solution would be to equip the Windows installer with some guidance for the user
>    at the point where the locale is chosen.  The BCP-47 locale name would be suggested, and
>    a text could point out "If you plan to upgrade a cluster that was created with long locale
>    names, please select the same locale names for the new cluster".

+1

That's why I suggested[2] that the installer should at least still
*offer* the old names in the thread where we banned non-ASCII names
(which was not fun to back-patch, but initdb.exe was apparently
crashing all over Türkiye, to the extent that the Windows crash report
investigation team (I think that's who it was?) was compelled to reach
out to our list (at least that's what I guess happened, IDK...).  Does
it not?  It should definitely encourage the use of the modern names by
default for new installations, but perhaps it should have a checkbox
to reveal the old ones?  It could grey out or filter out the non-ASCII
ones that initdb would reject.  +1 for adding some clues about the
relevance to upgrades in the GUI too.

[1] https://commitfest.postgresql.org/patch/3772/
[2] https://www.postgresql.org/message-id/CA%2BhUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse%3DffPA%40mail.gmail.com




On Thu, Apr 10, 2025 at 5:19 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote:
> > I anticipate that users could face an upgrade issue if they installed a cluster on a Windows
> > machine using the default locale before this change and then attempt to upgrade to a release
> > that includes this commit.
>
> Yes, I think this will cause more trouble.
>
> I think that we should stick with BCP-47 locale names as much as possible.  The problem with
> the long locale names is not only non-ASCII characters, but that Microsoft keeps changing these
> names, and PostgreSQL persists them in the catalog, which causes trouble if Windows is upgraded.

+100

I've been saying for a while[1] that initdb should default to BCP-47
too.  But it seems likely that the entire PostgreSQL-on-Windows
community uses the popular EDB Installer, not command line tools?  It
doesn't use initdb's default anyway, so changes to initdb were never
going to start the migration of the world's PostgreSQL-on-Windows
fleet to BCP-47.  It took President Erdoğan and the EDB installer team
to kick it off.  Despite this upgrade wrinkle, which I'm sure we can
smooth out, I'm really happy to see it finally happening!

Historically in our commit log, tree and mailing lists we've kvetched
about unstable locale names on that OS, but the
please-don't-store-the-old-style-names-anywhere-do-this-other-thing-instead
warning in the documentation presumably started with Vista a couple of
decades ago.  Unfortunately that shipped around the same time as the
PostgreSQL-on-Windows port, and of course it targeted older releases
for some time.  Better late than never.

(As an aside: I think our kvetching wasn't unwarranted, it just didn't
tackle the root problem.  Perhaps unsurprisingly, because they changed
that in all the Windows native APIs, but setlocale() continued and
continues to expose the pre-Vista names, creating an asymmetry and
incidentally also exposing non-C standard conforming behaviour due to
char/whcar_t schizophrenia.  I could guess that setlocale() and
friends only exist for minimal/partial conformance anyway, and some
kind of no-changes-there-unless-we-have-to policy was took priority?
After all, few are interested in non-MT-safe code on that platform.
That's all OK once you understand it and code around it, and it's a
solid clue that French_France probably accesses the same underlying
locale definition as fr-FR, but I've been hesitant to write any code
myself that relies on that without a written contract.  FWIW I think
our locale name situation is finally pretty close to where it should
be now on Windows, partially as a byproduct of our own renewed
interest in MT-safe code.  I also hope we can do native UTF-8 for v19
(perhaps creating a future upgrade hump for fr-FR -> fr-FR.UTF-8,
which it would have been nice to fold into this one and maybe we
should consider that some time soon).  And also tackle some related
encoding problems in other places, namely argv, environ, path names,
shared database objects and early-stage pq protocol.  Some of that
requires slightly different thinking on Windows due to its wchar_t
model.  I've written long emails about my learnings there eg some
library calls expect setlocale() encoded C strings, others the ACP,
some we flipped from one camp to the other by making our own
wrappers/emulations of libc calls, and a few corners can only be
sanely approached by resorting to wchar_t APIs.  Basically there was
zero hope of a non-Windows hacker pulling that sort of thing off
before we had good CI, and it helps a lot to view it as part of larger
project to nail down the semantics of multi-encoding clusters on all
OSes, with only a few special concerns for Windows... anyway, many POC
patches exist, hacking continues...)

> I can see three potential ways to deal with that:
>
> 1. Only when creating a cluster for upgrade, use the locale names that the old cluster uses.
>
>    That is difficult, because it requires to connect to that cluster, as the information is
>    only in catalog tables.  Also, if there are several clusters, which one to use?
>
> 2. Before upgrading a cluster, update the catalog tables of the old cluster with the
>    corresponding BCP-47 name.
>
>    That would be a good way into a world with only BCP-47 locale names.
>
>    On the one hand, it would be convenient to have pg_upgrade do this automatically, but if
>    the upgrade fails for whatever reason, your original cluster got modified, which doesn't
>    feel right.
>
>    Perhaps the Windows binaries could come with an extra tool for such a change, and perhaps
>    the installer could suggest running it before an upgrade.

I was hesitant to suggest this approach in the many threads about this
mess even though I can see the appeal of a silent upgrade path.  If
the EDB tooling or any potential core PostgreSQL tooling had a
built-in assumption that the pre-Vista locales are 100% compatible
with the Vista locales without any official documentation to point to,
that'd be based on a guess or unsubstantiated claim AFAIK and could
turn out wrong in some subtle way.  Even though you can sort of
deduce/guess that the old-style names are really just the display
names (I forget the term) that appear in the structs that GetLocales()
(or whatever it's called) spits out, I would prefer to leave
catalogue-clobbering shenanigans up to brave end users, even though
it'd probably work out OK in practice.  Call me a chicken, but if
existing clusters aren't broken, why fix them?  Any human or script
that does such things should probably also run REINDEX without proof
positive that it's not needed (amcheck, or documented evidence that
it's not necessary), or take that risk themselves, and technically it
is not only indexes that can depend on locale definitions (also CHECK,
and more things I've forgotten since the failed attempt to model those
dependencies a few releases back...).  It's a nice goal, but my gut
feeling was that it should not be tangled up with upgrades.

The only users who *can't* use the old style names without an extra
step are those actually affected by country renames like Türkiye.  The
path of least resistance for them seems to be to follow the advice on
creating a "Turkish_Turkey" locale using the Locale Builder tool, and
then just keep using that name across the upgrade.  Affected users
presumably already had to do that to keep their old cluster running
before they decided to upgrade anyway, so why force them to tackle a
transition to BCP-47 *during* an upgrade?

> 3. A minimal solution would be to equip the Windows installer with some guidance for the user
>    at the point where the locale is chosen.  The BCP-47 locale name would be suggested, and
>    a text could point out "If you plan to upgrade a cluster that was created with long locale
>    names, please select the same locale names for the new cluster".

+1

That's why I suggested[2] that the installer should at least still
*offer* the old names in the thread where we banned non-ASCII names
(which was not fun to back-patch, but initdb.exe was apparently
crashing all over Türkiye, to the extent that the Windows crash report
investigation team (I think that's who it was?) was compelled to reach
out to our list (at least that's what I guess happened, IDK...).  Does
it not?  It should definitely encourage the use of the modern names by
default for new installations, but perhaps it should have a checkbox
to reveal the old ones?  It could grey out or filter out the non-ASCII
ones that initdb would reject.  +1 for adding some clues about the
relevance to upgrades in the GUI too.

[1] https://commitfest.postgresql.org/patch/3772/
[2] https://www.postgresql.org/message-id/CA%2BhUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse%3DffPA%40mail.gmail.com


Thank you, Avi, Laurenz, and Thomas, for your inputs on the issue!

@Thomas — what you are suggesting makes complete sense.
We will look into what can be done on this going forward.

For now, we can update the locales dropdown in the installer to list all the BCP names first,
followed by the long (old-style) names — excluding those with non-ASCII characters.

We hope this approach will encourage new users to use only BCP names during installation,
while still providing compatibility with the old-style names for users who want to use pg_upgrade.

Let us know if this sounds reasonable.

On Thu, Apr 10, 2025 at 7:03 PM Manika Singhal
<manika.singhal@enterprisedb.com> wrote:
> For now, we can update the locales dropdown in the installer to list all the BCP names first,
> followed by the long (old-style) names — excluding those with non-ASCII characters.
>
> We hope this approach will encourage new users to use only BCP names during installation,
> while still providing compatibility with the old-style names for users who want to use pg_upgrade.
>
> Let us know if this sounds reasonable.

+1





On Thu, Apr 10, 2025 at 12:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Apr 10, 2025 at 7:03 PM Manika Singhal
<manika.singhal@enterprisedb.com> wrote:
> For now, we can update the locales dropdown in the installer to list all the BCP names first,
> followed by the long (old-style) names — excluding those with non-ASCII characters.
>
> We hope this approach will encourage new users to use only BCP names during installation,
> while still providing compatibility with the old-style names for users who want to use pg_upgrade.
>
> Let us know if this sounds reasonable.

+1


Hi,

As discussed earlier, the updated installers for PostgreSQL 16 and 17
(package revision 2) are now available on our website for download.

These versions include support for selecting the locales list in either of the
following formats during installation:

  • BCP-47 names
  • Legacy long names

Please try these new installers at your convenience and let us know if they
work well for your setup.


Thanks!
Manika Singhal
EDB India
On Thu, Apr 10, 2025 at 11:49:18AM +1200, Thomas Munro wrote:
> On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > I think that we should stick with BCP-47 locale names as much as
> > possible.  The problem with the long locale names is not only
> > non-ASCII characters, but that Microsoft keeps changing these names,
> > and PostgreSQL persists them in the catalog, which causes trouble if
> > Windows is upgraded.
> 
> +100

With respect, BCP-47 defines language tags, not locale names.  A locale
name definitely needs a way to identify a language, so BCP-47 can be
part of a locale identifier, but a locale needs more than that: it also
needs a way to identify the codeset used in that locale.

The original post in this thread had postgres using `English_United
States.1252` as the locale name, no part of which is BCP-47-like, but
also BCP-47 has no way of encoding "codepage 1252" as the codeset
because BCP-47 is specifically and only about languages, not codesets.

I'm not actually sure what is the best standard to use for identifying
_locales_ as opposed to _languages_, but BCP-47 isn't it.  POSIX has a
notion of locales, but not registry of locale names and definitions.
POSIX locale naming using BCP-47 language tags and some codeset
identifier seems like the best way, but unlike BCP-47 there is no IANA
registry of locale names, but there is an IANA registry of codesets:

https://www.iana.org/assignments/character-sets/character-sets.xhtml

(which isn't quite a registry of codeset names but it will do) so you
can construct a standard-ish locale name out of language tags and
charset/codeset names.

I believe -correct me if I'm wrong- that the IETF is not interested in
publishing an RFCs/BCPs/STDs regarding locales, nor having an IANA
locale name registry, because the IETF wants the world to use Unicode
and standard Unicode transforms like UTF-8, in which case BCP-47 should
be enough (because the transform should be understood from context).
But the real world still has to deal with non-Unicode codesets and,
therefore with locales.  The CLDR uses some sort of locale name, but
still without a codeset name (because CLDR is about everything about
locales except the codeset name because Unicode), and glibc can be used
as a sort of source of standard-ish POSIX locale names that do include
codeset names.

So if you want to identify _locales_ you might have to either construct
your own locale names out of BCP-47 or CLDR and add a codeset name
subtag, or use glibc's POSIX locale names augmented with Windows
codepage names.

Nico
-- 



On Wed, Apr 16, 2025 at 4:10 AM Nico Williams <nico@cryptonector.com> wrote:
> On Thu, Apr 10, 2025 at 11:49:18AM +1200, Thomas Munro wrote:
> > On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > > I think that we should stick with BCP-47 locale names as much as
> > > possible.  The problem with the long locale names is not only
> > > non-ASCII characters, but that Microsoft keeps changing these names,
> > > and PostgreSQL persists them in the catalog, which causes trouble if
> > > Windows is upgraded.
> >
> > +100
>
> With respect, BCP-47 defines language tags, not locale names.  A locale
> name definitely needs a way to identify a language, so BCP-47 can be
> part of a locale identifier, but a locale needs more than that: it also
> needs a way to identify the codeset used in that locale.
>
> The original post in this thread had postgres using `English_United
> States.1252` as the locale name, no part of which is BCP-47-like, but
> also BCP-47 has no way of encoding "codepage 1252" as the codeset
> because BCP-47 is specifically and only about languages, not codesets.
>
> I'm not actually sure what is the best standard to use for identifying
> _locales_ as opposed to _languages_, but BCP-47 isn't it.  POSIX has a
> notion of locales, but not registry of locale names and definitions.
> POSIX locale naming using BCP-47 language tags and some codeset
> identifier seems like the best way, but unlike BCP-47 there is no IANA
> registry of locale names, but there is an IANA registry of codesets:
>
> https://www.iana.org/assignments/character-sets/character-sets.xhtml
>
> (which isn't quite a registry of codeset names but it will do) so you
> can construct a standard-ish locale name out of language tags and
> charset/codeset names.
>
> I believe -correct me if I'm wrong- that the IETF is not interested in
> publishing an RFCs/BCPs/STDs regarding locales, nor having an IANA
> locale name registry, because the IETF wants the world to use Unicode
> and standard Unicode transforms like UTF-8, in which case BCP-47 should
> be enough (because the transform should be understood from context).
> But the real world still has to deal with non-Unicode codesets and,
> therefore with locales.  The CLDR uses some sort of locale name, but
> still without a codeset name (because CLDR is about everything about
> locales except the codeset name because Unicode), and glibc can be used
> as a sort of source of standard-ish POSIX locale names that do include
> codeset names.
>
> So if you want to identify _locales_ you might have to either construct
> your own locale names out of BCP-47 or CLDR and add a codeset name
> subtag, or use glibc's POSIX locale names augmented with Windows
> codepage names.

We're not talking about designing anything of our own or embedding
BCP-47 knowledge into PostgreSQL, we're just agreeing to encourage
Windows users to prefer the OS's modern locale names over its ancient
ones in our tooling. We merely pass the strings around and hope they
still work.  We've been referring to its modern naming system as
BCP-47, though sure, you're right, they're "BCP-47 based".



On Wed, Apr 16, 2025 at 09:11:02AM +1200, Thomas Munro wrote:
> We're not talking about designing anything of our own or embedding
> BCP-47 knowledge into PostgreSQL, we're just agreeing to encourage
> Windows users to prefer the OS's modern locale names over its ancient
> ones in our tooling. We merely pass the strings around and hope they
> still work.  We've been referring to its modern naming system as
> BCP-47, though sure, you're right, they're "BCP-47 based".

Ah, then my apologies for the noise.



On Wed, Apr 16, 2025 at 9:33 AM Nico Williams <nico@cryptonector.com> wrote:
> Ah, then my apologies for the noise.

FWIW I think everything you wrote was interesting and matched my
understanding. Non-UTF-8 text does seem to be almost extinct in the
wild now... *except* on Windows.  I think/hope that will eventually
change too, you can see it starting...  anyway if you're interested in
the broader topic you might like the threads I linked earlier, and
this one below.  Insights on any aspect but especially about Windows
are welcome!

https://www.postgresql.org/message-id/flat/CA%2BhUKGLrx02%3DwT647oYz7x0GW%3DPyQdma0s4U6w9h%3DsuFjcciTw%40mail.gmail.com



On Wed, Apr 16, 2025 at 12:34:29PM +1200, Thomas Munro wrote:
>         [...]. Non-UTF-8 text does seem to be almost extinct in the
> wild now... *except* on Windows.  I think/hope that will eventually

Sigh, yes.

> change too, you can see it starting...  [...]

Yes, MSFT devs seem keen to push the UTF-8 codepage, which is a most
welcome development.

>
https://www.postgresql.org/message-id/flat/CA%2BhUKGLrx02%3DwT647oYz7x0GW%3DPyQdma0s4U6w9h%3DsuFjcciTw%40mail.gmail.com

I replied.

What I might consider is to require that all new PG DBs must run in
UTF-8 locales and/or codepages only, and provide a migration tool for
existing DBs while not breaking them.

Nico
-- 



On Thu, Apr 17, 2025 at 02:18:10PM -0500, Nico Williams wrote:
> What I might consider is to require that all new PG DBs must run in
> UTF-8 locales and/or codepages only, and provide a migration tool for
> existing DBs while not breaking them.

And declare that PG 19 or 20 will not support non-UTF-8 locales /
codepages.



Hi,

I'm struggling with the upgrade process related to this issue, and I hope you can help me.

As mentioned at the beginning of this thread, I'm trying to upgrade cluster version 13.13 to 15.12 on Windows.
My v13.13 cluster uses "English_United States.1252" locale.

When I install the new cluster v15.12, on Windows using the CLI command, I would like to pass this locale using the  "--locale" flag.
This way, the "old" and "new" clusters will have the same locale, and the "pg_upgrade" will work.

The issue is that running the installed using "--locale 'English_United States.1252'" fails with an error "locale specified not supported".
Can you please clarify this for me?

Thanks,
Avi


On Thu, Apr 17, 2025 at 10:31 PM Nico Williams <nico@cryptonector.com> wrote:
On Thu, Apr 17, 2025 at 02:18:10PM -0500, Nico Williams wrote:
> What I might consider is to require that all new PG DBs must run in
> UTF-8 locales and/or codepages only, and provide a migration tool for
> existing DBs while not breaking them.

And declare that PG 19 or 20 will not support non-UTF-8 locales /
codepages.

On Tue, Apr 22, 2025 at 4:28 PM Avi Uziel <avi.uziel@aidoc.com> wrote:
Hi,

I'm struggling with the upgrade process related to this issue, and I hope you can help me.

As mentioned at the beginning of this thread, I'm trying to upgrade cluster version 13.13 to 15.12 on Windows.
My v13.13 cluster uses "English_United States.1252" locale.

When I install the new cluster v15.12, on Windows using the CLI command, I would like to pass this locale using the  "--locale" flag.
This way, the "old" and "new" clusters will have the same locale, and the "pg_upgrade" will work.

The issue is that running the installed using "--locale 'English_United States.1252'" fails with an error "locale specified not supported".
Can you please clarify this for me?

Thanks,
Avi 

Hi Avi,

The `--locale` option is not yet supported by the Windows installers.

As an alternative, you can download the latest 15.12-2 installer from the website
and proceed with the installation using GUI instead of the command line.
During the installation, you will be presented with a locale selection list in both formats
(BCP codes followed by long names). Please select "English, United States" at that time
and you'll be all set. 

JFYI, the upcoming minor releases, scheduled on 8th May, will include support for `--locale`
option. If you are interested, we'd be happy to share a test-installer with you that includes this feature.

Let us know!

Thanks!
Manika Singhal
EDB India
Thanks!

I was able to use "--locale en-US" using Windows installer 15.12. 
So it is supported, just not supporting the "locale long name format"?

Feel free to send us the test installer, we will be happy to assist.

Avi


On Tue, Apr 22, 2025 at 2:30 PM Manika Singhal <manika.singhal@enterprisedb.com> wrote:

On Tue, Apr 22, 2025 at 4:28 PM Avi Uziel <avi.uziel@aidoc.com> wrote:
Hi,

I'm struggling with the upgrade process related to this issue, and I hope you can help me.

As mentioned at the beginning of this thread, I'm trying to upgrade cluster version 13.13 to 15.12 on Windows.
My v13.13 cluster uses "English_United States.1252" locale.

When I install the new cluster v15.12, on Windows using the CLI command, I would like to pass this locale using the  "--locale" flag.
This way, the "old" and "new" clusters will have the same locale, and the "pg_upgrade" will work.

The issue is that running the installed using "--locale 'English_United States.1252'" fails with an error "locale specified not supported".
Can you please clarify this for me?

Thanks,
Avi 

Hi Avi,

The `--locale` option is not yet supported by the Windows installers.

As an alternative, you can download the latest 15.12-2 installer from the website
and proceed with the installation using GUI instead of the command line.
During the installation, you will be presented with a locale selection list in both formats
(BCP codes followed by long names). Please select "English, United States" at that time
and you'll be all set. 

JFYI, the upcoming minor releases, scheduled on 8th May, will include support for `--locale`
option. If you are interested, we'd be happy to share a test-installer with you that includes this feature.

Let us know!

Thanks!
Manika Singhal
EDB India

On Tue, Apr 22, 2025 at 5:09 PM Avi Uziel <avi.uziel@aidoc.com> wrote:
Thanks!

I was able to use "--locale en-US" using Windows installer 15.12. 
So it is supported, just not supporting the "locale long name format"?

Feel free to send us the test installer, we will be happy to assist.

Avi

Hi Avi,


Let us know if this works for you.

Thanks!
Manika Singhal
EDB India