Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows |
Date | |
Msg-id | CA+hUKGKcbtDSvTySZgmpm2GyzBwTKnEQsaYpgbT6zw0-1qLz6Q@mail.gmail.com Whole thread Raw |
In response to | Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows (Laurenz Albe <laurenz.albe@cybertec.at>) |
Responses |
Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows
Re: PostgreSQL v15.12 fails to perform PG_UPGRADE from v13 and v9 on Windows |
List | pgsql-bugs |
On Tue, Apr 8, 2025 at 5:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > On Mon, 2025-04-07 at 17:59 +0300, Avi Uziel wrote: > > I anticipate that users could face an upgrade issue if they installed a cluster on a Windows > > machine using the default locale before this change and then attempt to upgrade to a release > > that includes this commit. > > Yes, I think this will cause more trouble. > > I think that we should stick with BCP-47 locale names as much as possible. The problem with > the long locale names is not only non-ASCII characters, but that Microsoft keeps changing these > names, and PostgreSQL persists them in the catalog, which causes trouble if Windows is upgraded. +100 I've been saying for a while[1] that initdb should default to BCP-47 too. But it seems likely that the entire PostgreSQL-on-Windows community uses the popular EDB Installer, not command line tools? It doesn't use initdb's default anyway, so changes to initdb were never going to start the migration of the world's PostgreSQL-on-Windows fleet to BCP-47. It took President Erdoğan and the EDB installer team to kick it off. Despite this upgrade wrinkle, which I'm sure we can smooth out, I'm really happy to see it finally happening! Historically in our commit log, tree and mailing lists we've kvetched about unstable locale names on that OS, but the please-don't-store-the-old-style-names-anywhere-do-this-other-thing-instead warning in the documentation presumably started with Vista a couple of decades ago. Unfortunately that shipped around the same time as the PostgreSQL-on-Windows port, and of course it targeted older releases for some time. Better late than never. (As an aside: I think our kvetching wasn't unwarranted, it just didn't tackle the root problem. Perhaps unsurprisingly, because they changed that in all the Windows native APIs, but setlocale() continued and continues to expose the pre-Vista names, creating an asymmetry and incidentally also exposing non-C standard conforming behaviour due to char/whcar_t schizophrenia. I could guess that setlocale() and friends only exist for minimal/partial conformance anyway, and some kind of no-changes-there-unless-we-have-to policy was took priority? After all, few are interested in non-MT-safe code on that platform. That's all OK once you understand it and code around it, and it's a solid clue that French_France probably accesses the same underlying locale definition as fr-FR, but I've been hesitant to write any code myself that relies on that without a written contract. FWIW I think our locale name situation is finally pretty close to where it should be now on Windows, partially as a byproduct of our own renewed interest in MT-safe code. I also hope we can do native UTF-8 for v19 (perhaps creating a future upgrade hump for fr-FR -> fr-FR.UTF-8, which it would have been nice to fold into this one and maybe we should consider that some time soon). And also tackle some related encoding problems in other places, namely argv, environ, path names, shared database objects and early-stage pq protocol. Some of that requires slightly different thinking on Windows due to its wchar_t model. I've written long emails about my learnings there eg some library calls expect setlocale() encoded C strings, others the ACP, some we flipped from one camp to the other by making our own wrappers/emulations of libc calls, and a few corners can only be sanely approached by resorting to wchar_t APIs. Basically there was zero hope of a non-Windows hacker pulling that sort of thing off before we had good CI, and it helps a lot to view it as part of larger project to nail down the semantics of multi-encoding clusters on all OSes, with only a few special concerns for Windows... anyway, many POC patches exist, hacking continues...) > I can see three potential ways to deal with that: > > 1. Only when creating a cluster for upgrade, use the locale names that the old cluster uses. > > That is difficult, because it requires to connect to that cluster, as the information is > only in catalog tables. Also, if there are several clusters, which one to use? > > 2. Before upgrading a cluster, update the catalog tables of the old cluster with the > corresponding BCP-47 name. > > That would be a good way into a world with only BCP-47 locale names. > > On the one hand, it would be convenient to have pg_upgrade do this automatically, but if > the upgrade fails for whatever reason, your original cluster got modified, which doesn't > feel right. > > Perhaps the Windows binaries could come with an extra tool for such a change, and perhaps > the installer could suggest running it before an upgrade. I was hesitant to suggest this approach in the many threads about this mess even though I can see the appeal of a silent upgrade path. If the EDB tooling or any potential core PostgreSQL tooling had a built-in assumption that the pre-Vista locales are 100% compatible with the Vista locales without any official documentation to point to, that'd be based on a guess or unsubstantiated claim AFAIK and could turn out wrong in some subtle way. Even though you can sort of deduce/guess that the old-style names are really just the display names (I forget the term) that appear in the structs that GetLocales() (or whatever it's called) spits out, I would prefer to leave catalogue-clobbering shenanigans up to brave end users, even though it'd probably work out OK in practice. Call me a chicken, but if existing clusters aren't broken, why fix them? Any human or script that does such things should probably also run REINDEX without proof positive that it's not needed (amcheck, or documented evidence that it's not necessary), or take that risk themselves, and technically it is not only indexes that can depend on locale definitions (also CHECK, and more things I've forgotten since the failed attempt to model those dependencies a few releases back...). It's a nice goal, but my gut feeling was that it should not be tangled up with upgrades. The only users who *can't* use the old style names without an extra step are those actually affected by country renames like Türkiye. The path of least resistance for them seems to be to follow the advice on creating a "Turkish_Turkey" locale using the Locale Builder tool, and then just keep using that name across the upgrade. Affected users presumably already had to do that to keep their old cluster running before they decided to upgrade anyway, so why force them to tackle a transition to BCP-47 *during* an upgrade? > 3. A minimal solution would be to equip the Windows installer with some guidance for the user > at the point where the locale is chosen. The BCP-47 locale name would be suggested, and > a text could point out "If you plan to upgrade a cluster that was created with long locale > names, please select the same locale names for the new cluster". +1 That's why I suggested[2] that the installer should at least still *offer* the old names in the thread where we banned non-ASCII names (which was not fun to back-patch, but initdb.exe was apparently crashing all over Türkiye, to the extent that the Windows crash report investigation team (I think that's who it was?) was compelled to reach out to our list (at least that's what I guess happened, IDK...). Does it not? It should definitely encourage the use of the modern names by default for new installations, but perhaps it should have a checkbox to reveal the old ones? It could grey out or filter out the non-ASCII ones that initdb would reject. +1 for adding some clues about the relevance to upgrades in the GUI too. [1] https://commitfest.postgresql.org/patch/3772/ [2] https://www.postgresql.org/message-id/CA%2BhUKGLfrK33XpFXsRcc97a1Qa5Vz1YFEn4GC1vie7yse%3DffPA%40mail.gmail.com
pgsql-bugs by date: