Thread: Custom Glibc collation version strings under LOCPATH
Hi,

One way to move to a newer glibc-based Linux distribution but keep the locales working the same* without keeping the associated zombie C code alive is to find the source system's collation definition source files, compile them with the localedef on the target system, and point the environment variable LOCPATH at the top-level directory.

That runs directly into the naivety of commit d5ac14f9's gnu_get_libc_version() kludge. So here's a patch that allows a brave user of that recompilation technique to drop a custom version string into a file named one of:

* $LOCPATH/<collcollate>/LC_COLLATE.version
* $LOCPATH/<collcollate>/version
* $LOCPATH/LC_COLLATE.version
* $LOCPATH/version

This way you can make your custom locales' reported version agree with wherever they came from, and so avoid those mismatch warnings, at whichever granularity suits you. Or you can design some other scheme for labeling versions.

The attached POC shows this working, though it lacks documentation for now, as I wanted to float the general idea first.

My preference would be for a tool-supported way for locale components to report their own version through a new API [1], and I hope that someone might eventually consider writing and proposing a patch to glibc for that. In the meantime, I figured that users willing to compile their own locale definitions for PostgreSQL's benefit might want to drop their own version string into a text file.

The patch has no effect otherwise, except for a few rare and harmless open() calls failing with ENOENT if you have defined LOCPATH without supplying a custom version file. Returning gnu_get_libc_version() when LOCPATH is set is arguably a bug, and I think it should at the very least be suppressed.

*Of course you have to make sure you know what you're doing.
For example, we learned on this list of some tricky edge cases, mainly around the treatment of Unicode-order sequences for e.g. C.UTF-8, which began as buggy local patches in some distributions' glibc C code; at least that case has been removed from our problem space by the new built-in provider. I'm interested in hearing about other concrete examples of the locale-recompilation technique failing to be perfect, and getting to the bottom of them; I have yet to hear of a real-world system that fails amcheck when using locale definitions ported in this way.

[1] https://www.mail-archive.com/austin-group-l@opengroup.org/msg12849.html
On 04.06.25 06:03, Thomas Munro wrote:
> One way to move to a newer glibc-based Linux distribution but keep the
> locales working the same* without keeping the associated zombie C code
> alive is to find the source system's collation definition source
> files, compile them with the localedef on the target system and point
> to the top-level directory with the environment variable LOCPATH.
>
> That runs directly into the naivety of commit d5ac14f9's
> gnu_get_libc_version() kludge. So here's a patch that allows a brave
> user of that recompilation technique to drop a custom version string
> into a file called one of:
>
> * $LOCPATH/<collcollate>/LC_COLLATE.version
> * $LOCPATH/<collcollate>/version
> * $LOCPATH/LC_COLLATE.version
> * $LOCPATH/version

Nice idea. The patch looks mostly straightforward.

I wonder why you want to capture LOCPATH early in main.c. It seems sufficient to look it up when needed?
On Wed, Jun 4, 2025 at 9:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> I wonder why you want to capture LOCPATH early in main.c. It seems
> sufficient to look it up when needed?

Right, it is setenv() that we're trying to avoid. Updated.
On 6/4/25 00:03, Thomas Munro wrote:
> One way to move to a newer glibc-based Linux distribution but keep the
> locales working the same* without keeping the associated zombie C code
> alive is to find the source system's collation definition source
> files, compile them with the localedef on the target system and point
> to the top-level directory with the environment variable LOCPATH.

I don't think this works in all cases, because I have seen cases where sorting was affected by C code changes rather than data changes.

--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com
On 6/4/25 09:52, Joe Conway wrote:
> On 6/4/25 00:03, Thomas Munro wrote:
>> One way to move to a newer glibc-based Linux distribution but keep the
>> locales working the same* without keeping the associated zombie C code
>> alive is to find the source system's collation definition source
>> files, compile them with the localedef on the target system and point
>> to the top-level directory with the environment variable LOCPATH.
>
> I don't think this works in all cases because I have seen where sorting
> was affected by C code rather than data changes.

Sorry, I missed this part:

>> I'm interested in hearing about other concrete
>> examples of the locale-recompilation technique failing to be perfect,
>> and getting to the bottom of them; I have yet to hear of a real world
>> system that fails amcheck when using locale definitions ported in this
>> way.

If you go from anything pre-glibc-2.21 to post-glibc-2.21, I think you will find that even with the same data files you get a different sort order. The same patch that caused the performance regression [1] (still present in up-to-date glibc) also caused changes in sort order via C code alone.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18441
On Thu, Jun 5, 2025 at 3:44 AM Joe Conway <mail@joeconway.com> wrote:
> On 6/4/25 09:52, Joe Conway wrote:
> > On 6/4/25 00:03, Thomas Munro wrote:
> >> I'm interested in hearing about other concrete
> >> examples of the locale-recompilation technique failing to be perfect,
> >> and getting to the bottom of them; I have yet to hear of a real world
> >> system that fails amcheck when using locale definitions ported in this
> >> way.
>
> If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you
> will find that even with the same data files you get a different sort.
> The same patch that caused the performance regression [1] (still present
> in up to date glibc) also caused changes in sort order via C code alone.

Will try. And BTW, I fully understand that your work on running parts of pinned old glibc libraries is a bug-perfect solution to this. But I also want to explore other trade-off positions, for users who don't want to run unmaintained C code. In exchange for that paranoia you get exposure to C code changes, intentional or unintentional, and I'd really like to understand them better...

One thing that is definitely out of the question is moving the compiled LC_COLLATE files between glibc versions (the binary format clearly changes; sometimes it apparently works, sometimes it doesn't work at all). That leads to the idea of recompiling with localedef. The source formats are standardised by POSIX and *should* have the same meaning on any system, so maybe we're only talking about bugs. (In theory, you should even be able to move the source between unrelated Unixen, but I only care about glibc here, and I have no doubt that there are extensions and quirks, so reality may fail to live up to the theory completely.)
I've personally analysed only one such case and chased it all the way down: the support for strict code point ordering and the non-strict local fudges that Debian et al. shipped in some version range. So we can't even really blame that one on glibc, and yet it is/was in the wild, so we can't ignore it (thanks to Jeff for making that one irrelevant). Finding more cases probably involves running something a little like Jeremy's torture tests across a huge gallery of versions and combinations of cross-version recompiled definitions. Or something like that...
On 6/4/25 19:35, Thomas Munro wrote:
> On Thu, Jun 5, 2025 at 3:44 AM Joe Conway <mail@joeconway.com> wrote:
>> If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you
>> will find that even with the same data files you get a different sort.
>> The same patch that caused the performance regression [1] (still present
>> in up to date glibc) also caused changes in sort order via C code alone.
>
> Finding more cases probably involves running something a little like
> Jeremy's torture tests across a huge gallery of versions and
> combinations of cross-version recompiled definitions. Or something
> like that...

Sounds like great fun! ;-)