Thread: Custom Glibc collation version strings under LOCPATH

Custom Glibc collation version strings under LOCPATH

From: Thomas Munro
Hi,

One way to move to a newer glibc-based Linux distribution but keep the
locales working the same* without keeping the associated zombie C code
alive is to find the source system's collation definition source
files, compile them with the localedef on the target system and point
to the top-level directory with the environment variable LOCPATH.
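
As a rough illustration of that recompilation step, the sketch below only
builds (it does not run) a localedef command line and an environment that
carries LOCPATH.  The helper names and the example paths are hypothetical;
-i and -f are localedef's options for the input definition file and the
charmap, and an output path containing a slash is taken as the directory
to write the compiled locale into.

```python
import os
from pathlib import Path


def localedef_command(source_file, charmap, locpath, locale_name):
    """Build an argv list that compiles one locale definition under locpath.

    Hypothetical helper: nothing is executed here, the caller decides
    whether and how to run the command.
    """
    output_dir = Path(locpath) / locale_name
    return ["localedef", "-i", str(source_file), "-f", charmap, str(output_dir)]


def environment_with_locpath(locpath, base_env=None):
    """Return a copy of the environment with LOCPATH pointing at locpath."""
    env = dict(os.environ if base_env is None else base_env)
    env["LOCPATH"] = str(locpath)
    return env
```

The returned list could be handed to subprocess.run() on the target
system, and the environment to whatever starts the server; both are left
to the caller so that the sketch stays free of side effects.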

That runs directly into the naivety of commit d5ac14f9's
gnu_get_libc_version() kludge.  So here's a patch that allows a brave
user of that recompilation technique to drop a custom version string
into a file called one of:

      * $LOCPATH/<collcollate>/LC_COLLATE.version
      * $LOCPATH/<collcollate>/version
      * $LOCPATH/LC_COLLATE.version
      * $LOCPATH/version

This way you can make your custom locales' reported version agree with
wherever they came from to skip those mismatch warnings, at whichever
granularity suits you.  Or you can design some other scheme for
labeling versions.  The attached POC shows this working, though it
lacks documentation for now as I wanted to float the general idea
first.
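
To make the lookup concrete, here is a minimal sketch of the search over
the four file names listed above.  It is not the patch itself (which is C
inside PostgreSQL).  It assumes that the list is tried top to bottom, most
specific first, that the version is the file's content with surrounding
whitespace trimmed, and that an empty file is skipped; those details and
the function names are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def candidate_version_files(locpath, collcollate):
    """Return the four candidate paths, most specific first (assumed order)."""
    root = Path(locpath)
    return [
        root / collcollate / "LC_COLLATE.version",
        root / collcollate / "version",
        root / "LC_COLLATE.version",
        root / "version",
    ]


def read_custom_version(locpath, collcollate) -> Optional[str]:
    """Return the first custom version string found, or None if there is none."""
    for path in candidate_version_files(locpath, collcollate):
        try:
            text = path.read_text(encoding="utf-8")
        except (FileNotFoundError, NotADirectoryError):
            # A missing file is the harmless ENOENT case: try the next one.
            continue
        version = text.strip()
        if version:  # assumption: an empty file does not count as a version
            return version
    return None
```

With LOCPATH set but no version file present, read_custom_version()
returns None after four failed opens, so a caller would fall back to
whatever it reported before.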

My preference would be for a tool-supported way for locale components
to report their own version with a new API[1], and I hope that someone
might eventually consider writing and proposing a patch to glibc for
that.  But in the meantime, I figured that users willing to compile
their own locale definitions for PostgreSQL's benefit might want to
drop their own version string into a text file.  The patch has no
effect otherwise, except for a few rare and harmless open() -> ENOENT
system calls if you have defined LOCPATH without supplying a custom
version file.

Returning gnu_get_libc_version() when you set LOCPATH is arguably a
bug and should at the very least be suppressed, I think.

*Of course you have to make sure you know what you're doing.  For
example we learned on this list of some tricky edge cases, mainly
around the treatment of Unicode-order sequences for e.g. C.UTF-8, which
began as buggy local patches in some distros' glibc C code, but at
least that case has been removed from our problem space by the new
built-in provider.  I'm interested in hearing about other concrete
examples of the locale-recompilation technique failing to be perfect,
and getting to the bottom of them; I have yet to hear of a real world
system that fails amcheck when using locale definitions ported in this
way.

[1] https://www.mail-archive.com/austin-group-l@opengroup.org/msg12849.html


Re: Custom Glibc collation version strings under LOCPATH

From: Peter Eisentraut
On 04.06.25 06:03, Thomas Munro wrote:
> One way to move to a newer glibc-based Linux distribution but keep the
> locales working the same* without keeping the associated zombie C code
> alive is to find the source system's collation definition source
> files, compile them with the localedef on the target system and point
> to the top-level directory with the environment variable LOCPATH.
> 
> That runs directly into the naivety of commit d5ac14f9's
> gnu_get_libc_version() kludge.  So here's a patch that allows a brave
> user of that recompilation technique to drop a custom version string
> into a file called one of:
> 
>        * $LOCPATH/<collcollate>/LC_COLLATE.version
>        * $LOCPATH/<collcollate>/version
>        * $LOCPATH/LC_COLLATE.version
>        * $LOCPATH/version

Nice idea.

The patch looks mostly straightforward.

I wonder why you want to capture LOCPATH early in main.c.  It seems 
sufficient to look it up when needed?




Re: Custom Glibc collation version strings under LOCPATH

From: Thomas Munro
On Wed, Jun 4, 2025 at 9:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> I wonder why you want to capture LOCPATH early in main.c.  It seems
> sufficient to look it up when needed?

Right, it is setenv() that we're trying to avoid.  Updated.


Re: Custom Glibc collation version strings under LOCPATH

From: Joe Conway
On 6/4/25 00:03, Thomas Munro wrote:
> One way to move to a newer glibc-based Linux distribution but keep the
> locales working the same* without keeping the associated zombie C code
> alive is to find the source system's collation definition source
> files, compile them with the localedef on the target system and point
> to the top-level directory with the environment variable LOCPATH.

I don't think this works in all cases because I have seen where sorting 
was affected by C code rather than data changes.

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



Re: Custom Glibc collation version strings under LOCPATH

From: Joe Conway
On 6/4/25 09:52, Joe Conway wrote:
> On 6/4/25 00:03, Thomas Munro wrote:
>> One way to move to a newer glibc-based Linux distribution but keep the
>> locales working the same* without keeping the associated zombie C code
>> alive is to find the source system's collation definition source
>> files, compile them with the localedef on the target system and point
>> to the top-level directory with the environment variable LOCPATH.
> 
> I don't think this works in all cases because I have seen where sorting
> was affected by C code rather than data changes.

Sorry I missed this part:

>> I'm interested in hearing about other concrete
>> examples of the locale-recompilation technique failing to be perfect,
>> and getting to the bottom of them; I have yet to hear of a real world
>> system that fails amcheck when using locale definitions ported in this
>> way.

If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you 
will find that even with the same data files you get a different sort. 
The same patch that caused the performance regression [1] (still present 
in up to date glibc) also caused changes in sort order via C code alone.


[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18441

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



Re: Custom Glibc collation version strings under LOCPATH

From: Thomas Munro
On Thu, Jun 5, 2025 at 3:44 AM Joe Conway <mail@joeconway.com> wrote:
> On 6/4/25 09:52, Joe Conway wrote:
> > On 6/4/25 00:03, Thomas Munro wrote:
> >> I'm interested in hearing about other concrete
> >> examples of the locale-recompilation technique failing to be perfect,
> >> and getting to the bottom of them; I have yet to hear of a real world
> >> system that fails amcheck when using locale definitions ported in this
> >> way.
>
> If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you
> will find that even with the same data files you get a different sort.
> The same patch that caused the performance regression [1] (still present
> in up to date glibc) also caused changes in sort order via C code alone.

Will try.  And BTW I fully understand that your work on running parts
of pinned old glibc libraries is a bug-perfect solution to this.  But
I also want to explore other trade-off positions, for users who don't
want to run unmaintained C code.  In exchange for that paranoia you
have C code changes, intentional or unintentional, and I'd really like
to understand them better...  One thing that is definitely out of the
question is moving the compiled LC_COLLATE files between glibc
versions (the binary format clearly changes; sometimes it apparently
works, sometimes it doesn't at all).  That leads to the idea of
recompiling with localedef.  The source formats are standardised by
POSIX and *should* have the same meaning to any system, so now maybe
we're only talking about bugs (in theory, you should even be able to
move the source between unrelated Unixen, but I only care about glibc
here, and I have no doubt that there are extensions and quirks so
reality may fail to live up to the theory completely).  I've
personally analysed only one such case and chased it all the way down,
which is the support for strict codepoint ordering and the non-strict
local fudges that Debian et al shipped in some version range, so we
can't even really blame it on glibc, and yet it is/was in the wild so
we can't ignore it (thanks to Jeff for making that one irrelevant).
Finding more cases probably involves running something a little like
Jeremy's torture tests across a huge gallery of versions and
combinations of cross-version recompiled definitions.  Or something
like that...
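
For the comparison step of such a test run, a minimal sketch might look
like the following.  It is not Jeremy's test suite, only an illustration:
it assumes the same set of distinct strings has already been sorted on
two systems (say, old glibc versus new glibc with recompiled
definitions) and reports the neighbouring pairs from the first ordering
that come out reversed in the second.  The function name is hypothetical.

```python
def order_disagreements(sorted_a, sorted_b):
    """Return adjacent pairs from sorted_a whose order is reversed in sorted_b.

    Both inputs must contain the same distinct strings.  If no adjacent
    pair is reversed, the two orderings are identical, so an empty result
    means the two systems agree on this sample.
    """
    if len(set(sorted_a)) != len(sorted_a) or len(sorted_a) != len(sorted_b):
        raise ValueError("inputs must hold the same distinct strings")
    if set(sorted_a) != set(sorted_b):
        raise ValueError("inputs must hold the same distinct strings")

    # Position of every string in the second ordering.
    rank_b = {s: i for i, s in enumerate(sorted_b)}

    return [
        (x, y)
        for x, y in zip(sorted_a, sorted_a[1:])
        if rank_b[x] > rank_b[y]
    ]
```

Any pair it returns is a concrete example to chase down, since an index
built under the first ordering would no longer agree with comparisons
made under the second.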



Re: Custom Glibc collation version strings under LOCPATH

From: Joe Conway
On 6/4/25 19:35, Thomas Munro wrote:
> On Thu, Jun 5, 2025 at 3:44 AM Joe Conway <mail@joeconway.com> wrote:
>> If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you
>> will find that even with the same data files you get a different sort.
>> The same patch that caused the performance regression [1] (still present
>> in up to date glibc) also caused changes in sort order via C code alone.

> Finding more cases probably involves running something a little like
> Jeremy's torture tests across a huge gallery of versions and
> combinations of cross-version recompiled definitions.  Or something
> like that...

Sounds like great fun!

;-)

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com