Custom Glibc collation version strings under LOCPATH - Mailing list pgsql-hackers

From Thomas Munro
Subject Custom Glibc collation version strings under LOCPATH
Msg-id CA+hUKG+UngA4H=Ytsz6iiz_xAzqG3JX9eC9CBSzpubfRz9gYeQ@mail.gmail.com
List pgsql-hackers
Hi,

One way to move to a newer glibc-based Linux distribution while keeping
the locales working the same*, without keeping the associated zombie C
code alive, is to find the source system's collation definition source
files, compile them with the localedef on the target system, and point
to the top-level directory with the environment variable LOCPATH.
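
To make the recompilation step concrete, here is a minimal sketch of how it
might be driven from a script.  The directory layout (a copy of the old
system's i18n tree with a "locales" subdirectory), the helper names and the
example paths are all assumptions for illustration; only the localedef -i/-f
options and the I18NPATH variable come from glibc itself, as far as I
understand them.

```python
import os
import subprocess


def build_localedef_command(source_file, charmap, locpath, locale_name):
    """Return the localedef argv that compiles one locale under locpath.

    An output path containing a slash is treated by localedef as a
    directory to write into, so the result lands in locpath/locale_name.
    """
    output_dir = os.path.join(locpath, locale_name)
    return ["localedef", "-i", source_file, "-f", charmap, output_dir]


def compile_locale(i18n_dir, locale_source, charmap, locpath, locale_name):
    """Compile one locale definition copied from the source system.

    i18n_dir is a hypothetical copy of the old system's locale sources.
    """
    os.makedirs(locpath, exist_ok=True)
    source_file = os.path.join(i18n_dir, "locales", locale_source)
    argv = build_localedef_command(source_file, charmap, locpath, locale_name)
    # I18NPATH is meant to let localedef resolve files pulled in by "copy"
    # directives from the old definitions instead of the target system's.
    env = dict(os.environ, I18NPATH=i18n_dir)
    # localedef can exit non-zero for warnings as well as errors, so the
    # status is returned for the caller to judge rather than raised.
    return subprocess.run(argv, env=env).returncode
```

A call such as compile_locale("/srv/old-i18n", "en_US", "UTF-8",
"/srv/locales", "en_US.UTF-8") would then be followed by starting the server
with LOCPATH=/srv/locales in its environment.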

That runs directly into the naivety of commit d5ac14f9's
gnu_get_libc_version() kludge.  So here's a patch that allows a brave
user of that recompilation technique to drop a custom version string
into a file called one of:

      * $LOCPATH/<collcollate>/LC_COLLATE.version
      * $LOCPATH/<collcollate>/version
      * $LOCPATH/LC_COLLATE.version
      * $LOCPATH/version

This way you can make your custom locales' reported version agree with
the system they came from, avoiding collation version mismatch
warnings, at whichever granularity suits you.  Or you can design some
other scheme for
labeling versions.  The attached POC shows this working, though it
lacks documentation for now as I wanted to float the general idea
first.
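
As an illustration of the lookup, here is a small sketch in Python rather
than the patch's C.  It assumes that the four file names are tried in the
order listed above, most specific first, that LOCPATH names a single
directory, and that the file simply contains the version string with
surrounding whitespace ignored.  None of those details is spelled out above,
and the function names are hypothetical.

```python
import os


def candidate_paths(locpath, collcollate):
    """List the version file candidates, most specific first (assumed order)."""
    return [
        os.path.join(locpath, collcollate, "LC_COLLATE.version"),
        os.path.join(locpath, collcollate, "version"),
        os.path.join(locpath, "LC_COLLATE.version"),
        os.path.join(locpath, "version"),
    ]


def read_custom_version(locpath, collcollate):
    """Return the first custom version string found, or None if there is none."""
    for path in candidate_paths(locpath, collcollate):
        try:
            with open(path, encoding="utf-8") as f:
                # Assumption: the whole file is the version string.
                return f.read().strip()
        except (FileNotFoundError, NotADirectoryError):
            # The harmless "open() -> ENOENT" case: try the next candidate.
            continue
    return None
```

With this ordering, a per-locale file overrides a LOCPATH-wide one, which is
one way to get the "whichever granularity suits you" behaviour.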

My preference would be for a tool-supported way for locale components
to report their own version with a new API[1], and I hope that someone
might eventually consider writing and proposing a patch to glibc for
that.  But in the meantime, I figured that users willing to compile
their own locale definitions for PostgreSQL's benefit might want to
drop their own version string into a text file.  The patch has no
effect otherwise, except for a few rare and harmless open() -> ENOENT
system calls if you have defined LOCPATH without supplying a custom
version file.

Returning gnu_get_libc_version() when you set LOCPATH is arguably a
bug and should at the very least be suppressed, I think.
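
To spell out what "suppressed" could mean, here is a sketch of the decision
in Python.  Returning no version at all when LOCPATH is set but no custom
file is found is my assumption about one reasonable behaviour, not a
description of what the POC does, and the function and parameter names are
made up for illustration.

```python
def reported_collation_version(libc_version, locpath, custom_version):
    """Pick the version string to record for a glibc collation.

    libc_version   -- what gnu_get_libc_version() returned
    locpath        -- value of LOCPATH, or None/"" if unset
    custom_version -- string read from a custom version file, or None
    """
    if custom_version is not None:
        # The user has labelled the recompiled locales explicitly.
        return custom_version
    if locpath:
        # Locale data may not come from this libc, so its version would
        # be misleading; assumed behaviour: report no version at all.
        return None
    # No LOCPATH: the libc version is the existing proxy.
    return libc_version
```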

*Of course you have to make sure you know what you're doing.  For
example we learned on this list of some tricky edge cases, mainly
around the treatment of Unicode-order sequences for, e.g., C.UTF-8, which
began as buggy local patches in some distros' glibc C code, but at
least that case has been removed from our problem space by the new
built-in provider.  I'm interested in hearing about other concrete
examples of the locale-recompilation technique failing to be perfect,
and getting to the bottom of them; I have yet to hear of a real-world
system that fails amcheck when using locale definitions ported in this
way.

[1] https://www.mail-archive.com/austin-group-l@opengroup.org/msg12849.html
