On Thu, Jun 5, 2025 at 3:44 AM Joe Conway <mail@joeconway.com> wrote:
> On 6/4/25 09:52, Joe Conway wrote:
> > On 6/4/25 00:03, Thomas Munro wrote:
> >> I'm interested in hearing about other concrete
> >> examples of the locale-recompilation technique failing to be perfect,
> >> and getting to the bottom of them; I have yet to hear of a real world
> >> system that fails amcheck when using locale definitions ported in this
> >> way.
>
> If you go from anything pre-glibc-2.21 to post-glibc-2.21 I think you
> will find that even with the same data files you get a different sort.
> The same patch that caused the performance regression [1] (still present
> in up to date glibc) also cause changes in sort order via C code alone.

Will try. And BTW I fully understand that your work on running parts
of pinned old glibc libraries is a bug-perfect solution to this. But
I also want to explore other trade-off positions, for users who don't
want to run unmaintained C code. In exchange for that paranoia you
have C code changes, intentional or unintentional, and I'd really like
to understand them better... One thing that is definitely out of the
question is moving the compiled LC_COLLATE files between glibc
versions (the binary format clearly changes; sometimes it apparently
works, sometimes it doesn't work at all). That leads to the idea of
recompiling with localedef. The source formats are standardised by
POSIX and *should* have the same meaning to any system, so now maybe
we're only talking about bugs (in theory, you should even be able to
move the source between unrelated Unixen, but I only care about glibc
here, and I have no doubt that there are extensions and quirks so
reality may not fully live up to the theory). I've personally
analysed only one such case and chased it all the way down: the
support for strict codepoint ordering and the non-strict local fudges
that Debian et al shipped in some version range. We can't even really
blame that one on glibc, and yet it is/was in the wild so we can't
ignore it (thanks to Jeff for making that one irrelevant).
Finding more cases probably involves running something a little like
Jeremy's torture tests across a huge gallery of versions and
combinations of cross-version recompiled definitions. Or something
like that...
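
To make that last idea slightly more concrete, here is a minimal sketch
of the kind of per-system probe I have in mind. It is not Jeremy's
actual harness; the function names are made up for illustration, and it
assumes Python's locale.strcoll() is a faithful stand-in for the libc
collation calls. It sorts a corpus under one LC_COLLATE locale and
reduces the result to a hash, so that two glibc versions (or an original
and a recompiled definition) can be compared by comparing hashes.

```python
import functools
import hashlib
import locale


def sorted_under_locale(strings, locale_name):
    """Sort strings with strcoll() under the given LC_COLLATE locale.

    Hypothetical helper for illustration. Ties under strcoll() are
    broken by codepoint order so the result does not depend on the
    order of the input.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # Pre-sort by codepoint; the stable sort keeps that order for
        # strings that strcoll() considers equal.
        by_codepoint = sorted(strings)
        return sorted(by_codepoint, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def sort_fingerprint(strings, locale_name):
    """Return a SHA-256 hex digest of the corpus in collation order."""
    ordered = sorted_under_locale(strings, locale_name)
    payload = "\n".join(ordered).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

The idea would be to run sort_fingerprint() over the same corpus on each
system under test, once with the distribution's own locale and once with
a definition recompiled from another version's source (for glibc,
presumably selected via LOCPATH), and flag any pair of digests that
differ. A digest mismatch only says that something moved; finding out
which strings moved still needs a diff of the sorted output.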