Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Msg-id CA+hUKGJ_CMXi6G1975Eo7S28Udri-3+FasF0+eNL5RVt4O=J8A@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Jim Nasby <nasbyj@amazon.com>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Tue, Jun 7, 2022 at 12:10 PM Jim Nasby <nasbyj@amazon.com> wrote:
> On 6/3/22 3:58 PM, Tom Lane wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> >> On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
> >> <schneider@ardentperf.com> wrote:
> >>> It feels to me like we're still not really thinking clearly about this
> >>> within the PG community, and that the seriousness of this issue is not
> >>> fully understood.
> >> FWIW A couple of us tried quite hard to make smarter warnings, and
> >> that thread and others discussed a lot of those topics, like the
> >> relevance to constraints and so forth.
> > I think the real problem here is that the underlying software mostly
> > doesn't take this issue seriously.  Unfortunately, that leads one to
> > the conclusion that we need to maintain our own collation code and
> > data (e.g., our own fork of ICU), and that isn't happening.  Unlike
> > say Oracle, we do not have the manpower; nor do we want to bloat our
> > code base that much.
> >
> > Short of maintaining our own fork, ranting about the imperfections
> > of the situation is a waste of time.
> The first step to a solution is admitting that the problem exists.

We've been discussing this topic for years and I don't think anyone
thinks the case is closed...

> Ignoring broken backups, segfaults and data corruption as a "rant"
> implies that we simply throw in the towel and tell users to suck it up
> or switch engines. There are other ways to address this short of the
> community doing all the work itself. One simple example would be to
> refuse to start if the collation provider has changed since initdb
> (which we'd need to allow users to override).

Yeah, it's been discussed, but never proposed.  The problem is that
you need the server to start up in order to fix the problem (e.g. to
REINDEX).  Another option is not to use affected indexes, but that
doesn't help with other forms of the problem (partition constraints,
etc.).

> A more sophisticated
> option would be to provide the machinery for supporting multiple
> collation libraries.

Earlier I mentioned distinct "providers" but I take that back; that's
too complicated.  Reprising an old idea that comes up each time we
talk about this, this time with some more straw-man detail: what about
teaching our ICU support to understand "libicui18n.so.71:en" to mean
that it should dlopen() that library and use its functions?  Or some
cleverer, shorter notation.  Then it's the user's problem to make sure
the right libraries are installed, and it'll fail if they're not.  For
example, on Debian bookworm right now you can install libicu63,
libicu67 and libicu71, though only the "current" -dev package is
available, which I'm sure we can cope with.  You're at the mercy of the
distro or add-on package repos to keep a lot of versions around, but
that seems OK.  Maintaining our own fork(s) of ICU would seem like
massive overkill and I don't think anyone has suggested that; the
question on my mind is whether we could rely on existing packages.
Then you'd be exposed only to changes that happen within (say) the ICU
63 package's lifetime... I recall looking into whether that can happen
but ... I don't recall the answer.
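
To make the straw man a bit more concrete, here is a rough sketch of
the dlopen() idea.  It is not a patch: the "library:locale" notation,
the struct and all three helper names are made up for illustration.
The one thing it relies on from ICU itself is that ICU builds normally
rename exported symbols with a major-version suffix (ucol_open_71 and
so on), so the sketch tries the suffixed name first and falls back to
the plain one.

```c
/* Sketch only: the "library:locale" notation and every helper name here
 * are hypothetical, not an existing PostgreSQL or ICU interface. */
#include <assert.h>
#include <ctype.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct IcuCollationSpec
{
    char        library[128];   /* e.g. "libicui18n.so.71" */
    char        locale[64];     /* e.g. "en" */
    int         major;          /* e.g. 71, taken from the library suffix */
} IcuCollationSpec;

/* Split "library:locale" into its parts.  Returns 0 on success, -1 on error. */
static int
parse_icu_spec(const char *spec, IcuCollationSpec *out)
{
    const char *colon;
    const char *dot;
    char       *end;
    size_t      liblen;
    long        major;

    assert(spec != NULL && out != NULL);

    colon = strchr(spec, ':');
    if (colon == NULL)
        return -1;
    liblen = (size_t) (colon - spec);
    if (liblen == 0 || liblen >= sizeof(out->library))
        return -1;
    if (colon[1] == '\0' || strlen(colon + 1) >= sizeof(out->locale))
        return -1;

    memcpy(out->library, spec, liblen);
    out->library[liblen] = '\0';
    strcpy(out->locale, colon + 1);

    /* The major version is the numeric suffix after the last dot. */
    dot = strrchr(out->library, '.');
    if (dot == NULL || !isdigit((unsigned char) dot[1]))
        return -1;
    major = strtol(dot + 1, &end, 10);
    if (*end != '\0' || major <= 0 || major > 9999)
        return -1;
    out->major = (int) major;
    return 0;
}

/* Build the version-suffixed symbol name, e.g. "ucol_open_71". */
static void
icu_symbol_name(const IcuCollationSpec *spec, const char *base,
                char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s_%d", base, spec->major);
}

/*
 * dlopen() the requested library and look up one function in it.
 * Returns NULL if the library or the symbol is missing; on success the
 * library handle is stored in *handle_out so the caller can dlclose() it.
 */
static void *
load_icu_function(const IcuCollationSpec *spec, const char *base,
                  void **handle_out)
{
    char        symbol[160];
    void       *handle;
    void       *fn;

    handle = dlopen(spec->library, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
    {
        fprintf(stderr, "could not load \"%s\": %s\n",
                spec->library, dlerror());
        return NULL;
    }

    icu_symbol_name(spec, base, symbol, sizeof(symbol));
    fn = dlsym(handle, symbol);
    if (fn == NULL)
        fn = dlsym(handle, base);   /* library built without renaming */
    if (fn == NULL)
    {
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return fn;
}
```

A caller would parse the collation's stored string, call
load_icu_function() for each entry point it needs (ucol_open,
ucol_strcoll, ...), cast the result to the matching function pointer
type, and report an error if the library isn't installed.  On older
glibc this needs -ldl at link time.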


