Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGJ_CMXi6G1975Eo7S28Udri-3+FasF0+eNL5RVt4O=J8A@mail.gmail.com |
In response to | Re: Collation version tracking for macOS (Jim Nasby <nasbyj@amazon.com>) |
Responses |
Re: Collation version tracking for macOS
|
List | pgsql-hackers |
On Tue, Jun 7, 2022 at 12:10 PM Jim Nasby <nasbyj@amazon.com> wrote:
> On 6/3/22 3:58 PM, Tom Lane wrote
> > Thomas Munro <thomas.munro@gmail.com> writes:
> >> On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
> >> <schneider@ardentperf.com> wrote:
> >>> It feels to me like we're still not really thinking clearly about this
> >>> within the PG community, and that the seriousness of this issue is not
> >>> fully understood.
> >> FWIW A couple of us tried quite hard to make smarter warnings, and
> >> that thread and others discussed a lot of those topics, like the
> >> relevance to constraints and so forth.
> > I think the real problem here is that the underlying software mostly
> > doesn't take this issue seriously. Unfortunately, that leads one to
> > the conclusion that we need to maintain our own collation code and
> > data (e.g., our own fork of ICU), and that isn't happening. Unlike
> > say Oracle, we do not have the manpower; nor do we want to bloat our
> > code base that much.
> >
> > Short of maintaining our own fork, ranting about the imperfections
> > of the situation is a waste of time.
> The first step to a solution is admitting that the problem exists.

We've been discussing this topic for years and I don't think anyone
thinks the case is closed...

> Ignoring broken backups, segfaults and data corruption as a "rant"
> implies that we simply throw in the towel and tell users to suck it up
> or switch engines. There are other ways to address this short of the
> community doing all the work itself. One simple example would be to
> refuse to start if the collation provider has changed since initdb
> (which we'd need to allow users to override).

Yeah, it's been discussed, but never proposed. The problem is that
you need to start up to fix the problem. Another option is not to use
affected indexes, but that doesn't help with other forms of the
problem (partition constraints, etc).

> A more sophisticated
> option would be to provide the machinery for supporting multiple
> collation libraries.

Earlier I mentioned distinct "providers" but I take that back, that's
too complicated. Reprising an old idea that comes up each time we
talk about this, this time with some more straw-man detail: what about
teaching our ICU support to understand "libicu18n.so.71:en" to mean
that it should dlopen() that library and use its functions? Or some
cleverer, shorter notation.

Then it's the user's problem to make sure the right libraries are
installed, and it'll fail if they're not. For example, on Debian
bookworm right now you can install libicu63, libicu67, libicu71,
though only the "current" -dev package, but which I'm sure we can cope
with. You're at the mercy of the distro or add-on package repos to
keep a lot of versions around, but that seems OK.

Maintaining our own fork(s) of ICU would seem like massive overkill
and I don't think anyone has suggested that; the question on my mind
is whether we could rely on existing packages. Then you'd be exposed
only to changes that happen within (say) the ICU 63 package's
lifetime... I recall looking into whether that can happen but ... I
don't recall the answer.
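To make the straw-man a little more concrete, here is a minimal sketch in C of how a spec such as "libicu18n.so.71:en" might be split and resolved with dlopen(). Everything named here is hypothetical and not existing PostgreSQL code: the IcuSpec struct and the parse_icu_spec() and load_ucol_open() helpers are invented for illustration. It also assumes an ICU build with the default symbol renaming, where entry points carry the major version as a suffix (ucol_open_71 for ICU 71), and that the major version can be read from the trailing ".so.N" of the library name.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a spec such as "libicu18n.so.71:en". */
typedef struct IcuSpec
{
	char		library[128];	/* e.g. "libicu18n.so.71" */
	char		locale[64];		/* e.g. "en" */
	int			major;			/* e.g. 71 */
} IcuSpec;

/*
 * Split "LIBRARY:LOCALE" and take the major version from the trailing
 * ".N" of the library name.  Returns 0 on success, -1 if malformed.
 */
int
parse_icu_spec(const char *spec, IcuSpec *out)
{
	const char *colon = strchr(spec, ':');
	const char *dot;
	char	   *end;
	size_t		liblen;
	long		major;

	/* Need a non-empty library part and a non-empty locale part. */
	if (colon == NULL || colon == spec || colon[1] == '\0')
		return -1;
	liblen = (size_t) (colon - spec);
	if (liblen >= sizeof(out->library) ||
		strlen(colon + 1) >= sizeof(out->locale))
		return -1;
	memcpy(out->library, spec, liblen);
	out->library[liblen] = '\0';
	strcpy(out->locale, colon + 1);

	/* The last dot-separated component must be a plain decimal number. */
	dot = strrchr(out->library, '.');
	if (dot == NULL || dot[1] < '0' || dot[1] > '9')
		return -1;
	major = strtol(dot + 1, &end, 10);
	if (*end != '\0' || major <= 0 || major > 999)
		return -1;
	out->major = (int) major;
	return 0;
}

/*
 * dlopen() the named library and look up the versioned ucol_open entry
 * point.  Returns the symbol address, or NULL (after printing a message
 * to stderr) if the library or the symbol is not available.
 */
void *
load_ucol_open(const IcuSpec *spec, void **handle)
{
	char		symbol[64];
	void	   *fn;

	*handle = dlopen(spec->library, RTLD_NOW | RTLD_LOCAL);
	if (*handle == NULL)
	{
		fprintf(stderr, "could not load %s: %s\n", spec->library, dlerror());
		return NULL;
	}
	/* Assumes default ICU symbol renaming, e.g. "ucol_open_71". */
	snprintf(symbol, sizeof(symbol), "ucol_open_%d", spec->major);
	fn = dlsym(*handle, symbol);
	if (fn == NULL)
	{
		fprintf(stderr, "could not find %s in %s\n", symbol, spec->library);
		dlclose(*handle);
		*handle = NULL;
	}
	return fn;
}
```

A caller would run parse_icu_spec() on the collation's locale string, then load_ucol_open(), and report an error if either step fails, which matches the "it'll fail if they're not installed" behaviour described above. On older glibc versions the program may need to be linked with -ldl.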