Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGJMndLA-uX8N2RmH6AnNB1-W3mPvKE1vuL1BtS9RicVfg@mail.gmail.com |
In response to | Re: Collation version tracking for macOS ("Finnerty, Jim" <jfinnert@amazon.com>) |
Responses | Re: Collation version tracking for macOS |
List | pgsql-hackers |
On Fri, Jun 10, 2022 at 9:20 AM Finnerty, Jim <jfinnert@amazon.com> wrote:
> Specifying the library name before the language-country code with a new separator (":") as you suggested below has some benefits.

One of the reasons for putting some representation of desired library into the colliculocale column (rather than, say, adding a new column to pg_collation) is that I think we'd also want to be able to put that into daticulocale (for the database default collation, when using ICU). But really I just did that because it was easy... perhaps, both pg_collation and pg_database could gain a new column, and that would be a little more pleasing from a schema design point of view (1NF atomicity, and it's a sort of foreign key, or at least it would be if there were another catalog to list library versions...)?

> Did you consider making the collation version just another collation attribute, such as colStrength, colCaseLevel, etc.?
> For example, an alternate syntax might be:
>
> create collation icu63."en-US-x-icu" (provider = icu, locale = 'en-US@colVersion=63');

Hmm, I hadn't considered that. (I wouldn't call it "col" version BTW, it's a library version, and we don't want to overload our terminology for collation version. We'd still be on the lookout for collversion changes coming from a single library's minor version changing, for example an apt-get upgrade can replace the .63 files, which on most systems are symlinks to .63.1, .63.2 etc. ☠️)

> Was the concern that ICU might redefine a new collation property with the same name in a different and incompatible way (we might work with the ICU developers to agree on what it should be), or that a version is just not the same kind of collation property as the other collation properties?

Well my first impression is that we don't really own that namespace, and since we're using this to decide which library to route calls to, it seems nicer to put it at a "higher level" than those properties.
So I'd prefer something like "63:en-US", or 63 in a new column.

> (in the example above, I'm assuming that for provider = icu, we could translate '63' into 'libicui18n.so.63' automatically.)

Yeah. My patch that jams a library name in there was just the fastest way I could think of to get something off the ground to test whether I could route calls to different libraries (yes!), though at one moment I thought it wasn't terrible. But aside from any aesthetic complaints about that way of doing it, it turns out not to be enough: we need to dlopen() two different libraries, because we also need some ctype-ish functions from this guy:

$ nm -D -C /usr/lib/x86_64-linux-gnu/libicuuc.so.63.1 | grep u_strToUpper
00000000000d22c0 T u_strToUpper_63

I guess we probably want to just put "63" somewhere in pg_collation, as you say. But then, teaching PostgreSQL how to expand that to a name that is platform/packaging dependent seems bad. The variations would probably be minor; on a Mac it's .dylib, on AIX it may be .a, and the .63 convention may not be universal, I dunno, but some systems might need absolute paths (depending on ld.so.conf etc), but that's all stuff that I think an administrator should care about, not us. Perhaps there could be a new catalog table just for that.

So far I have imagined there would still be one special ICU library linked at build time, which doesn't need to be dlopen'd, and works automatically without administrators having to declare it. So a system that has one linked-in library version 67, and then has two extras that have been added by an administrator running some new DDL commands might have:

postgres=# select * from pg_icu_library order by version;
 version |    libicuuc    |    libicui18n
---------+----------------+------------------
      58 | libicuuc.so.58 | libicui18n.so.58
      63 | libicuuc.so.63 | libicui18n.so.63
      67 |                |
(3 rows)

Suppose you pg_upgrade to something that is linked against 71.
Perhaps you'd need to tell it how to dlopen 67 before you can open any collations with that library, but once you've done that your collation-dependent partition constraints etc should all hold. I dunno, lots of problems to figure out here, including quite broad ones about various migration problems. I haven't understood what Peter G is suggesting about how upgrades might work, so I'll go and try to do that...
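
To make the routing idea above a bit more concrete, here is a minimal sketch in C, assuming the "63:en-US" spelling and a library path that an administrator has already declared (for example via something like the pg_icu_library table imagined above). The function names parse_versioned_locale, versioned_symbol and lookup_icu_function are hypothetical and not taken from any patch; the only real interfaces used are dlopen()/dlsym() and the C library. It relies on ICU builds exporting version-suffixed symbols, as the nm output for u_strToUpper_63 shows. On older glibc you may need to link with -ldl.

```c
#include <assert.h>
#include <ctype.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a spec such as "63:en-US" into a major library version and a
 * locale string.  Returns 0 on success, -1 if there is no usable
 * "<digits>:" prefix or the locale part is empty.  (Hypothetical helper.)
 */
int
parse_versioned_locale(const char *spec, int *major, const char **locale)
{
    char   *end;
    long    v;

    if (!isdigit((unsigned char) spec[0]))
        return -1;              /* e.g. plain "en-US": no version prefix */
    v = strtol(spec, &end, 10);
    if (*end != ':' || v <= 0 || v > 999)
        return -1;
    if (strlen(end + 1) == 0)
        return -1;
    *major = (int) v;
    *locale = end + 1;
    return 0;
}

/*
 * Build a version-suffixed symbol name, e.g. "u_strToUpper" + 63 gives
 * "u_strToUpper_63".  Returns 0 on success, -1 if the buffer is too small.
 */
int
versioned_symbol(char *buf, size_t buflen, const char *base, int major)
{
    int     n;

    assert(buf != NULL && base != NULL);
    n = snprintf(buf, buflen, "%s_%d", base, major);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}

/*
 * Open the library at the administrator-supplied path and look up the
 * versioned function.  Returns NULL if the library or symbol is missing.
 * The handle is deliberately never closed in this sketch; a real
 * implementation would presumably cache it per library version.
 */
void *
lookup_icu_function(const char *library_path, const char *base, int major)
{
    char    symbol[128];
    void   *handle;

    if (versioned_symbol(symbol, sizeof(symbol), base, major) != 0)
        return NULL;
    handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;
    return dlsym(handle, symbol);
}
```

A caller would parse the locale spec first, map the major version to the two declared paths (libicuuc for the ctype-ish functions, libicui18n for collation), and then call lookup_icu_function() once per function it needs. Note that the sketch never derives a file name from the version number, in line with the point that platform-dependent naming is the administrator's concern.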