Re: Collation versioning - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation versioning
Date
Msg-id CAEepm=04PvEdmRmCCcn4c7ydDA=-G=uLe5vDdfJiqp58Jpi8Kw@mail.gmail.com
In response to Re: Collation versioning  (Douglas Doole <dougdoole@gmail.com>)
Responses Re: Collation versioning
Re: Collation versioning
List pgsql-hackers
On Tue, Sep 25, 2018 at 4:26 AM Douglas Doole <dougdoole@gmail.com> wrote:
> On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> Admittedly that creates a whole can
>> of worms for initdb-time catalog creation, package maintainers' jobs,
>> how long old versions have to be supported and how you upgrade
>> database objects to new ICU versions.
>
>
> Yep. We never came up with a good answer for that before I left IBM. At the time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as part of the install bundle.
>
> Long term, I think the only viable approach to supporting multiple versions of ICU is runtime loading of the libraries. Then it's up to the system administrator to make sure the necessary versions are installed on the system.

I wonder if we would be practically constrained to using the
distro-supplied ICU (by their policies of not allowing packages to
ship their own copies of ICU); it seems like it.  I wonder which distros
allow multiple versions of ICU to be installed.  I see that Debian 9.5
only has 57 in the default repo, but the major version is in the
package name (what is the proper term for that kind of versioning?)
and it doesn't declare a conflict with other versions, so that's
promising.  Poking around with nm I noticed also that both the RHEL
and Debian ICU libraries have explicitly versioned symbol names like
"ucol_strcollUTF8_57", which is also promising.  FreeBSD seems to have
used "--disable-renaming" and therefore defines only
"ucol_strcollUTF8"; doh.

This topic is discussed here:
http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library
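
To make the symbol-renaming point concrete, here is a rough Python/ctypes
sketch of how a runtime loader might cope with both kinds of build: try the
versioned name first (as seen on RHEL and Debian), then fall back to the bare
name (as on FreeBSD with --disable-renaming).  The helper names are
hypothetical, and the "libicui18n.so.<major>" soname pattern is an assumption
based on common Linux packaging; this is not how PostgreSQL does anything
today.

```python
import ctypes


def icu_symbol_candidates(base, major):
    """Return the names to try for an ICU function, most specific first."""
    # Default ICU builds suffix each exported function with the major
    # version; builds configured with --disable-renaming export the bare name.
    return [f"{base}_{major}", base]


def resolve_icu_symbol(lib, base, major):
    """Return (name, function) for the first candidate that lib exports."""
    for name in icu_symbol_candidates(base, major):
        func = getattr(lib, name, None)
        if func is not None:
            return name, func
    raise LookupError(f"{base} not found for ICU major version {major}")


def load_icu_collation_library(major):
    """Try to load the collation library for one ICU major version.

    The soname pattern is an assumption (typical Linux packaging).
    Returns None when that version is not installed.
    """
    try:
        return ctypes.CDLL(f"libicui18n.so.{major}")
    except OSError:
        return None
```

With something like this, a caller would do
load_icu_collation_library(57) and, if that returns a library, pass it to
resolve_icu_symbol(lib, "ucol_strcollUTF8", 57).  Note that the fallback to
the bare name only tells you a function exists; it says nothing about which
ICU version you actually got, which is exactly why the unrenamed builds are
awkward for multi-version support.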

Personally I'm not planning to work on multi-version installation any
time soon; I was just scoping out some basic facts about all this.  I
think the primary problem that affects most of our users is the
shifting-under-your-feet problem, which we now see applies equally to
libc and libicu.

>> Yeah, it seems like ICU is *also* subject to minor changes that happen
>> under your feet, much like libc.  For example maintenance release 60.2
>> (you can't install that at the same time as 60.1, but you can install
>> it at the same time as 59.2).  You'd be linked against libicu.so.60
>> (and thence libicudata.so.60), and it gets upgraded in place when you
>> run the local equivalent of apt-get upgrade.
>
> This always worried me because an unexpected collation change is so painful for a database. And I was never able to think of a way of reliably testing compatibility either because of ICU's ability to reorder and group characters when collating.

I think the best we can do is to track versions per dependency (ie
record it when the CHECK is created, when the index is created or
rebuilt, ...) and generate loud warnings until you've dealt with each
version dependency.  That's why I've suggested we could consider
sticking it on pg_depend (though I have apparently failed to convince
Stephen so far).  I think something like that is better than the
current collversion design, which punts the problem to the DBA: "hey,
human, there might be some problems, but I don't know where!  Please
tell me when you've fixed them by running ALTER COLLATION ... REFRESH
VERSION!" instead of having the computer keep track of what actually needs
to be done on an object-by-object basis and update the versions
one-by-one automatically when the problems are resolved.
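
As a toy model of the per-dependency idea (not a proposal for the actual
catalog layout), the bookkeeping could look roughly like the following Python
sketch.  The object names, collation names and version strings are all made
up for illustration: each dependent object remembers the collation version it
was built against, every mismatch is reported individually, and rebuilding
one object clears only that object's warnings.

```python
def stale_dependencies(recorded, current):
    """List (object, collation, recorded_version, current_version) mismatches.

    recorded maps each dependent object to {collation: version at build time};
    current maps each collation to the version the library reports now.
    """
    stale = []
    for obj, deps in sorted(recorded.items()):
        for collation, version in sorted(deps.items()):
            now = current.get(collation)
            if now != version:
                stale.append((obj, collation, version, now))
    return stale


def mark_rebuilt(recorded, obj, current):
    """After obj is rebuilt or rechecked, record the versions it now uses."""
    recorded[obj] = {coll: current[coll] for coll in recorded[obj]}


# Hypothetical state: one index and one CHECK constraint, both built when
# collation "coll_x" reported version "1.0".
recorded = {
    "idx_a": {"coll_x": "1.0"},
    "chk_b": {"coll_x": "1.0", "coll_y": "2.0"},
}
current = {"coll_x": "1.1", "coll_y": "2.0"}

for obj, coll, old, new in stale_dependencies(recorded, current):
    print(f"WARNING: {obj} depends on {coll} version {old}, now {new}")

mark_rebuilt(recorded, "idx_a", current)  # e.g. after a REINDEX of idx_a
```

After the mark_rebuilt call only chk_b still produces a warning, which is the
object-by-object behaviour described above, as opposed to a single
per-collation version that the DBA has to refresh by hand.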

-- 
Thomas Munro
http://www.enterprisedb.com

