Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Collation version tracking for macOS
Date
Msg-id 9f8e9b5a3352478d4cf7d6c0a5dd7e82496be4b6.camel@j-davis.com
In response to Re: Collation version tracking for macOS  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Sat, 2022-10-22 at 14:22 +1300, Thomas Munro wrote:
> Problem 2:  If ICU 67 ever decides to report a different version for a
> given collation (would it ever do that?  I don't expect so, but ...),
> we'd be unable to open the collation with the search-by-collversion
> design, and potentially the database.  What is a user supposed to do
> then?  Presumably our error/hint for that would be "please insert the
> correct ICU library into drive A", but now there is no correct library

Let's say that Postgres is compiled against version 67.X, and the
sysadmin upgrades the ICU package to 67.Y, which reports a different
collation version for some locale.

Your current patch makes this impossible for the administrator to fix,
because there's no way to have two different libraries loaded with the
same major version number, so it will always pick the compiled-in ICU.
The user will be forced to accept the new version of the collation, see
WARNINGs in their logs, and possibly corrupt their indexes.

Search-by-collversion would still be frustrating for the admin, but at
least it would be possible to fix by compiling their own 67.X and
asking Postgres to search that library, too. We could make it slightly
more friendly by having an error that reports the libraries searched
and the collation versions found, if none of the versions match. We can
have a GUC that controls whether a failure to find the right version is
a WARNING or an ERROR.
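
To make the search-by-collversion idea concrete, here is a rough sketch in
plain C. It is not code from the patch: the IcuLib struct, the
find_lib_by_collversion() function, the paths, and the version strings are
all made up for illustration, and the strings stand in for what
u_getVersion() and ucol_getVersion() would report for each loaded library.
The "level" argument plays the role of the WARNING-or-ERROR GUC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one loaded ICU library (not a real ICU type). */
typedef struct IcuLib
{
    const char *path;        /* where the library was found (illustrative) */
    const char *lib_version; /* what u_getVersion() would report */
    const char *collversion; /* what ucol_getVersion() would report */
} IcuLib;

/* Stand-in for the proposed GUC: is a missing version a WARNING or an ERROR? */
typedef enum { MISMATCH_WARNING, MISMATCH_ERROR } MismatchLevel;

/*
 * Search the libraries for one whose collation version equals "wanted".
 * Returns the index of the match and leaves msg empty. If nothing matches,
 * msg lists every library searched and the collversion it reported; the
 * result is then -1 (ERROR) or 0 (WARNING: fall back to the first entry,
 * assumed here to be the compiled-in library).
 */
int
find_lib_by_collversion(const IcuLib *libs, int nlibs, const char *wanted,
                        MismatchLevel level, char *msg, size_t msglen)
{
    size_t used = 0;
    int    n;

    assert(libs != NULL && nlibs > 0 && wanted != NULL);
    assert(msg != NULL && msglen > 0);
    msg[0] = '\0';

    for (int i = 0; i < nlibs; i++)
        if (strcmp(libs[i].collversion, wanted) == 0)
            return i;

    n = snprintf(msg, msglen,
                 "%s: no ICU library provides collation version \"%s\"; searched:",
                 level == MISMATCH_ERROR ? "ERROR" : "WARNING", wanted);

    /* Append one entry per library; stop quietly if the buffer fills up. */
    for (int i = 0; i < nlibs && n >= 0 && (size_t) n < msglen - used; i++)
    {
        used += (size_t) n;
        n = snprintf(msg + used, msglen - used,
                     " %s (ICU %s, collversion %s)",
                     libs[i].path, libs[i].lib_version, libs[i].collversion);
    }

    return level == MISMATCH_ERROR ? -1 : 0;
}
```

With a packaged 67.Y and a self-compiled 67.X both in the list, a collation
recorded with 67.X's collversion resolves to the self-compiled library, which
is exactly the fix that is unavailable when the choice is keyed on the major
version alone.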

On Sat, 2022-11-19 at 07:38 +1300, Thomas Munro wrote:
> >   * We'll need some clearer instructions on how to build/install extra
> > ICU versions that might not be provided by the distribution packaging.
> > For instance, I got a cryptic error until I used --enable-rpath, which
> > might not be obvious to all users.
>
> Suggestions welcome.  No docs at all yet...

I tried to write up some docs. It's hard to explain why we are exposing
to the user the collation version and the library version in these
different ways, and what effects they have.

The current patch feels like it hasn't decided whether the collation
version is ucol_getVersion() (collversion) or u_getVersion() (library
version). The collversion is more prominent in the UI (with its own
syntax), yet it's only a cross-check that decides whether to issue a
WARNING, while the library version is hidden in the locale field even
though it actually decides which symbol is called.
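
To illustrate what "decides which symbol is called" means, here is a small
sketch. It assumes ICU was built with its default symbol renaming, where
exported functions carry the major version as a suffix (ucol_open becomes
ucol_open_67 in ICU 67). The helper icu_symbol_name() is hypothetical and is
not taken from the patch; it only shows how a library version string could
be turned into the name one would look up with dlsym().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the version-suffixed symbol name for an ICU
 * function, e.g. base "ucol_open" and library version "67.1" give
 * "ucol_open_67". Only the major version takes part, which is why two
 * libraries with the same major version cannot be told apart this way.
 * Returns 0 on success, -1 on bad input or if the buffer is too small.
 */
int
icu_symbol_name(const char *base, const char *lib_version,
                char *out, size_t outlen)
{
    char *end;
    long  major;
    int   n;

    assert(base != NULL && lib_version != NULL && out != NULL);

    if (strlen(base) == 0)
        return -1;

    /* Parse the leading major version number, e.g. "67" out of "67.1". */
    major = strtol(lib_version, &end, 10);
    if (end == lib_version || major <= 0)
        return -1;

    n = snprintf(out, outlen, "%s_%ld", base, major);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Note that "67.1" and "67.2" both map to ucol_open_67, so in this scheme the
collversion never influences which function is reached; it can only be
compared afterwards.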

> Yeah.  I just don't like the way it *appears* to be doing something
> clever, but it doesn't solve any fundamental problem at all because the
> collversion information is under human control and so it's really doing
> something stupid.

I assume by "human control" you mean "ALTER COLLATION ... REFRESH
VERSION". I agree that relying on the admin's declaration is dubious,
especially when we provide no good advice on how to actually do that
safely.

But I don't see what using the library version instead buys us here,
except that the library version is part of the LOCALE, and there's no
ALTER command for that. You could just as easily deprecate/eliminate
ALTER COLLATION ... REFRESH VERSION, and then say that the collversion
is out of human control, too.

If we introduce multiple libraries, I think we need to change that
syntax anyway, to be something like:

   ALTER COLLATION ... SET VERSION TO '...'

or even:

   ALTER COLLATION ... FORCE VERSION TO '...'

> Hence desire to build something that at least admits that it's primitive
> and just gives you some controls, in a first version.

Using either the library version or the collation version seems
reasonably simple to me. But from a documentation and usability
standpoint, the way they are currently mixed seems confusing.



--
Jeff Davis
PostgreSQL Contributor Team - AWS




