On Thu, Jun 9, 2022 at 5:18 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> However, since you mentioned that a simple REINDEX would get you from
> one library version to another, I think we're making some completely
> different assumptions somewhere along the line, and I don't get your
> idea yet. It sounds like you don't want two different collation OIDs
> in that case?

I'm not completely sure about the REINDEX behavior, but it's at least an
example of the kind of thing that could be enabled. I'm proposing that
collations, as far as pg_collation is concerned, have the most abstract
possible definitions -- "logical collations", which are decoupled from
"physical collations" that actually describe a particular ICU collator
associated with a particular ICU version (all of the information needed
to keep track of how the on-disk structure is organized for a given
relfilenode). In other words, the definition of a collation is the user's
own definition. To the user, it's pretty close to (maybe even exactly)
a BCP47 string, now and forever.

You can make arguments against the REINDEX behavior. And maybe those
arguments will turn out to be good arguments. Assuming that they are,
then the solution may just be to have a special option that will make
the REINDEX use the most recent library.

The important point is to make the abstraction as high level as
possible from the point of view of users.
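
To make the logical/physical split concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions:
LogicalCollation, PhysicalCollation, Catalog, the use_latest flag, and
the relfilenode numbers are all hypothetical names invented for the
example, not existing PostgreSQL catalogs or APIs. Whether REINDEX
should move to the newest library by default is exactly the open
question, so the default chosen here is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalCollation:
    """What the user defines and sees: roughly a BCP47 string."""
    name: str   # e.g. "en-US-x-icu"
    bcp47: str  # e.g. "en-US"


@dataclass(frozen=True)
class PhysicalCollation:
    """A logical collation pinned to one ICU library version."""
    logical: LogicalCollation
    icu_major: int


class Catalog:
    """Hypothetical bookkeeping: which ICU version built each relfilenode."""

    def __init__(self, available_icu_versions):
        self.available = sorted(available_icu_versions)
        self.bindings = {}  # relfilenode -> PhysicalCollation

    def build_index(self, relfilenode, logical, icu_major=None):
        # Default to the most recent library that is installed.
        version = self.available[-1] if icu_major is None else icu_major
        if version not in self.available:
            raise ValueError(f"ICU {version} is not available")
        physical = PhysicalCollation(logical, version)
        self.bindings[relfilenode] = physical
        return physical

    def reindex(self, old_relfilenode, new_relfilenode, use_latest=True):
        # The rebuilt index gets a new relfilenode; the logical collation
        # never changes, only the physical one it is bound to.
        old = self.bindings.pop(old_relfilenode)
        version = None if use_latest else old.icu_major
        return self.build_index(new_relfilenode, old.logical, version)


en_us = LogicalCollation("en-US-x-icu", "en-US")
catalog = Catalog([63, 67])

catalog.build_index(16384, en_us, icu_major=63)  # older index, ICU 63
catalog.build_index(16385, en_us)                # newer index, ICU 67
catalog.build_index(16386, en_us, icu_major=63)

catalog.reindex(16384, 16390)                    # moves to ICU 67
catalog.reindex(16386, 16391, use_latest=False)  # stays on ICU 63
```

In this sketch there is only ever one "en-US-x-icu" object from the
user's point of view; the ICU version lives solely in the per-relfilenode
binding.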
> You seem to have some
> other idea in mind where the system only knows about one
> "en-US-x-icu", but somehow, somewhere else (where?), keeps track of
> which indexes were built with ICU 63 and which with ICU 67, which I
> don't yet grok. Or did I misunderstand?

That's what I meant, yes -- you got it right.

Another way to put it would be to go as far as we can in the direction
of decoupling the concerns that we have as database people from the
concerns of natural language experts. Let's not step on their toes,
and let's avoid having our toes trampled on.

--
Peter Geoghegan