Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Msg-id CA+hUKGJ7dfvu4i_UMOKK9ufG_BcXaYbKNJaPYKA7BH4NyRPTTQ@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Fri, Jun 10, 2022 at 10:29 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jun 9, 2022 at 2:20 PM Finnerty, Jim <jfinnert@amazon.com> wrote:
> > For example, an alternate syntax might be:
> >
> >     create collation icu63."en-US-x-icu" (provider = icu, locale = 'en-US@colVersion=63');
>
> Why would a user want to specify an ICU version in DDL? Wouldn't that
> break in the event of a dump and reload of the database, for example?
> It also strikes me as being inconsistent with the general philosophy
> for ICU and the broader BCP 47 IETF standard, which is "interpret the
> locale string to the best of our ability, never throw an error".
>
> Your proposed syntax already "works" today! You just need to create a
> schema called icu63 -- then the command executes successfully (for
> certain values of successfully).

Jim was proposing the @colVersion=63 part, but the schema part came
from my example upthread.  That was from a real transcript, and I
included that because the way I've been thinking of this so far has
distinct collation OIDs for the "same" collation from different ICU
libraries, and yet I want them to have the same collname.  That is, I
don't want (say) "en-US-x-icu63" and "en-US-x-icu71"... I thought it'd
be nice to keep using "en-US-x-icu" as we do today, so if there are
two of them they'd *have* to be in different schemas.  That has the
nice property that you can use the search_path to avoid mentioning it.
But I'm not at all wedded to that idea, or any other ideas in this
thread, just trying stuff out...
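
A minimal sketch of the search_path point, for illustration only.  The
schema names icu63 and icu67 are hypothetical, and on a stock server
both collations below would be backed by whichever single ICU library
PostgreSQL is linked against, so this shows name resolution only, not
real multi-version support.  It also assumes pg_catalog is listed
explicitly after the version schema, since otherwise pg_catalog is
searched first and any built-in "en-US-x-icu" would win.

```sql
-- Hypothetical: one schema per ICU major version, same collname in each.
CREATE SCHEMA icu63;
CREATE SCHEMA icu67;
CREATE COLLATION icu63."en-US-x-icu" (provider = icu, locale = 'en-US');
CREATE COLLATION icu67."en-US-x-icu" (provider = icu, locale = 'en-US');

-- Put the wanted version's schema ahead of pg_catalog so the
-- unqualified name resolves to icu63."en-US-x-icu".
SET search_path = icu63, pg_catalog;

-- No schema qualification needed in the COLLATE clause.
SELECT 'a' < 'B' COLLATE "en-US-x-icu";
```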

However, since you mentioned that a simple REINDEX would get you from
one library version to another, I think we're making some completely
different assumptions somewhere along the line, and I don't get your
idea yet.  It sounds like you don't want two different collation OIDs
in that case?

In the (vastly too) simplistic way I was thinking of it, if you have a
column with an ICU 63 collation, to switch to ICU 67 you first do some
DDL to add ICU 67 to your system and import 67's collations (creating
new collation OIDs), and then e.g. ALTER TABLE foo ALTER COLUMN bar TYPE
text COLLATE icu67."en-US-x-icu", which will rebuild your indexes.
That's a big job, and doesn't address how you switch the database
default collation.  None of that is very satisfying, and much more
thought is needed, but it falls out of the decision to have distinct
icu63."en-US-x-icu" and icu67."en-US-x-icu".  You seem to have some
other idea in mind where the system only knows about one
"en-US-x-icu", but somehow, somewhere else (where?), keeps track of
which indexes were built with ICU 63 and which with ICU 67, which I
don't yet grok.  Or did I misunderstand?
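
Spelled out as a rough sketch, the distinct-OID approach might look like
the following.  The icu67 schema is hypothetical, and the "add ICU 67 to
your system" step has no real DDL today, so a plain CREATE COLLATION
stands in for it here; foo and bar are the placeholder names from the
ALTER TABLE above.

```sql
-- Step 1 (stand-in): make a separate collation object, with its own OID,
-- available under a per-version schema.  Real multi-version support
-- would need some way to bind this to the ICU 67 library.
CREATE SCHEMA IF NOT EXISTS icu67;
CREATE COLLATION IF NOT EXISTS icu67."en-US-x-icu"
    (provider = icu, locale = 'en-US');

-- Step 2: repoint the column at the new collation.  Changing a column's
-- collation rebuilds the indexes that depend on it.
ALTER TABLE foo ALTER COLUMN bar TYPE text COLLATE icu67."en-US-x-icu";

-- Check which collation (schema and name) the column now uses.
SELECT c.collnamespace::regnamespace AS collschema, c.collname
FROM pg_attribute a
JOIN pg_collation c ON c.oid = a.attcollation
WHERE a.attrelid = 'foo'::regclass
  AND a.attname = 'bar';
```

This still leaves the database default collation untouched, which is the
gap noted above.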


