Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGJ7dfvu4i_UMOKK9ufG_BcXaYbKNJaPYKA7BH4NyRPTTQ@mail.gmail.com |
In response to | Re: Collation version tracking for macOS (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: Collation version tracking for macOS |
List | pgsql-hackers |
On Fri, Jun 10, 2022 at 10:29 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jun 9, 2022 at 2:20 PM Finnerty, Jim <jfinnert@amazon.com> wrote:
> > For example, an alternate syntax might be:
> >
> > create collation icu63."en-US-x-icu" (provider = icu, locale = 'en-US@colVersion=63');
>
> Why would a user want to specify an ICU version in DDL? Wouldn't that
> break in the event of a dump and reload of the database, for example?
> It also strikes me as being inconsistent with the general philosophy
> for ICU and the broader BCP45 IETF standard, which is "interpret the
> locale string to the best of our ability, never throw an error".
>
> Your proposed syntax already "works" today! You just need to create a
> schema called icu63 -- then the command executes successfully (for
> certain values of successfully).

Jim was proposing the @colVersion=63 part, but the schema part came from my example upthread. That was from a real transcript, and I included that because the way I've been thinking of this so far has distinct collation OIDs for the "same" collation from different ICU libraries, and yet I want them to have the same collname. That is, I don't want (say) "en-US-x-icu63" and "en-US-x-icu71"... I thought it'd be nice to keep using "en-US-x-icu" as we do today, so if there are two of them they'd *have* to be in different schemas. That has the nice property that you can use the search_path to avoid mentioning it. But I'm not at all wedded to that idea, or any other ideas in this thread, just trying stuff out...

However, since you mentioned that a simple REINDEX would get you from one library version to another, I think we're making some completely different assumptions somewhere along the line, and I don't get your idea yet. It sounds like you don't want two different collation OIDs in that case?

The (vastly too) simplistic way I was thinking of it, if you have a column with an ICU 63 collation, to switch to ICU 67 you first do some DDL to add ICU 67 to your system and import 67's collations (creating new collation OIDs), and then eg ALTER TABLE foo ALTER COLUMN bar TYPE text COLLATE icu67."en-US-x-icu", which will rebuild your indexes. That's a big job, and doesn't address how you switch the database default collation. None of that is very satisfying, much more thought needed, but it falls out of the decision to have distinct icu63."en-US-x-icu" and icu67."en-US-x-icu". You seem to have some other idea in mind where the system only knows about one "en-US-x-icu", but somehow, somewhere else (where?), keeps track of which indexes were built with ICU 63 and which with ICU 67, which I don't yet grok. Or did I misunderstand?
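
As a rough illustration of the schema-based naming discussed above, the following toy model shows how two collations with the same collname but distinct OIDs could be told apart by schema, and how a search path would pick one without it being named explicitly. This is only a sketch: the `CATALOG` dictionary, the OID values, and the `resolve_collation` helper are hypothetical and do not reflect PostgreSQL's actual catalog lookup code.

```python
from typing import List, Optional

# Toy catalog: (schema, collname) -> collation OID.
# The schemas mirror the icu63/icu67 example; the OIDs are invented.
CATALOG = {
    ("icu63", "en-US-x-icu"): 16384,
    ("icu67", "en-US-x-icu"): 16385,
}


def resolve_collation(
    collname: str,
    search_path: List[str],
    schema: Optional[str] = None,
) -> Optional[int]:
    """Return the OID for collname, or None if no match is found.

    A schema-qualified lookup checks only that schema; an unqualified
    lookup returns the first match along search_path.
    """
    if schema is not None:
        return CATALOG.get((schema, collname))
    for candidate in search_path:
        oid = CATALOG.get((candidate, collname))
        if oid is not None:
            return oid
    return None


# Same collname, different OID depending on which schema comes first.
old_oid = resolve_collation("en-US-x-icu", ["icu63", "public"])
new_oid = resolve_collation("en-US-x-icu", ["icu67", "public"])
```

In this model, moving a column from ICU 63 to ICU 67 means pointing it at a different OID, which is why the ALTER TABLE approach implies an index rebuild rather than a version stamp on a single collation.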