Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGKvu_QQyUHk-yOg-LvDajPH-N=JLcB2Abw4+k5mK1KZUg@mail.gmail.com Whole thread Raw |
In response to | Re: Collation version tracking for macOS (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Collation version tracking for macOS
|
List | pgsql-hackers |
On Sat, Jun 11, 2022 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote: > The special REINDEX (or whatever) won't work as an atomic > operation...but that doesn't mean that the system as a whole will have > a mix of old and new physical collations forever, or even for very > long. So while everything still has to work correctly, mediocre > performance with certain kinds of plan shapes might be okay. Yeah. And if you don't like the negative effects of a system in transition, you could also create new otherwise identical indexes, and then only drop the old ones once that's done, and add enough brains to keep everything working nicely on the old indexes until enough of the new indexes exist. Or something. I was thinking that could also be true for the "distinct collations" concept, only with different problems to solve... > As you kind of said yourself yesterday, "time travelling collations" > could naturally have an invariant that worked at the index/constraint > level (not the collation level): any given index needs to use only one > underlying ICU version at the same time, no matter what. The invariant > could perhaps be leveraged in the planner -- perhaps we start out with > a working assumption that *every* index is on the newer ICU version > (when at least one index is on the lastest and "now current" version), > and work backwards by excluding indexes that still have old physical > collations. Defining the problem as a problem with old > indexes/constraints only seems like it might make things a lot easier. Yes, that (posited) invariant was an enabling realisation for the (unfinished, but IMHO useful to grok) v2 patch. The disabling realisation that stopped me from finishing it was that I doubt my ability to find all the right places to wrap with "pg_icu_activate_major_version(X)" and "pg_icu_activate_major_version(-1)", and thus the whole approach. Do you know where to put all the ICU version switching regions, and how to verify that they cover exactly all the right code, but don't leak into any of the wrong code, and do you know where to store/retrieve X? It feels... flimsy to me, but how else could you make collations behave differently when being called on behalf of some index rather than some other thing, given no other context? Explicitly passing a pg_icu_library all over the place also sounds non-fun. I'm also suspicious that there are more subtle hazards like pathkeys lurking in the shadows. We go to great effort to recognise matching and non-matching collations by OID alone, which is why my first attempt was "distinct [OIDs]", so that'd keep working. I wondered if DB2's support looked more like "time travel" or "distinct". Based only on a quick glance at their manual[1], it looks a bit like they have "I don't care" collations which are subject to weirdness on upgrade when they change underneath your feet, and then "distinct" collations which have an explicit prefix to pin down the version (indirectly via CLDR version) and route to a specific library (N copies of ICU that ship with it), with a note recommending the latter for indexes and constraints. So I'd guess you'll stay on the old versions forever until you explicitly migrate data to a new collation. [1] https://www.ibm.com/docs/en/db2/11.1?topic=support-locale-names-sql-xquery
pgsql-hackers by date: