Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Date
Msg-id CA+hUKGKvu_QQyUHk-yOg-LvDajPH-N=JLcB2Abw4+k5mK1KZUg@mail.gmail.com
Whole thread Raw
In response to Re: Collation version tracking for macOS  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Sat, Jun 11, 2022 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> The special REINDEX (or whatever) won't work as an atomic
> operation...but that doesn't mean that the system as a whole will have
> a mix of old and new physical collations forever, or even for very
> long. So while everything still has to work correctly, mediocre
> performance with certain kinds of plan shapes might be okay.

Yeah.  And if you don't like the negative effects of a system in
transition, you could also create new otherwise identical indexes, and
then only drop the old ones once that's done, and add enough brains to
keep everything working nicely on the old indexes until enough of the
new indexes exist.  Or something.  I was thinking that could also be
true for the "distinct collations" concept, only with different
problems to solve...

> As you kind of said yourself yesterday, "time travelling collations"
> could naturally have an invariant that worked at the index/constraint
> level (not the collation level): any given index needs to use only one
> underlying ICU version at the same time, no matter what. The invariant
> could perhaps be leveraged in the planner -- perhaps we start out with
> a working assumption that *every* index is on the newer ICU version
> (when at least one index is on the lastest and "now current" version),
> and work backwards by excluding indexes that still have old physical
> collations. Defining the problem as a problem with old
> indexes/constraints only seems like it might make things a lot easier.

Yes, that (posited) invariant was an enabling realisation for the
(unfinished, but IMHO useful to grok) v2 patch.  The disabling
realisation that stopped me from finishing it was that I doubt my
ability to find all the right places to wrap with
"pg_icu_activate_major_version(X)" and
"pg_icu_activate_major_version(-1)", and thus the whole approach.  Do
you know where to put all the ICU version switching regions, and how
to verify that they cover exactly all the right code, but don't leak
into any of the wrong code, and do you know where to store/retrieve X?
 It feels... flimsy to me, but how else could you make collations
behave differently when being called on behalf of some index rather
than some other thing, given no other context?  Explicitly passing a
pg_icu_library all over the place also sounds non-fun.

I'm also suspicious that there are more subtle hazards like pathkeys
lurking in the shadows.  We go to great effort to recognise matching
and non-matching collations by OID alone, which is why my first
attempt was "distinct [OIDs]", so that'd keep working.

I wondered if DB2's support looked more like "time travel" or
"distinct".  Based only on a quick glance at their manual[1], it looks
a bit like they have "I don't care" collations which are subject to
weirdness on upgrade when they change underneath your feet, and then
"distinct" collations which have an explicit prefix to pin down the
version (indirectly via CLDR version) and route to a specific library
(N copies of ICU that ship with it), with a note recommending the
latter for indexes and constraints.  So I'd guess you'll stay on the
old versions forever until you explicitly migrate data to a new
collation.

[1] https://www.ibm.com/docs/en/db2/11.1?topic=support-locale-names-sql-xquery



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Collation version tracking for macOS
Next
From: Thomas Munro
Date:
Subject: Re: Collation version tracking for macOS