Home > mailing lists

Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From	Jeremy Schneider
Subject	Re: Collation version tracking for macOS
Date	June 9, 2022 17:54:01
Msg-id	EF3786A4-4254-456F-A9B8-C12C4AF3156C@ardentperf.com Whole thread Raw
In response to	Re: Collation version tracking for macOS (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Collation version tracking for macOS Re: Collation version tracking for macOS
List	pgsql-hackers

Tree view

> On Jun 8, 2022, at 22:40, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jun 8, 2022 at 10:24 PM Jeremy Schneider
> <schneider@ardentperf.com> wrote:
>> Even if PG supports two versions of ICU, how does someone actually go about removing every dependency on the old
versionand replacing it with the new? 
>
> They simply REINDEX, without changing anything. The details are still
> fuzzy, but at least that's what I was thinking of.
>
>> Can it be done without downtime? Can it be done without modifying a running application?
>
> Clearly the only way that we can ever transition to a new "physical
> collation" is by reindexing using a newer ICU version. And clearly
> there is going to be a need to fully deprecate any legacy version of
> ICU on a long enough timeline. There is just no getting around that.

I’m probably just going to end up rehashing the old threads I haven’t read yet…

One challenge with this approach is you have things like sort-merge joins that require the same collation across
multipleobjects. So I think you’d need to keep all the old indexes around until you have new indexes available for all
objectsin a database, and somehow the planner would need to be smart enough to dynamically figure out old vs new
versionson a query-by-query basis. May need an atomic database-wide cutover; running a DB with internally mixed
collationversions doesn’t seem like a small challenge. It would require enough disk space for two copies of all
indexes,and queries would change which indexes they use in a way that wouldn’t be immediately obvious to users or app
dev.Suddenly switching to or from a differently-bloated index could result in confusing and sudden performance changes. 

Also there would still need to be a plan to address all the other non-index objects where collation is used, as has
beenmentioned before. 

And given the current architecture, that final “alter database update default collation” command still seems awful
risky,bug-prone and difficult to get correct. At least it seems that way to me. 

At a minimum, this is a very big project and it seems to me like it may be wise to get more end-to-end fleshing out of
theplans before committing incremental pieces in core (which could end up being misguided if the plan doesn’t work as
wellas assumed). Definitely doesn’t seem to me like anything that will happen in a year or two. 

And my opinion is that the problems caused by depending on OS libraries for collation need to be addressed on a shorter
timelinethan what’s realistic for inventing a new way for a relational database to offer transparent or online upgrades
oflinguistic collation versions. 

Also I still think folks are overcomplicating this by focusing on linguistic collation as the solution. Like 1% of
usersactually need or care about having the latest technically correct local-language-based sorting, at a database
level.MySQL did the right thing here by doing what every other RDBMS did, and just making a simple “good-enough”
collationhardcoded in the DB, same across all platforms, that never changes. 

The 1% of users who need true linguistic collation can probably deal with the trade-off of dump-and-load upgrades for
theirICU indexes and databases for a few more years. 

-Jeremy

Sent from my TI-83

pgsql-hackers by date:

From: Peter Geoghegan
Date: 09 June 2022, 17:39:17
Subject: Re: BTMaxItemSize seems to be subtly incorrect

From: Robert Haas
Date: 09 June 2022, 18:20:01
Subject: Re: BTMaxItemSize seems to be subtly incorrect

Re: Collation version tracking for macOS - Mailing list pgsql-hackers

Previous

Next