Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Jeremy Schneider |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | EF3786A4-4254-456F-A9B8-C12C4AF3156C@ardentperf.com Whole thread Raw |
In response to | Re: Collation version tracking for macOS (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Collation version tracking for macOS
Re: Collation version tracking for macOS |
List | pgsql-hackers |
> On Jun 8, 2022, at 22:40, Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Jun 8, 2022 at 10:24 PM Jeremy Schneider > <schneider@ardentperf.com> wrote: >> Even if PG supports two versions of ICU, how does someone actually go about removing every dependency on the old versionand replacing it with the new? > > They simply REINDEX, without changing anything. The details are still > fuzzy, but at least that's what I was thinking of. > >> Can it be done without downtime? Can it be done without modifying a running application? > > Clearly the only way that we can ever transition to a new "physical > collation" is by reindexing using a newer ICU version. And clearly > there is going to be a need to fully deprecate any legacy version of > ICU on a long enough timeline. There is just no getting around that. I’m probably just going to end up rehashing the old threads I haven’t read yet… One challenge with this approach is you have things like sort-merge joins that require the same collation across multipleobjects. So I think you’d need to keep all the old indexes around until you have new indexes available for all objectsin a database, and somehow the planner would need to be smart enough to dynamically figure out old vs new versionson a query-by-query basis. May need an atomic database-wide cutover; running a DB with internally mixed collationversions doesn’t seem like a small challenge. It would require enough disk space for two copies of all indexes,and queries would change which indexes they use in a way that wouldn’t be immediately obvious to users or app dev.Suddenly switching to or from a differently-bloated index could result in confusing and sudden performance changes. Also there would still need to be a plan to address all the other non-index objects where collation is used, as has beenmentioned before. And given the current architecture, that final “alter database update default collation” command still seems awful risky,bug-prone and difficult to get correct. At least it seems that way to me. At a minimum, this is a very big project and it seems to me like it may be wise to get more end-to-end fleshing out of theplans before committing incremental pieces in core (which could end up being misguided if the plan doesn’t work as wellas assumed). Definitely doesn’t seem to me like anything that will happen in a year or two. And my opinion is that the problems caused by depending on OS libraries for collation need to be addressed on a shorter timelinethan what’s realistic for inventing a new way for a relational database to offer transparent or online upgrades oflinguistic collation versions. Also I still think folks are overcomplicating this by focusing on linguistic collation as the solution. Like 1% of usersactually need or care about having the latest technically correct local-language-based sorting, at a database level.MySQL did the right thing here by doing what every other RDBMS did, and just making a simple “good-enough” collationhardcoded in the DB, same across all platforms, that never changes. The 1% of users who need true linguistic collation can probably deal with the trade-off of dump-and-load upgrades for theirICU indexes and databases for a few more years. -Jeremy Sent from my TI-83
pgsql-hackers by date: