Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Collation version tracking for macOS
Msg-id CAH2-Wz=Oa5P586u14zkE-PUhyO4XWfi60J4JTkex615pA2eg_w@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Jeremy Schneider <schneider@ardentperf.com>)
List pgsql-hackers
On Thu, Jun 9, 2022 at 10:54 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:
> I’m probably just going to end up rehashing the old threads I haven’t read yet…
>
> One challenge with this approach is you have things like sort-merge joins that require the same collation across multiple objects. So I think you’d need to keep all the old indexes around until you have new indexes available for all objects in a database, and somehow the planner would need to be smart enough to dynamically figure out old vs new versions on a query-by-query basis.

I don't think that it would be fundamentally difficult to have the
planner deal with collations at the level required to avoid incorrect
query plans.

I'm not suggesting that this is an easy project, or that the end
result would be totally free of caveats, such as the issue with merge
joins. I am only suggesting that something like this seems doable.
There aren't that many distinct high-level approaches that could
possibly decouple upgrading Postgres/the OS from reindexing. This is
one.

> And my opinion is that the problems caused by depending on OS libraries for collation need to be addressed on a shorter timeline than what’s realistic for inventing a new way for a relational database to offer transparent or online upgrades of linguistic collation versions.

But what does that really mean? You can use ICU collations as the
default for the entire cluster now. Where do we still fall short? Do
you mean that there is still a question of actively encouraging using
ICU collations?

I don't understand what you're arguing for. Literally everybody agrees
that the current status quo is not good. That much seems settled to
me.

> Also I still think folks are overcomplicating this by focusing on linguistic collation as the solution.

I don't think that's true; I think that everybody understands that
being on the latest linguistic collation is only very rarely a
compelling feature. The whole way that BCP47 tags are so forgiving is
entirely consistent with that view of things.

But what difference does it make? As long as you accept that any
collation *might* need to be updated, or the default ICU version might
change on OS upgrade, then you have to have some strategy for dealing
with the transition. Not being on a very old obsolete version of ICU
will eventually become a "compelling feature" in its own right.
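
To make the failure mode concrete, here is a minimal sketch of why an
ordering change needs a transition strategy at all. It is not Postgres
code: key_v1 and key_v2 are hypothetical stand-ins for two versions of
one collation (they are not real ICU or libc rules), and the "index" is
just a sorted Python list searched with bisect.

```python
import bisect

# Hypothetical "version 1" of a collation: plain code point order,
# so '-' sorts before letters.
def key_v1(s):
    return s

# Hypothetical "version 2" of the same collation: hyphens are ignored.
def key_v2(s):
    return s.replace("-", "")

def build_index(values, key):
    """Stand-in for CREATE INDEX / REINDEX: store values in sorted order."""
    return sorted(values, key=key)

def index_lookup(index, value, key):
    """Binary search that trusts the index to be sorted according to `key`."""
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value

values = ["a-c", "aa", "ab"]

# Index built while the library implemented version 1 of the ordering.
old_index = build_index(values, key_v1)

# Same library version for build and lookup: the row is found.
found_before_upgrade = index_lookup(old_index, "aa", key_v1)

# Library upgraded, index not rebuilt: the search takes a wrong turn
# and misses a row that is physically present.
found_after_upgrade = index_lookup(old_index, "aa", key_v2)

# Rebuilding under the new ordering makes lookups correct again.
new_index = build_index(values, key_v2)
found_after_reindex = index_lookup(new_index, "aa", key_v2)

print(found_before_upgrade, found_after_upgrade, found_after_reindex)
```

The middle lookup is the interesting one: the data is still there, but
a search that assumes the new ordering cannot find it. Any scheme that
decouples upgrades from reindexing has to keep the ordering an index was
built with available until that index is rebuilt.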

I believe that EDB adopted ICU many years ago, and stuck with one
vendored version for quite a few years. And eventually being on a very
old version of ICU became a real problem.

--
Peter Geoghegan


