Re: ICU for global collation - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: ICU for global collation
Date
Msg-id 809c0fba-b0e6-44bf-b110-0886cb462aab@manitou-mail.org
Whole thread Raw
In response to Re: ICU for global collation  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: ICU for global collation
Re: ICU for global collation
List pgsql-hackers
    Peter Eisentraut wrote:

> Unlike in the previous patch, where the ICU
> collation name was written in datcollate, there is now a third column
> (daticucoll), so we can store all three values.

I think some users would want their db-wide ICU collation to be
case/accent-insensitive. Postgres users are trained to expect
case-sensitive comparisons, but some apps initially made for
e.g. MySQL or MS-SQL that use such collations natively would be easier
to port to Postgres.
IIRC, that was the context for some questions where people were
enquiring about db-wide ICU collations.

With the current patch, it's not possible, AFAICS, because the user
can't tell that the collation is non-deterministic. Presumably this
would require another option to CREATE DATABASE and another
column to store that bit of information.

The "daticucol" column also suggests that we don't expect to add
other collation providers in the future. Maybe a pair of columns like
(datcollprovider, datcolllocale) would be more future-proof,
or a (datcollprovider, datcolllocale, datcollisdeterministic)
triplet if non-deterministic collations are allowed.

Also, pg_collation has "collversion" to detect a mismatch between
the ICU runtime and existing indexes. I don't see that field
for the db-wide ICU collation, so maybe we currently miss the capability
to detect the mismatch for the db-wide collation?

The lack of these fields overall suggest the idea that when CREATE
DATABASE is called with a global ICU collation, what if it somehow
inserted the collation into pg_collation in the new database?
Then pg_database would just store the collation oid and no other
collation-related field would need to be added into it, now
or in the future.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side