Re: collate not support Unicode Variation Selector - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: collate not support Unicode Variation Selector
Date
Msg-id 20220805.155032.1548634303804827517.horikyota.ntt@gmail.com
In response to RE: collate not support Unicode Variation Selector  (荒井元成 <n2029@ndensan.co.jp>)
List pgsql-hackers
At Thu, 4 Aug 2022 19:01:33 +0900, 荒井元成 <n2029@ndensan.co.jp> wrote in 
> Thank you for your reply.
> 
> SQLServer supports Unicode Variation Selector, so I would like PostgreSQL to
> support them as well.

I studied the code a bit further and found that a simple comparison
can ignore the selectors if a nondeterministic collation is used.

CREATE COLLATION col1 (provider=icu, locale='ja', deterministic=false);
SELECT (U&'\+003436' || U&'\+0E0101' || U&'\+00304D' collate col1) = U&'\+003436' || U&'\+00304D';
 ?column? 
----------
 t

However, LIKE dislikes this.

> ERROR:  nondeterministic collations are not supported for LIKE
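
For reference, a query of roughly the following shape is enough to hit
that error. This is only a sketch: it assumes the col1 collation created
above, and the pattern is a made-up example, not taken from the original
report.

```sql
-- Assumes col1 (provider=icu, locale='ja', deterministic=false) already exists.
-- The left side contains the variation selector U+E0101; the pattern does not.
-- The pattern itself is a hypothetical example.
SELECT (U&'\+003436' || U&'\+0E0101' || U&'\+00304D')
       LIKE (U&'\+003436' || '%') COLLATE col1;
```

COLLATE binds more tightly than LIKE, so the collation is attached to the
pattern, and LIKE then rejects it because the collation is nondeterministic.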

Deterministic collations assume that text equality means bytewise
equality, so the "problem" behavior is correct in a sense.  For the same
reason, functions that do not support nondeterministic collations can
be implemented without consulting ICU, which leads to the "problem"
behavior.  ICU has a regular expression facility, so LIKE might be able
to be implemented using it.  If that were done, and if a
nondeterministic, IVS-sensitive collation were available (I didn't find
how to get one..), LIKE would work as you expect.
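
Until something like that exists, a user-level workaround might be to
strip the selectors before matching, so that LIKE runs under an ordinary
deterministic collation. This is not the ICU-based approach described
above, just a rough sketch. It assumes a UTF8 database with
standard_conforming_strings on, and that the only code points to ignore
are the variation selectors U+FE00..U+FE0F and U+E0100..U+E01EF.

```sql
-- Sketch of a workaround, not a server-side fix.
-- regexp_replace removes every variation selector from the input;
-- the 'g' flag replaces all occurrences, not just the first.
-- LIKE then compares the stripped text under the default collation.
SELECT regexp_replace(
         U&'\+003436' || U&'\+0E0101' || U&'\+00304D',
         U&'[\FE00-\FE0F\+0E0100-\+0E01EF]',
         '',
         'g'
       ) LIKE (U&'\+003436' || '%');
```

Note that this makes every match selector-insensitive, so it does not
help if some queries need to distinguish the variants.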

But..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


