Re: insensitive collations - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: insensitive collations
Date
Msg-id 693ad06d-db9a-4e59-8131-f823483c5893@manitou-mail.org
In response to Re: insensitive collations  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: insensitive collations  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Re: insensitive collations  (Jim Finnerty <jfinnert@amazon.com>)
List pgsql-hackers
Peter Eisentraut wrote:

> Another patch.

+    <literal>ks</literal> key), in order for such such collations to act in a

s/such such/such/

+   <para>
+    The pattern matching operators of all three kinds do not support
+    nondeterministic collations.  If required, apply a different collation to
+    the expression to work around this limitation.
+   </para>

This is an important point of comparison between case-insensitive (CI)
collations and contrib/citext, since the latter overloads a number of
functions and operators so that they do case-insensitive pattern matching.
The citext documentation explains the rationale for using it versus text;
it may now need to be expanded a bit with the pros and cons of
choosing citext versus nondeterministic collations.

The current patch doesn't alter a few string functions that could
potentially implement collation-aware string search, such as
replace(), strpos(), starts_with().
ISTM that we should not let these functions ignore the collation: they
ought to error out until we get their implementation to use the ICU
collation-aware string search.
FWIW I've been experimenting with usearch_openFromCollator() and
other usearch_* functions, and it looks doable to implement at least the
three functions above on top of them, even though the UTF16-ness of the API
does not favor us.
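
To make the UTF-16 remark concrete: the usearch_* functions report match
offsets in UTF-16 code units, whereas strpos() has to return a character
position in text that the server typically stores as UTF-8. Below is a minimal
sketch of the offset conversion this implies. The helper name
utf16_offset_to_char_pos is hypothetical, the code assumes valid UTF-8 input,
and it is not taken from the patch or from my experiments.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a UTF-16 code-unit offset (such as a match offset reported by
 * an ICU string search) into a 1-based character position within the
 * same text encoded as UTF-8.  Hypothetical helper; assumes valid UTF-8.
 * Returns 0 if the offset is negative, lies past the end of the string,
 * or falls in the middle of a surrogate pair.
 */
static int
utf16_offset_to_char_pos(const char *utf8, size_t len, int32_t u16_offset)
{
    size_t  i = 0;      /* byte index into utf8 */
    int32_t units = 0;  /* UTF-16 code units consumed so far */
    int     chars = 0;  /* code points consumed so far */

    if (u16_offset < 0)
        return 0;

    while (i < len && units < u16_offset)
    {
        unsigned char c = (unsigned char) utf8[i];

        if (c < 0x80)
        {
            i += 1;         /* ASCII: 1 byte, 1 code unit */
            units += 1;
        }
        else if ((c & 0xE0) == 0xC0)
        {
            i += 2;         /* 2-byte sequence, 1 code unit */
            units += 1;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            i += 3;         /* 3-byte sequence, 1 code unit */
            units += 1;
        }
        else
        {
            i += 4;         /* outside the BMP: a surrogate pair */
            units += 2;
        }
        chars++;
    }

    return (units == u16_offset) ? chars + 1 : 0;
}
```

Every match offset has to go through a walk like this (or the text has to be
kept in both encodings), on top of the UTF-8 to UTF-16 conversion needed
before the search can start at all.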

ICU also provides regexp matching, but it is not collation-aware, since
character-based patterns don't play well with the concept of collation.
As for a potential collation-aware LIKE, it looks hard to implement,
since the algorithm currently used in like_match.c seems purely
character-based. AFAICS there's no way to plug calls to usearch_*
functions into it; it would need a separate redesign from scratch.
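
To illustrate what "purely character-based" means here, this is a deliberately
simplified LIKE matcher. It is not the code in like_match.c; the name
simple_like is hypothetical, and escapes and multibyte characters are ignored.
The point is only the shape of the algorithm: the decisive step compares one
pattern unit against one text unit.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified, byte-at-a-time LIKE matcher: '%' matches any run of bytes
 * and '_' matches exactly one byte.  Illustrative only.
 */
static bool
simple_like(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (plen > 0)
    {
        if (*p == '%')
        {
            /* collapse consecutive '%' */
            while (plen > 0 && *p == '%')
            {
                p++;
                plen--;
            }
            if (plen == 0)
                return true;    /* trailing '%' matches the rest */

            /* try every possible starting point for the remainder */
            for (;;)
            {
                if (simple_like(t, tlen, p, plen))
                    return true;
                if (tlen == 0)
                    return false;
                t++;
                tlen--;
            }
        }

        if (tlen == 0)
            return false;

        /* the decisive step: one pattern unit against one text unit */
        if (*p != '_' && *p != *t)
            return false;

        p++;
        plen--;
        t++;
        tlen--;
    }
    return tlen == 0;
}
```

Under a nondeterministic collation, two strings of different lengths can
compare equal (a precomposed accented letter versus a base letter followed by
a combining accent, for instance), so there is no unit-by-unit comparison that
a collation call could simply replace, and '_' no longer has an obvious
meaning either.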


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

