Re: insensitive collations - Mailing list pgsql-hackers

From Jim Finnerty
Subject Re: insensitive collations
Date
Msg-id 1616618498263-0.post@n3.nabble.com
Whole thread Raw
In response to Re: insensitive collations  (Jim Finnerty <jfinnert@amazon.com>)
Responses Re: insensitive collations  ("Daniel Verite" <daniel@manitou-mail.org>)
Re: insensitive collations  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive ICU
collation, a LIKE predicate can be used with a small transformation of the
predicate, and the pattern can contain multi-byte characters:

from:

SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';
-- ERROR:  nondeterministic collations are not supported for LIKE

to:

SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE
lower('midi-Pyrené%');

and if there is an index defined as follows:

CREATE INDEX location_LOWER_C_index ON
    locations (LOWER(location) COLLATE "C");

then the LIKE predicate above performs the desired CI_AS evaluation and also
exploits the index:

EXPLAIN VERBOSE SELECT * FROM locations WHERE LOWER(location) COLLATE "C"
LIKE LOWER('midi-Pyrené%');

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using location_LOWER_C_index on public.locations
(cost=0.13..8.16 rows=1 width=18)
   Output: location
   Index Cond: (((lower((locations.location)::text))::text >=
'midi-pyren?'::text) AND ((lower((locations.location)::text))::text <
'midi-pyren?'::text))
   Filter: ((lower((locations.location)::text))::text ~~
'midi-pyren?%'::text)
(4 rows)


It turns out that CI_AS represents the vast majority (more than 99.9%) of
nondeterministic collations that we are seeing in babelfish, because
SQL_Latin1_General_CP1_CI_AS is the default collation in SQL Server.

Currently nondeterministic collations are disabled at the database level.
The cited reason was because of the lack of LIKE support and because certain
catalog views use LIKE.  That may still need to be a limitation if those
LIKE predicates currently have an index exploitation unless we can create a
LOWER(col) COLLATE "C" index on the catalog.  Which catalog views were these
that had the problem?




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: truncating timestamps on arbitrary intervals
Next
From: Robert Haas
Date:
Subject: Re: pg_amcheck contrib application