Re: insensitive collations - Mailing list pgsql-hackers

From Jim Finnerty
Subject Re: insensitive collations
Date
Msg-id 1616677789675-0.post@n3.nabble.com
Whole thread Raw
In response to Re: insensitive collations  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: insensitive collations  (Jim Finnerty <jfinnert@amazon.com>)
List pgsql-hackers
The two ideas can be combined to create a workaround for accent-sensitive
nondeterministic collations that enables an ordinary btree to be exploited
if available, and also provides the full LIKE logic in either case-sensitive
or case-insensitive collations:

SELECT * FROM locations WHERE location LIKE 'midi-Pyrén%ées';

becomes:

SELECT * FROM locations
WHERE
    location COLLATE "C" ILIKE 'midi-Pyrén%ées'  AND
-- For CI collations only
    location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';         --
exploitable by ordinary btree

LIKE would also be valid for a case sensitive collation, but then the
collation would be CS_AS, so it would be deterministic and no transform
would be needed.

The expression above produces a good plan, but EXPLAIN complains if the
concatenated expression is not a valid character for the current
client_encoding, which I had set to WIN1252 to display the accented
characters properly on the client:

babel=# SELECT * FROM locations
babel-# WHERE
babel-#     location COLLATE "C" ILIKE 'midi-Pyrén%ées'  AND
-- For CI collations only
babel-#     location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
    location
-----------------
 Midi-Pyrénées
 midi-Pyrénées
(2 rows)

babel=# EXPLAIN VERBOSE SELECT * FROM locations
babel-# WHERE
babel-#     location COLLATE "C" ILIKE 'midi-Pyrén%ées'  AND
-- For CI collations only
babel-#     location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
ERROR:  character with byte sequence 0xef 0xbf 0xbf in encoding "UTF8" has
no equivalent in encoding "WIN1252"

reset client_encoding;

babel=# EXPLAIN VERBOSE SELECT * FROM locations
WHERE
    location COLLATE "C" ILIKE 'midi-Pyrén%ées'  AND
-- For CI collations only
    location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';         --
exploitable by ordinary btree
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Only Scan using *location_index* on public.locations
(cost=0.13..8.15 rows=1 width=18)
   Output: location
   Index Cond: ((locations.location >= 'midi-Pyrén'::text) AND
(locations.location <= 'midi-Pyrén�'::text))
   Filter: ((locations.location)::text ~~* 'midi-Pyrén%ées'::text)
(4 rows)




-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Next
From: Fujii Masao
Date:
Subject: Re: Is it useful to record whether plans are generic or custom?