Re: Unicode FFFF Special Codepoint should always collate high. - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Unicode FFFF Special Codepoint should always collate high.
Date
Msg-id CA+hUKGJF0C0_rch1emO7id0QxDNa2s0pPtQCVjuTPM3mqLs+hg@mail.gmail.com
Whole thread Raw
In response to Unicode FFFF Special Codepoint should always collate high.  (Telford Tendys <psql@lnx-bsp.net>)
Responses Re: Unicode FFFF Special Codepoint should always collate high.  (Telford Tendys <psql@lnx-bsp.net>)
List pgsql-bugs
On Wed, Jun 23, 2021 at 3:00 PM Telford Tendys <psql@lnx-bsp.net> wrote:
> Thank you for taking a look at it, you seem to have confirmed that
> this is coming from the system itself. Yes, my purpose is to do
> prefix searching on strings by specifying a range and taking advantage
> of a B-Tree index, exactly as described in the quote above.

Just in case you didn't know, PostgreSQL knows how to convert a prefix
search for 'aar%' into a range search with a related trick in some
limited circumstances:

tmunro=> create table t (name text);
CREATE TABLE
tmunro=> insert into t values ('aardvark'), ('buffalo'), ('cat');
INSERT 0 3
tmunro=> create index on t(name text_pattern_ops);
CREATE INDEX
tmunro=> analyze t;
ANALYZE
tmunro=> explain select * from t where name like 'aar%';
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using t_name_idx on t  (cost=0.13..8.15 rows=1 width=7)
   Index Cond: ((name ~>=~ 'aar'::text) AND (name ~<~ 'aas'::text))
   Filter: (name ~~ 'aar%'::text)
(3 rows)

>     https://bugzilla.redhat.com/show_bug.cgi?id=1975045

Seems a little light on references and justifications for the
expectiation.  ICU's results could be useful in the discussion.  It's
interesting that more recent Unicode versions removed the prohibition
on using these code points at all (some earlier version said that if
your string contained them, they weren't Unicode, apparently,
according to my quick read of
https://en.wikipedia.org/wiki/Specials_(Unicode_block) but I didn't
have time to look into any source documents).



pgsql-bugs by date:

Previous
From: Telford Tendys
Date:
Subject: Re: Unicode FFFF Special Codepoint should always collate high.
Next
From: Vladimir Shvartsgor
Date:
Subject: Example in "42.8. Transaction Management" doesn't work for PostgreSQL v 12.7