Home > mailing lists

Re: Support LIKE with nondeterministic collations - Mailing list pgsql-hackers

From	Daniel Verite
Subject	Re: Support LIKE with nondeterministic collations
Date	May 3, 2024 14:58:55
Msg-id	9c81e66f-0e06-4756-b9a2-d7b7d46379c1@manitou-mail.org Whole thread Raw
In response to	Re: Support LIKE with nondeterministic collations (Peter Eisentraut <peter@eisentraut.org>)
Responses	Re: Support LIKE with nondeterministic collations Re: Support LIKE with nondeterministic collations
List	pgsql-hackers

Tree view

Peter Eisentraut wrote:

> Yes, certainly, and there is also no indexing support (other than for
> exact matches).

The ICU docs have this note about prefix matching:

https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching

   * Generating bounds for a sort key (prefix matching)

   Having sort keys for strings allows for easy creation of bounds -
   sort keys that are guaranteed to be smaller or larger than any sort
   key from a give range. For example, if bounds are produced for a
   sortkey of string “smith”, strings between upper and lower bounds
   with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
   of upper bounds can be generated - the first one will match only
   strings of equal length, while the second one will match all the
   strings with the same initial prefix.

   CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
   the maximum primary weight, so that for example the string
   “smith\uFFFF” can be used as the upper bound rather than modifying
   the sort key for “smith”.

In other words it says that

  col LIKE 'smith%' collate "nd"

is equivalent to:

  col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"

which could be obtained from an index scan, assuming a btree
index on "col" collate "nd".

U+FFFF is a valid code point but a "non-character" [1] so it's
not supposed to be present in normal strings.

[1] https://www.unicode.org/glossary/#noncharacter

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

pgsql-hackers by date:

From: David Rowley
Date: 03 May 2024, 14:36:35
Subject: Re: Typos in the code and README

From: Tom Lane
Date: 03 May 2024, 15:09:32
Subject: Re: Tarball builds in the new world order

Re: Support LIKE with nondeterministic collations - Mailing list pgsql-hackers

Previous

Next