Support text position search functions with nondeterministic collations
This allows using text position search functions with nondeterministic
collations. The affected functions are:
- position, strpos
- replace
- split_part
- string_to_array
- string_to_table
which all use common internal infrastructure.
Previously, there was no internal implementation for this case, so
these functions raised a not-supported error when called with a
nondeterministic collation. This adds the internal implementation and
removes the error.
Unlike with deterministic collations, the search cannot use any
optimized byte-by-byte techniques but has to compare candidate
substrings one by one. We also need to consider that the found match
could have a different length than the needle and that several
substrings of different lengths could match at the same position. In
most cases, we need to find the longest such substring (greedy
semantics), but this can be configured by each caller.
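The substring-by-substring search described above can be illustrated
with a minimal sketch. This is not the code added to varlena.c. The
names toy_equal and toy_search are hypothetical, and the equality rule
(ignore '-' and ASCII case) is only an assumed stand-in for a
nondeterministic collation, chosen because it lets matches differ in
length from the needle.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a nondeterministic collation: two byte ranges
 * compare equal if they match after ignoring '-' and ASCII case.
 */
bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    for (;;)
    {
        /* Skip characters that the toy collation treats as ignorable. */
        while (i < alen && a[i] == '-')
            i++;
        while (j < blen && b[j] == '-')
            j++;
        if (i == alen || j == blen)
            return i == alen && j == blen;
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[j]))
            return false;
        i++;
        j++;
    }
}

/*
 * Return the byte offset of the first match of needle in hay, or -1 if
 * there is none.  On success, *match_len receives the matched length,
 * which may differ from strlen(needle).  With greedy set, the longest
 * match at the found position is reported, otherwise the shortest.
 */
long
toy_search(const char *hay, const char *needle, bool greedy, size_t *match_len)
{
    assert(hay != NULL && needle != NULL && match_len != NULL);

    size_t hlen = strlen(hay);
    size_t nlen = strlen(needle);

    for (size_t start = 0; start <= hlen; start++)
    {
        bool found = false;

        /* No byte-wise shortcut: test every candidate substring. */
        for (size_t len = 0; start + len <= hlen; len++)
        {
            if (toy_equal(hay + start, len, needle, nlen))
            {
                *match_len = len;
                found = true;
                if (!greedy)
                    break;      /* shortest match is enough */
            }
        }
        if (found)
            return (long) start;
    }
    return -1;
}
```

Under these assumptions, searching "x-ab--y" for "AB" finds a match at
offset 1 in both modes, with length 5 ("-ab--") when greedy and length 3
("-ab") otherwise. The sketch tests every candidate length at each
position to keep the logic obvious; it makes no claim about how the
real implementation bounds that work.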
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/329304c9012b2ac6d906afeb18062f9080dceef9
Modified Files
--------------
src/backend/utils/adt/varlena.c | 104 ++++++++++++++---
src/test/regress/expected/collate.icu.utf8.out | 154 ++++++++++++++++++++-----
src/test/regress/sql/collate.icu.utf8.sql | 36 +++++-
3 files changed, 246 insertions(+), 48 deletions(-)