Explicit deterministic COLLATE fails with pattern matching operationson column with non-deterministic collation - Mailing list pgsql-bugs

From James Lucas
Subject Explicit deterministic COLLATE fails with pattern matching operationson column with non-deterministic collation
Date
Msg-id CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
Whole thread Raw
Responses Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-bugs
Hi all,

Wanted to call out what seems like a possible bug in non-deterministic
collation handling with pattern matching operators.  Per the
documentation, non-deterministic collations are not supported with
pattern matching operators.  Section 9.7 of the PG12 manual recommends
"The pattern matching operators of all three kinds do not support
nondeterministic collations. If required, apply a different collation
to the expression to work around this limitation."  However, I'm
finding that pattern matching operations fail when a column is
declared with a non-deterministic collation, *even if* a different,
deterministic collation is explicitly applied to the pattern matching
operation.  This doesn't seem to be the expected behavior.

Example.  This is tested on Postgres 12.3, on Centos 8.1.1911 with libicu 60.3.

Create a non-deterministic collation.
create collation mycollation (provider = icu, locale =
'en-US-ks-level2.utf8', deterministic = false);

Create a couple of sample tables:
create table ctest (id numeric, t text);
create table ctestnd (id numeric, t text collate mycollation);

Populate them with some data:
insert into ctest values (1,'aAa');
insert into ctest select generate_series(2,100000),'bbb';
insert into ctestnd select id, t from ctest;
analyze ctest, ctestnd;

Add a few indexes:
create index ctest_idx01 on ctest (t);
create index ctest_idx02 on ctest (t collate "C");
create index ctestnd_idx01 on ctestnd (t);
create index ctestnd_idx02 on ctestnd (t collate "C");

Test on ctest:
explain select * from ctest where t = 'aAa' collate "C";
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..4.44 rows=1 width=10)
   Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctest where t like 'a%';
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..8.44 rows=1 width=10)
   Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
   Filter: (t ~~ 'a%'::text)
COMMENT: Actually this is very interesting, because even without an
explicit COLLATE clause, LIKE still uses the "C" collation index.  Not
sure if that's intended behavior either?

explain select * from ctest where t like 'a%' collate "C";
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..8.44 rows=1 width=10)
   Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
   Filter: (t ~~ 'a%'::text COLLATE "C")
COMMENT: Uses explicit collation and index as expected.


Test on ctestnd:
explain select * from ctestnd where t = 'aAa' collate "C";
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using ctestnd_idx02 on ctestnd  (cost=0.42..4.44 rows=1 width=10)
   Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctestnd where t like 'a%';
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: Fails as expected.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: Not expected.  It seems like the explicit COLLATE clause is
ignored in this case.  I've tried different placements for the COLLATE
clause, and none seem to work.


Is this a bug, or have I missed something?

Thanks,
James Lucas



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re[2]:
Next
From: "David G. Johnston"
Date:
Subject: Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation