Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query
Date
Msg-id 8498.1565559169@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-bugs
Arthur Zakirov <a.zakirov@postgrespro.ru> writes:
> On 04.07.2019 19:50, Daniel Verite wrote:
>> Yes. It is because of the index that the code checks if the ILIKE
>> can be evaluated with an index lookup. Otherwise it doesn't.
>> If you feel like recompiling with a temporary fix against v11.4 to do
>> your own tests, please try the attached patch.

> Also this patch works. I thinks we need to use some ICU function to 
> check whether a char is a letter.

This topic seems to have slipped through the cracks, which is too bad
because we surely could have gotten it fixed in time for last week's
releases.  Oh well.

I think Daniel's patch leaves more on the table than it needs to,
and I don't like Arthur's patch because it's assuming more than it
should about whether it's okay to pass a "char" to u_isalpha().
I propose that we do things as attached, instead.

(I also added a regression test case.)

            regards, tom lane

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 26e0634..c8fec78 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1437,8 +1437,9 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
  * Check whether char is a letter (and, hence, subject to case-folding)
  *
  * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha.  Instead, just
- * assume any multibyte char is potentially case-varying.
+ * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
+ * Instead, just assume any non-ASCII char is potentially case-varying, and
+ * hard-wire knowledge of which ASCII chars are letters.
  */
 static int
 pattern_char_isalpha(char c, bool is_multibyte,
@@ -1449,7 +1450,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
     else if (is_multibyte && IS_HIGHBIT_SET(c))
         return true;
     else if (locale && locale->provider == COLLPROVIDER_ICU)
-        return IS_HIGHBIT_SET(c) ? true : false;
+        return IS_HIGHBIT_SET(c) ||
+            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 #ifdef HAVE_LOCALE_T
     else if (locale && locale->provider == COLLPROVIDER_LIBC)
         return isalpha_l((unsigned char) c, locale->info.lt);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 51262e0..50d1e6e 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -983,6 +983,38 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t
  collate_test1_idx4 | CREATE INDEX collate_test1_idx4 ON collate_tests.collate_test1 USING btree (((b || 'foo'::text))
COLLATE"POSIX") 
 (4 rows)

+set enable_seqscan = off;
+explain (costs off)
+select * from collate_test1 where b ilike 'abc';
+          QUERY PLAN
+-------------------------------
+ Seq Scan on collate_test1
+   Filter: (b ~~* 'abc'::text)
+(2 rows)
+
+select * from collate_test1 where b ilike 'abc';
+ a |  b
+---+-----
+ 1 | abc
+ 4 | ABC
+(2 rows)
+
+explain (costs off)
+select * from collate_test1 where b ilike 'ABC';
+          QUERY PLAN
+-------------------------------
+ Seq Scan on collate_test1
+   Filter: (b ~~* 'ABC'::text)
+(2 rows)
+
+select * from collate_test1 where b ilike 'ABC';
+ a |  b
+---+-----
+ 1 | abc
+ 4 | ABC
+(2 rows)
+
+reset enable_seqscan;
 -- schema manipulation commands
 CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 46999fb..5ed6360 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -341,6 +341,15 @@ CREATE INDEX collate_test1_idx6 ON collate_test1 ((a COLLATE "C")); -- fail

 SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_test%_idx%' ORDER BY 1;

+set enable_seqscan = off;
+explain (costs off)
+select * from collate_test1 where b ilike 'abc';
+select * from collate_test1 where b ilike 'abc';
+explain (costs off)
+select * from collate_test1 where b ilike 'ABC';
+select * from collate_test1 where b ilike 'ABC';
+reset enable_seqscan;
+

 -- schema manipulation commands


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join)
Next
From: PG Bug reporting form
Date:
Subject: BUG #15948: TRGM gin index is not be taken into account when using like all (array)