Re: Allow ILIKE forward matching to use btree index - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Allow ILIKE forward matching to use btree index |
Date | |
Msg-id | 20241224160442.5852d16fcc64cdfc8238c325@sraoss.co.jp Whole thread Raw |
List | pgsql-hackers |
On Fri, 20 Dec 2024 03:22:26 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Hi, > > Currently, btree indexes cannot used for ILIKE (~~*) operator if the pattern > has case-varying characters although LIKE (~~) expression can be converted > to indexable clauses by the planner support function (if the collation > is "C" or operator class 'text_pattern_ops' is used). > > For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND t < '123fop')" > and index scan can be used for this condition. On the other hand, "t ~~* '123foo'" > cannot be converted and sequential scan is used. > > Even in this case, we can use a bitmap index scan for the condition > "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" followed by > recheck by the original condition "t ~~* '123foo'", and this could be faster > than seqscan. > > However, even if the support function would return OR clauses, the current > planner implementation cannot not build bitmap scan paths using them. > > > The attached patches allow to ILIKE (~~*) forward matching to use btree index. > > The patch 0001 enables get_index_clause_from_support to receive OR clauses > from support functions and use them to build bitmap index scan later. OR clauses > returned by support functions are collected in the new argument 'orclauses" of > match_restriction_clauses_to_index(), and they are passed to > generate_bitmap_or_paths() later to build bitmap scan paths. > > The patch 0002 modifies the support function to return OR clauses as an example > above when ILIKE's pattern has case-varying characters in forward matching. The > OR clauses contains two index conditions for the upper and the lower letter of > the first case-varying character in the pattern respectively. Although the > subsequent characters are not considered in the index scans, it could enough be > faster then sequent scan. > > Here is an example. > > 1. Create a table with random text records > > =# CREATE TABLE tbl (t text); > =# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x END > FROM (SELECT i, md5(i::text) x FROM generate_series(1,5000000) i); > > 2. Create an index > =# CREATE INDEX ON tbl (t text_pattern_ops); > > 3. Before applying patch: Seq Scan was selected. It takes about 4 sec. > > =# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%'; I am sorry, the example in my previous main was wrong. I showed the plan with enable_index_scan = off. Indead, if the pattern starts with case-insensitive characters like '12', an index scan can be used even with ILIKE. postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..69129.61 rows=500 width=33) (actual time=36.317..4034.770 rows=1188 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=99 read=41939 -> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208 width=33) (actual time=19.908..4023.668 rows=396 loops=3) Filter: (t ~~* 'abc%'::text) Rows Removed by Filter: 1666271 Buffers: shared hit=99 read=41939 Planning Time: 0.726 ms Execution Time: 4035.101 ms (10 rows) 4. After applying patch: Bitmap Index Scan was selected. postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500 width=33) (actual time=156.485..1266.789 rows=1188 loops=1) Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text)AND (t ~~* 'abc%'::text))) Filter: (t ~~* 'abc%'::text) Rows Removed by Filter: 311473 Heap Blocks: exact=42010 Buffers: shared hit=1 read=44600 -> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0) (actual time=136.979..136.980 rows=0 loops=1) Buffers: shared hit=1 read=2590 -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=80.548..80.549 rows=158375loops=1) Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text)) Buffers: shared read=1272 -> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=56.427..56.427 rows=157042loops=1) Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text)) Buffers: shared hit=1 read=1318 Planning Time: 0.592 ms Execution Time: 1267.162 ms (16 rows) > What do you think about it? > > I think another application of OR-clause returning support function might be > allowing to use an index for NOT LIKE (!~~) expression because, for example, > "t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= '123fooz')". > (The second condition is a bit loose but this would be safe and not problem > since tuples are filtered by the original condition after the index scan.) > However, it would not very useful unless the distribution is much skew so that > NOT LIKE expression's selectivity is enough small. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: