a prefix searching - Mailing list pgsql-general

From Олег Самойлов
Subject a prefix searching
Date
Msg-id F5AA20A6-F2BC-4A60-A58A-0EF5220FD077@ya.ru
Whole thread Raw
List pgsql-general
I found an interesting information, which I want to share. When I analysed the ChangeLog I found:

Add prefix-match operator text ^@ text, which is supported by SP-GiST (Ildus Kurbangaliev)
This is similar to using var LIKE 'word%' with a btree index, but it is more efficient.

It was interesting for me, because and searching for prefix is enough often happened and I knew nothing about such
operator..This operator is based on internal function starts_with() and supported by SP-GiST. 

The information in the official documentation is really poor.

https://www.postgresql.org/docs/12/functions-string.html
In the section about string operators said nothing about "^@" , only about internal function starts_with(), which can
notbe accelerated by index. 

https://www.postgresql.org/docs/12/functions-matching.html
But in the section about pattern matching in subsection about "LIKE" mentioned both "^@" operator and internal
starts_width().This is surprise, because "^@" has nothing with LIKE or pattern matching. Also nothing said that it is
usefulonly with SP-GiST index. 

So I decided investigate by myself. I used data from pg_proc for my tests.

First, check btree index:

=> create table test_btree (oid oid primary key,proname text not null);
CREATE TABLE
=> insert into test_btree select oid,proname from pg_proc;
INSERT 0 2960
=> create index on test_btree (proname);
CREATE INDEX
=> analyze test_btree;
ANALYZE
=> explain select * from test_btree where proname ^@ 'bool';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test_btree  (cost=0.00..55.00 rows=27 width=17)
   Filter: (proname ^@ 'bool'::text)
(2 rows)
=> explain select * from test_btree where starts_with(proname,'bool');
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on test_btree  (cost=0.00..55.00 rows=987 width=17)
   Filter: starts_with(proname, 'bool'::text)
(2 rows)
=> explain select * from test_btree where proname like 'bool%';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test_btree  (cost=0.00..55.00 rows=27 width=17)
   Filter: (proname ~~ 'bool%'::text)
(2 rows)

=> explain select * from test_btree where proname similar to 'bool%';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test_btree  (cost=0.00..55.00 rows=27 width=17)
   Filter: (proname ~ '^(?:bool.*)$'::text)
(2 rows)
=> explain select * from test_btree where proname ~ '^bool.*';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test_btree  (cost=0.00..55.00 rows=27 width=17)
   Filter: (proname ~ '^bool.*'::text)
(2 rows)

This is surprise, but any prefix searching don't work with common btree index based on the local collation. But why
not?The prefix searching is expected to be used with local collation, as all other text searching. 

Next is a lifehack to use "C" collation. I read about this in the very old official PostgreSQL documentation.

=> create table test_btree_c (oid oid primary key,proname text not null);
CREATE TABLE
=> insert into test_btree_c select oid,proname from pg_proc;
INSERT 0 2960
=> create index on test_btree_c (proname collate "C");
CREATE INDEX
=> analyze test_btree_c;
ANALYZE
=> explain select * from test_btree_c where proname collate "C" ^@ 'bool';
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on test_btree_c  (cost=0.00..55.00 rows=27 width=17)
   Filter: ((proname)::text ^@ 'bool'::text)
(2 rows)
=> explain select * from test_btree_c where starts_with(proname collate "C",'bool');
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on test_btree_c  (cost=0.00..55.00 rows=987 width=17)
   Filter: starts_with((proname)::text, 'bool'::text)
(2 rows)
=> explain select * from test_btree_c where proname collate "C" like 'bool%';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_btree_c_proname_idx on test_btree_c  (cost=0.28..3.26 rows=27 width=17)
   Index Cond: (((proname)::text >= 'bool'::text) AND ((proname)::text < 'boom'::text))
   Filter: ((proname)::text ~~ 'bool%'::text)
(3 rows)
=> explain select * from test_btree_c where proname collate "C" similar to 'bool%';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_btree_c_proname_idx on test_btree_c  (cost=0.28..3.26 rows=27 width=17)
   Index Cond: (((proname)::text >= 'bool'::text) AND ((proname)::text < 'boom'::text))
   Filter: ((proname)::text ~ '^(?:bool.*)$'::text)
(3 rows)
=> explain select * from test_btree_c where proname collate "C" ~ '^bool.*';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_btree_c_proname_idx on test_btree_c  (cost=0.28..3.26 rows=27 width=17)
   Index Cond: (((proname)::text >= 'bool'::text) AND ((proname)::text < 'boom'::text))
   Filter: ((proname)::text ~ '^bool.*'::text)
(3 rows)

As you can see the prefix searching based on pattern matching use such btree index, while the special operator "^@" is
not.Also, why not?  

The modern the official PostgreSQL documentation advice to use text_pattern_ops for such case.
https://www.postgresql.org/docs/12/indexes-opclass.html
This information is not easily can be founded, IMHO worth to add link to there from here
https://www.postgresql.org/docs/12/functions-matching.html.

=> create table test_btree_ops (oid oid primary key,proname text not null);
CREATE TABLE
=> insert into test_btree_ops select oid,proname from pg_proc;
INSERT 0 2960
=> create index on test_btree_ops (proname text_pattern_ops);
CREATE INDEX
=>  analyze test_btree_ops;
ANALYZE
=> explain select * from test_btree_ops where proname ^@ 'bool';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on test_btree_ops  (cost=0.00..55.00 rows=27 width=17)
   Filter: (proname ^@ 'bool'::text)
(2 rows)
=> explain select * from test_btree_ops where starts_with(proname,'bool');
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on test_btree_ops  (cost=0.00..55.00 rows=987 width=17)
   Filter: starts_with(proname, 'bool'::text)
(2 rows)
=>  explain select * from test_btree_ops where proname like 'bool%';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using test_btree_ops_proname_idx on test_btree_ops  (cost=0.28..3.33 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~~ 'bool%'::text)
(3 rows)
=> explain select * from test_btree_ops where proname similar to 'bool%';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using test_btree_ops_proname_idx on test_btree_ops  (cost=0.28..3.33 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~ '^(?:bool.*)$'::text)
(3 rows)
=> explain select * from test_btree_ops where proname ~ '^bool.*';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using test_btree_ops_proname_idx on test_btree_ops  (cost=0.28..3.33 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~ '^bool.*'::text)
(3 rows)

The same as with "collation C", but much easy to use. The question is why special "^@" operator don't work on such
index?Here someone can apply a hand. :) 

Now SP-GiST:

=> create table test_sp_gist (oid oid primary key,proname text not null);
CREATE TABLE
=> insert into test_sp_gist select oid,proname from pg_proc;
INSERT 0 2960
=> create index on test_sp_gist using spgist(proname);
CREATE INDEX
=> analyze test_sp_gist;
ANALYZE
=> explain select * from test_sp_gist where proname ^@ 'bool';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..17.62 rows=27 width=17)
   Index Cond: (proname ^@ 'bool'::text)
(2 rows)
=> explain select * from test_sp_gist where starts_with(proname,'bool');
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on test_sp_gist  (cost=0.00..55.00 rows=987 width=17)
   Filter: starts_with(proname, 'bool'::text)
(2 rows)
=> explain select * from test_sp_gist where proname like 'bool%';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..3.19 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~~ 'bool%'::text)
(3 rows)
=> explain select * from test_sp_gist where proname similar to 'bool%';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..3.19 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~ '^(?:bool.*)$'::text)
(3 rows)
=> explain select * from test_sp_gist where proname ~ '^bool.*';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..3.19 rows=27 width=17)
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~ '^bool.*'::text)
(3 rows)

Now, as you can see, index is used both "^@" operator and pattern searching operators. But function is obviously not.
Whatis strange, the function is documented, while operator is not. %) 
https://www.postgresql.org/docs/12/functions-string.html

Also, as you can see the cost of "^@" is extimateed as 17.62, while for pattern searching like "LIKE" is 3.19. Is it
trueand the special operator "^@" is five times more slow, then pattern searching? It is wrong. 


=> explain (analyze,buffers) select * from test_btree_c where proname collate "C" like 'bool%';
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_btree_c_proname_idx on test_btree_c  (cost=0.28..3.26 rows=27 width=17) (actual
time=0.016..0.026rows=20 loops=1) 
   Index Cond: (((proname)::text >= 'bool'::text) AND ((proname)::text < 'boom'::text))
   Filter: ((proname)::text ~~ 'bool%'::text)
   Buffers: shared hit=12
 Planning Time: 0.448 ms
 Execution Time: 0.042 ms
(6 rows)
=> explain (analyze,buffers) select * from test_btree_ops where proname like 'bool%';
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_btree_ops_proname_idx on test_btree_ops  (cost=0.28..3.33 rows=27 width=17) (actual
time=0.013..0.022rows=20 loops=1) 
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~~ 'bool%'::text)
   Buffers: shared hit=12
 Planning Time: 0.115 ms
 Execution Time: 0.035 ms
(6 rows)
=> explain (analyze,buffers) select * from test_sp_gist where proname like 'bool%';
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..3.19 rows=27 width=17) (actual
time=0.022..0.031rows=20 loops=1) 
   Index Cond: ((proname ~>=~ 'bool'::text) AND (proname ~<~ 'boom'::text))
   Filter: (proname ~~ 'bool%'::text)
   Buffers: shared hit=7
 Planning Time: 0.132 ms
 Execution Time: 0.046 ms
=> explain (analyze,buffers) select * from test_sp_gist where proname ^@ 'bool%';
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_sp_gist_proname_idx on test_sp_gist  (cost=0.14..2.16 rows=1 width=17) (actual time=0.019..0.019
rows=0loops=1) 
   Index Cond: (proname ^@ 'bool%'::text)
   Buffers: shared hit=3
 Planning Time: 0.082 ms
 Execution Time: 0.029 ms
(5 rows)

Conclusion.

As you can see to accelerate prefix searching in any case you need to add one additional index. The special operator
"^@"with SP-GiST index give the best result. The surprise is in that about it is said nothing in the official
documentation.




pgsql-general by date:

Previous
From: Marco Ippolito
Date:
Subject: Re: Unable to connect to the database: TypeError: net.Socket is not a constructor
Next
From: pinker
Date:
Subject: 1GB of maintenance work mem