Thread: Pattern matching operators a index

Pattern matching operators a index

From
Soroosh Sardari
Date:
Hi

I'm developing a new type for character string, like varchar. I wrote operators for btree and so forth.
I wonder how pattern matching operators using btree index, because btree operator class ony knows about >, >=, <=, and = operators, but operators for pattern matching, such as LIKE, are not known for btree access method.

Now my question is:
Is Postgre using btree for pattern matching query for varchar or other character string types?

If it does, how i implement it for my new type?

Regards,
Soroosh

Re: Pattern matching operators a index

From
Heikki Linnakangas
Date:
On 09.10.2013 13:24, Soroosh Sardari wrote:
> I'm developing a new type for character string, like varchar. I wrote
> operators for btree and so forth.
> I wonder how pattern matching operators using btree index, because btree
> operator class ony knows about>,>=,<=, and = operators, but operators
> for pattern matching, such as LIKE, are not known for btree access method.
>
> Now my question is:
> Is Postgre using btree for pattern matching query for varchar or other
> character string types?
>
> If it does, how i implement it for my new type?

Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed 
prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately 
the support for that is hardcoded for the built-in pattern matching 
operators, and it's not possible to do the same for a custom data type 
without changing the backend code. The code that does the transformation 
is in src/backend/optimizer/path/indxpath.c, see section 'routines for 
"special" indexable operators'.

There has been some talk on generalizing that, but no-one's gotten 
around to it. See e.g 
http://www.postgresql.org/message-id/9860.1364013108@sss.pgh.pa.us. 
Patches are welcome.

- Heikki



Re: Pattern matching operators a index

From
Kevin Grittner
Date:
Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

> I'm developing a new type for character string, like varchar. I
> wrote operators for btree and so forth.
>
> I wonder how pattern matching operators using btree index,
> because btree operator class ony knows about >, >=, <=, and =
> operators, but operators for pattern matching, such as LIKE, are
> not known for btree access method.

In addition to Heikki's answer, which more directly answers your
question about what btree can do for you, you might want to look at
the pg_trgm extension and the gist_trgm_ops and gin_trgm_ops
operator classes, to see what other index types can do for you.
Specifically, while a btree index can only hlep much if the pattern
is anchored at the left, the regular expression searches need not
be.

test=# \d war_and_peace
                            Table "public.war_and_peace"
  Column  |  Type   |                           Modifiers
----------+---------+----------------------------------------------------------------
 lineno   | integer | not null default nextval('war_and_peace_lineno_seq'::regclass)
 linetext | text    | not null
Indexes:
    "war_and_peace_pkey" PRIMARY KEY, btree (lineno)

test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on war_and_peace  (cost=0.00..947.79 rows=283 width=76) (actual time=4.697..62.065 rows=67 loops=1)
   Filter: (linetext ~ 'gentlemen'::text)
   Rows Removed by Filter: 36636
 Total runtime: 62.101 ms
(4 rows)

test=# create index war_and_peace_linetext_gist on war_and_peace using gist (linetext gist_trgm_ops);
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=4.30..15.63 rows=3 width=76) (actual time=23.231..24.436 rows=67 loops=1)
   Recheck Cond: (linetext ~ 'gentlemen'::text)
   Rows Removed by Index Recheck: 22
   ->  Bitmap Index Scan on war_and_peace_linetext_gist  (cost=0.00..4.30 rows=3 width=0) (actual time=23.200..23.200
rows=89loops=1) 
         Index Cond: (linetext ~ 'gentlemen'::text)
 Total runtime: 24.483 ms
(6 rows)

test=# drop index war_and_peace_linetext_gist;
DROP INDEX
test=# create index war_and_peace_linetext_gin on war_and_peace using gin (linetext gin_trgm_ops);
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                             QUERY
PLAN                                                              

-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=68.02..79.35 rows=3 width=76) (actual time=2.393..5.206 rows=67 loops=1)
   Recheck Cond: (linetext ~ 'gentlemen'::text)
   Rows Removed by Index Recheck: 22
   ->  Bitmap Index Scan on war_and_peace_linetext_gin  (cost=0.00..68.02 rows=3 width=0) (actual time=2.360..2.360
rows=89loops=1) 
         Index Cond: (linetext ~ 'gentlemen'::text)
 Total runtime: 5.263 ms
(6 rows)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Pattern matching operators a index

From
Amit Kapila
Date:
On Wed, Oct 9, 2013 at 4:20 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 09.10.2013 13:24, Soroosh Sardari wrote:
>> Now my question is:
>> Is Postgre using btree for pattern matching query for varchar or other
>> character string types?
>>
>> If it does, how i implement it for my new type?
>
>
> Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed
> prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately the
> support for that is hardcoded for the built-in pattern matching operators,
> and it's not possible to do the same for a custom data type without changing
> the backend code. The code that does the transformation is in
> src/backend/optimizer/path/indxpath.c, see section 'routines for "special"
> indexable operators'.
>
> There has been some talk on generalizing that, but no-one's gotten around to
> it.

As per initial thoughts, here I think there are majorly two
functionalities for which some hooks are needed.

1. Identification of operator as a special operator and verification
if it can be indexable.  It is not guaranteed that operator LIKE can be considered
indexable, it is decided by match_special_index_operator() based on
clause.  So to generalize it, there is a need to have an additional column's
amopspecial(to indicate that there is need to verify that  this op is indexable) and amopverify (function that can
verifyif
 
special operator is indexable) in pg_amop.

2. Expansion of clauses in a different way for special operator's.   During expansion of opclauses
(expand_indexqual_opclause()),LIKE
 
operator clause needs to be expanded to "textfield >= 'abc' AND
textfield <   'abd'". So here again there is a need to have an additional column
in pg_amop amopexpand (function to expand clauses of special
operators).

I am sure there will be many more things at top level which might be
required to generalize LIKE operator optimisation, but I could think
of only above as per my initial look at this problem. I think more
thoughts/suggestions on this problem can help someone to attempt a
patch for this problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com