Thread: Pattern matching operators a index
Hi
I'm developing a new type for character string, like varchar. I wrote operators for btree and so forth.I wonder how pattern matching operators using btree index, because btree operator class ony knows about >, >=, <=, and = operators, but operators for pattern matching, such as LIKE, are not known for btree access method.
On 09.10.2013 13:24, Soroosh Sardari wrote: > I'm developing a new type for character string, like varchar. I wrote > operators for btree and so forth. > I wonder how pattern matching operators using btree index, because btree > operator class ony knows about>,>=,<=, and = operators, but operators > for pattern matching, such as LIKE, are not known for btree access method. > > Now my question is: > Is Postgre using btree for pattern matching query for varchar or other > character string types? > > If it does, how i implement it for my new type? Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately the support for that is hardcoded for the built-in pattern matching operators, and it's not possible to do the same for a custom data type without changing the backend code. The code that does the transformation is in src/backend/optimizer/path/indxpath.c, see section 'routines for "special" indexable operators'. There has been some talk on generalizing that, but no-one's gotten around to it. See e.g http://www.postgresql.org/message-id/9860.1364013108@sss.pgh.pa.us. Patches are welcome. - Heikki
Soroosh Sardari <soroosh.sardari@gmail.com> wrote: > I'm developing a new type for character string, like varchar. I > wrote operators for btree and so forth. > > I wonder how pattern matching operators using btree index, > because btree operator class ony knows about >, >=, <=, and = > operators, but operators for pattern matching, such as LIKE, are > not known for btree access method. In addition to Heikki's answer, which more directly answers your question about what btree can do for you, you might want to look at the pg_trgm extension and the gist_trgm_ops and gin_trgm_ops operator classes, to see what other index types can do for you. Specifically, while a btree index can only hlep much if the pattern is anchored at the left, the regular expression searches need not be. test=# \d war_and_peace Table "public.war_and_peace" Column | Type | Modifiers ----------+---------+---------------------------------------------------------------- lineno | integer | not null default nextval('war_and_peace_lineno_seq'::regclass) linetext | text | not null Indexes: "war_and_peace_pkey" PRIMARY KEY, btree (lineno) test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen'; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Seq Scan on war_and_peace (cost=0.00..947.79 rows=283 width=76) (actual time=4.697..62.065 rows=67 loops=1) Filter: (linetext ~ 'gentlemen'::text) Rows Removed by Filter: 36636 Total runtime: 62.101 ms (4 rows) test=# create index war_and_peace_linetext_gist on war_and_peace using gist (linetext gist_trgm_ops); CREATE INDEX test=# analyze; ANALYZE test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on war_and_peace (cost=4.30..15.63 rows=3 width=76) (actual time=23.231..24.436 rows=67 loops=1) Recheck Cond: (linetext ~ 'gentlemen'::text) Rows Removed by Index Recheck: 22 -> Bitmap Index Scan on war_and_peace_linetext_gist (cost=0.00..4.30 rows=3 width=0) (actual time=23.200..23.200 rows=89loops=1) Index Cond: (linetext ~ 'gentlemen'::text) Total runtime: 24.483 ms (6 rows) test=# drop index war_and_peace_linetext_gist; DROP INDEX test=# create index war_and_peace_linetext_gin on war_and_peace using gin (linetext gin_trgm_ops); CREATE INDEX test=# analyze; ANALYZE test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on war_and_peace (cost=68.02..79.35 rows=3 width=76) (actual time=2.393..5.206 rows=67 loops=1) Recheck Cond: (linetext ~ 'gentlemen'::text) Rows Removed by Index Recheck: 22 -> Bitmap Index Scan on war_and_peace_linetext_gin (cost=0.00..68.02 rows=3 width=0) (actual time=2.360..2.360 rows=89loops=1) Index Cond: (linetext ~ 'gentlemen'::text) Total runtime: 5.263 ms (6 rows) -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 9, 2013 at 4:20 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 09.10.2013 13:24, Soroosh Sardari wrote: >> Now my question is: >> Is Postgre using btree for pattern matching query for varchar or other >> character string types? >> >> If it does, how i implement it for my new type? > > > Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed > prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately the > support for that is hardcoded for the built-in pattern matching operators, > and it's not possible to do the same for a custom data type without changing > the backend code. The code that does the transformation is in > src/backend/optimizer/path/indxpath.c, see section 'routines for "special" > indexable operators'. > > There has been some talk on generalizing that, but no-one's gotten around to > it. As per initial thoughts, here I think there are majorly two functionalities for which some hooks are needed. 1. Identification of operator as a special operator and verification if it can be indexable. It is not guaranteed that operator LIKE can be considered indexable, it is decided by match_special_index_operator() based on clause. So to generalize it, there is a need to have an additional column's amopspecial(to indicate that there is need to verify that this op is indexable) and amopverify (function that can verifyif special operator is indexable) in pg_amop. 2. Expansion of clauses in a different way for special operator's. During expansion of opclauses (expand_indexqual_opclause()),LIKE operator clause needs to be expanded to "textfield >= 'abc' AND textfield < 'abd'". So here again there is a need to have an additional column in pg_amop amopexpand (function to expand clauses of special operators). I am sure there will be many more things at top level which might be required to generalize LIKE operator optimisation, but I could think of only above as per my initial look at this problem. I think more thoughts/suggestions on this problem can help someone to attempt a patch for this problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com