Re: TODO item: Implement Boyer-Moore searching in LIKE queries - Mailing list pgsql-hackers

From Tom Lane
Subject Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Date
Msg-id 6654.1471022876@sss.pgh.pa.us
Whole thread Raw
In response to Re: TODO item: Implement Boyer-Moore searching in LIKE queries  (Karan Sikka <karanssikka@gmail.com>)
List pgsql-hackers
Karan Sikka <karanssikka@gmail.com> writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient.  What I'd really like to do is get rid of that cache in favor
of having a way to treat a precompiled regexp as a constant.  I think this
is probably possible via inventing a "regexp" datatype, which we make the
declared RHS input type for the ~ operator, and give it an implicit cast
from text so that existing queries don't break.  The compiled regexp tree
structure contains pointers so it could never go to disk, but now that
we have the "expanded datum" infrastructure you could imagine that the
on-disk representation is the same as text but we support adding a
compiled tree to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c.  A cache
like that might be the only way to deal with a query using variable
patterns (e.g, pattern argument coming from a table column).  But it
seems like basically the wrong approach for the common case of a
constant pattern.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Karan Sikka
Date:
Subject: Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Next
From: Jeff Janes
Date:
Subject: Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)