Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: index support for regexp search
Date
Msg-id CAPpHfdv5xmBoTCkxuFBnD8LFtD8-DALQaqrGT7tRLLHKYg-OSQ@mail.gmail.com
In response to Re: WIP: index support for regexp search  ("Erik Rijkers" <er@xs4all.nl>)
Responses Re: WIP: index support for regexp search  ("Erik Rijkers" <er@xs4all.nl>)
List pgsql-hackers
Hi!

Thank you for your feedback!

On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er@xs4all.nl> wrote:
The patch yields spectacular speedups with small, simple-enough regexes.  But it does not do a
good enough job when guessing where to use the index and where to fall back to Seq Scan.  This can
lead to (also spectacular) slow-downs, compared to Seq Scan.
Could you give some examples of regexes where index scan becomes slower than seq scan?
 
I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in your patch), is that
true?   I can understand you want that value to be low to limit the above risk, but now it reduces
the usability of the feature a bit: one has to split up larger char-classes into several smaller
ones to make a statement use the index: i.e.:
Yes, MAX_COLOR_CHARS is the maximum number of characters an automaton color may contain for that color to be split into separate characters. There is likely a better solution than just having this hard limit.
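
To make the limit concrete, here is a minimal sketch of the idea, not the patch's actual code. It assumes a color can be modelled as a plain collection of characters and uses the value 4 mentioned in the question above; the function name `expand_color` is hypothetical.

```python
# Hypothetical illustration of a hard limit on color expansion.
# MAX_COLOR_CHARS = 4 is the value mentioned in the discussion; the rest is assumed.
MAX_COLOR_CHARS = 4


def expand_color(color_chars):
    """Return the individual characters of a color, or None if it is too large.

    A color is modelled here as any iterable of characters. Small colors are
    enumerated so that each character can contribute to trigrams; colors with
    more than MAX_COLOR_CHARS distinct characters are not enumerated at all.
    """
    chars = sorted(set(color_chars))
    if len(chars) > MAX_COLOR_CHARS:
        return None  # too many characters: give up on enumerating this color
    return chars
```

Under these assumptions a class such as [abc] would be enumerated, while [abcde] would not, which is consistent with the observation that splitting a larger class into smaller ones lets the statement use the index.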
 
Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is
normally necessary for performance reasons, but it would be useful to be able to compile a test
version that allows it.  I don't know how hard that would be.
It seems that Ctrl-C was impossible because the trigram extraction procedure takes very long and is not interruptible. It's not difficult to make this procedure interruptible, but really it just shouldn't take so long.
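
As a rough illustration of what making a long procedure interruptible means, the sketch below checks a cancellation flag periodically inside the loop. It is a toy that extracts trigrams from a plain string, not from a regex automaton as the patch does, and the names `Cancelled`, `extract_trigrams` and `check_every` are hypothetical. In the PostgreSQL backend itself the usual mechanism for this is the CHECK_FOR_INTERRUPTS() macro.

```python
import threading


class Cancelled(Exception):
    """Raised when the caller asked the extraction to stop."""


def extract_trigrams(text, cancel_event, check_every=1024):
    """Collect all 3-character substrings of text, checking for cancellation.

    cancel_event is a threading.Event set by whoever wants to interrupt the
    work; it is polled every check_every iterations so the check stays cheap.
    """
    trigrams = set()
    for i in range(len(text) - 2):
        if i % check_every == 0 and cancel_event.is_set():
            raise Cancelled("trigram extraction interrupted")
        trigrams.add(text[i:i + 3])
    return trigrams
```

Polling only every few iterations keeps the overhead small while still bounding how long a cancel request can go unnoticed.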
 
There is also a minor bug, I think, when running with  'set enable_seqscan=off'  in combination
with a too-large regex:
Thanks for pointing that out. It will be fixed.

------
With best regards,
Alexander Korotkov.
