Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: index support for regexp search
Date
Msg-id CAPpHfdswF+FHrNtBnCgbwf-hxLUcMssf=L7cX_CYAq5ncUsrPA@mail.gmail.com
In response to Re: WIP: index support for regexp search  ("Erik Rijkers" <er@xs4all.nl>)
List pgsql-hackers
On Tue, Dec 18, 2012 at 12:51 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Tue, December 18, 2012 09:45, Alexander Korotkov wrote:
>
> You should use {0,n} to express from 0 to n occurrences.
>


Thanks, but I know that of course.  It's a testing program; and in the end robustness with
unexpected or even wrong input is as important as performance.  (to put it bluntly, I am also
trying to get your patch to fall over ;-))

I found that most of the regressions in version 0.9 are in {,n} cases. The new version of the patch uses more trigrams than previous versions.
For example, take the regex 'x[aeiou]{,2}q'.
In version 0.7 we use the trigrams '__2', '_2_' and '__q'.
In version 0.9 we use the trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_', '__2', '_2_' and '__q'.
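
To see where trigrams such as '__2' come from, here is a rough Python sketch of pg_trgm-style trigram extraction applied to one concrete string. It assumes, as the trigram lists above suggest, that {,2} is not parsed as a bound, so its characters are matched literally. The trigrams() and show() helpers are hypothetical names for illustration and are not code from the patch. Non-alphanumeric characters act as word separators, each word is padded with two spaces in front and one behind, and '_' stands for a space in the output.

```python
import re

def trigrams(text):
    """Simplified pg_trgm-style extraction (illustrative, not the patch's code)."""
    result = set()
    # Non-alphanumeric characters separate words; case is folded.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

def show(tgs):
    """Render spaces as '_' the way the message above writes trigrams."""
    return sorted(t.replace(" ", "_") for t in tgs)

# A string that the regex would match if {,2} is taken literally.
sample = "xa{,2}q"
print(show(trigrams(sample)))
```

A concrete string yields a few more trigrams than the lists above (for example '__x'), because in the regex case nothing is known about the characters before 'x' or after 'q'.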

But in fact the trigrams '__2' and '_2_' never actually occur. It is enough to have one of them; all the others just cause a slowdown. At the same time, we can't reasonably decide which trigrams to use without knowing their frequencies. For example, if the trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_' were, taken together, rarer than '__2', the newer version of the patch would be faster.
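
The frequency argument can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the frequency numbers are invented, group_size() and rarest_group() are hypothetical helpers, and nothing here reflects how the patch actually chooses trigrams. Each group is a set of alternative trigrams (an OR), and the sum of their counts bounds the number of rows the group can select.

```python
def group_size(group, freq):
    """Upper bound on rows selected by an OR-group: sum of its trigram counts."""
    return sum(freq[t] for t in group)

def rarest_group(groups, freq):
    """Pick the group expected to select the fewest rows."""
    return min(groups, key=lambda g: group_size(g, freq))

vowel_group = ("xa ", "xe ", "xi ", "xo ", "xu ")
digit_group = ("  2",)
groups = [vowel_group, digit_group]

# Invented frequencies: '  2' never occurs, the vowel trigrams are common.
freq_a = {"xa ": 120, "xe ": 90, "xi ": 60, "xo ": 80, "xu ": 30, "  2": 0}
# Invented frequencies: the vowel trigrams together are rarer than '  2'.
freq_b = {"xa ": 1, "xe ": 0, "xi ": 2, "xo ": 0, "xu ": 1, "  2": 500}

print(rarest_group(groups, freq_a))
print(rarest_group(groups, freq_b))
```

With the first set of numbers the single trigram '__2' is the better filter and the vowel trigrams only add work; with the second set the situation is reversed, which is why the choice cannot be made well without frequency information.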


------
With best regards,
Alexander Korotkov.
