Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: index support for regexp search
Date
Msg-id CAPpHfdtWpbVoXGU+damLwb+_-1wcRRwuzm2XSuy8Qr1aDdnDzw@mail.gmail.com
Whole thread Raw
In response to Re: WIP: index support for regexp search  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Apr 9, 2013 at 9:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I spent the weekend hacking on this, making a number of bug fixes and a
>> whole lot of cosmetic changes.  I think there are large parts of this
>> that are in committable shape now, but I still find the actual graph
>> transformation logic to be mostly unintelligible.  I think what's most
>> obscure is the distinction between the arcs list and the keys list of
>> each state in the expanded graph.  I get the impression that the
>> general idea is for the arcs to represent exactly-known transitions
>> while the keys represent imprecisely-known transitions ... but there
>> seems to be at least some leakage between those categories.  Could
>> you write down a specification for what's supposed to be happening
>> there?

> Here is my try to specify it.

Thanks.  I hacked on this some more and committed it.  I found a number
of bugs along the way with respect to handling of word boundaries
(partially-blank transition trigrams) and EOL-color ($) handling.
I think it's all fixed now but it could definitely use some more
study and testing.
 
Great, thanks! I also will do some more testing.

One issue that bothered me is that the regression tests really don't
provide much visibility into what the code is doing.  Some of the bugs
had to do with failing to generate expected trigrams, for instance
col ~ 'foo bar' only generating trigram "foo" and not "bar".  This still
led to getting the right answer, so the error was invisible as far as the
tests were concerned.  Is it worth thinking of a way to expose what the
extract function did at SQL level, so we could test more carefully?

Yes, I also had similar idea. But, I think we need some relatively stable representation of resulting graph in order to expose it. There could be a lot of equivalent graphs. Some changes in implementation could lead to change from one equivalent graph to another. It would be better to not rewrite tests in this case. Ideally, we should expose some representation which is the same for all equivalent graphs. However, it doesn't seem to be realistic. But, I think we could at least make it stable to order sequence of states and color trigrams. Another option I see is to expose just set of trigrams. It doesn't have completeness of information, but it is quite stable.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: corrupt pages detected by enabling checksums
Next
From: Christoph Berg
Date:
Subject: Re: Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog