Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: index support for regexp search
Date
Msg-id CAPpHfdtkPgtDANjAXMnyAycpsahgGedQZ7VU+KfW6Y_5Jx1O=g@mail.gmail.com
Whole thread Raw
In response to Re: WIP: index support for regexp search  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: index support for regexp search  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 23.01.2013 09:36, Alexander Korotkov wrote:
>> On Wed, Jan 23, 2013 at 6:08 AM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> The biggest problem is that I really don't care for the idea of
>>> contrib/pg_trgm being this cozy with the innards of regex_t.

>> The only option I see now is to provide a method like "export_cnfa" which
>> would export corresponding CNFA in fixed format.

> Yeah, I think that makes sense. The transformation code in trgm_regexp.c
> would probably be more readable too, if it didn't have to deal with the
> regex guts representation of the CNFA. Also, once you have intermediate
> representation of the original CNFA, you could do some of the
> transformation work on that representation, before building the
> "tranformed graph" containing trigrams. You could eliminate any
> non-alphanumeric characters, joining states connected by arcs with
> non-alphanumeric characters, for example.

It's not just the CNFA though; the other big API problem is with mapping
colors back to characters.  Right now, that not only knows way too much
about a part of the regex internals we have ambitions to change soon,
but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of
which should be known to the regex library IMO.  So I'm not sure how we
divvy that up sanely.  To be clear: I'm not going to insist that we have
to have a clean API factorization before we commit this at all.  But it
worries me if we don't even know how we could get to that, because we
are going to need it eventually.

Now, we probably don't have enough of time before 9.3 to solve an API problem :(. It's likely we have to choose either commit to 9.3 without clean API factorization or postpone it to 9.4.

------
With best regards,
Alexander Korotkov. 

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Support for REINDEX CONCURRENTLY
Next
From: Andres Freund
Date:
Subject: Re: Writable foreign tables: how to identify rows