On 23.01.2013 09:36, Alexander Korotkov wrote:
> Hi!
>
> Some quick answers to the part of notes/issues. I will provide rest of
> answers soon.
>
> On Wed, Jan 23, 2013 at 6:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> The biggest problem is that I really don't care for the idea of
>> contrib/pg_trgm being this cozy with the innards of regex_t. Sooner
>> or later we are going to want to split the regex code off again as a
>> standalone library, and we *must* have a cleaner division of labor if
>> that is ever to happen. Not sure what a suitable API would look like
>> though.
>
> The only option I see now is to provide a method like "export_cnfa" which
> would export corresponding CNFA in fixed format.

Yeah, I think that makes sense. The transformation code in trgm_regexp.c
would probably be more readable too, if it didn't have to deal with the
regex code's internal representation of the CNFA. Also, once you have an
intermediate representation of the original CNFA, you could do some of
the transformation work on that representation, before building the
"transformed graph" containing trigrams. For example, you could
eliminate any non-alphanumeric characters by joining the states that are
connected by arcs labeled with non-alphanumeric characters.
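
To illustrate the state-joining idea, here is a rough sketch in C. It is
not a proposed API: ExportedArc, ExportedNfa and collapse_nonalnum are
hypothetical names, the arcs carry single-byte characters rather than
regex colors, and the fixed MAX_STATES limit is only there to keep the
example short. It merges the endpoints of every non-alphanumeric arc
with a small union-find, drops those arcs, and relabels the remaining
arcs with the representative of each merged state.

```c
#include <assert.h>
#include <ctype.h>

#define MAX_STATES 256          /* arbitrary limit for this sketch */

/* Hypothetical fixed-format export of a CNFA; not the regex code's API. */
typedef struct ExportedArc
{
    int           from;         /* source state number */
    int           to;           /* target state number */
    unsigned char ch;           /* character labeling the arc */
} ExportedArc;

typedef struct ExportedNfa
{
    int          nstates;
    int          narcs;
    ExportedArc *arcs;
} ExportedNfa;

/* Union-find parent links, one per state. */
static int parent[MAX_STATES];

static int
find_root(int s)
{
    while (parent[s] != s)
    {
        parent[s] = parent[parent[s]];  /* path halving */
        s = parent[s];
    }
    return s;
}

/*
 * Join states connected by non-alphanumeric arcs, remove those arcs, and
 * rewrite the endpoints of the remaining arcs to the merged states.
 * State numbers are not compacted.  Returns the new number of arcs.
 */
static int
collapse_nonalnum(ExportedNfa *nfa)
{
    int i;
    int kept = 0;

    assert(nfa->nstates <= MAX_STATES);

    for (i = 0; i < nfa->nstates; i++)
        parent[i] = i;

    /* First pass: merge the two endpoints of each non-alphanumeric arc. */
    for (i = 0; i < nfa->narcs; i++)
    {
        if (!isalnum(nfa->arcs[i].ch))
        {
            int a = find_root(nfa->arcs[i].from);
            int b = find_root(nfa->arcs[i].to);

            if (a != b)
                parent[b] = a;
        }
    }

    /* Second pass: keep only alphanumeric arcs, with relabeled endpoints. */
    for (i = 0; i < nfa->narcs; i++)
    {
        ExportedArc arc = nfa->arcs[i];

        if (!isalnum(arc.ch))
            continue;
        arc.from = find_root(arc.from);
        arc.to = find_root(arc.to);
        nfa->arcs[kept++] = arc;
    }
    nfa->narcs = kept;
    return kept;
}
```

Note that joining states this way can only make the automaton accept
more strings, never fewer, which I'd assume is acceptable as long as the
index results are rechecked against the original regex.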
- Heikki