Thread: CREATE CUSTOM TEXT SEARCH PARSER

CREATE CUSTOM TEXT SEARCH PARSER

From
Katharina kuhn
Date:
Hi,
I'd like to build a custom text search parser and then use it within a custom text search configuration.
It would be great if you could give us an example showing how to build a custom parser, including examples of start, gettoken and end functions.
It would be even better if one could add custom parsing rules to the default parser.
I would greatly appreciate any hints!
Katharina

Re: CREATE CUSTOM TEXT SEARCH PARSER

From
"Kevin Grittner"
Date:
Katharina kuhn <katykuhn@gmail.com> wrote:

> I'd like to build a custom text search parser and then use it
> within a custom text search configuration.
> It would be great if you could give us an example showing how to
> build a custom parser, including examples of start, gettoken and
> end functions.

You might want to look at the contrib/test_parser directory.  Then
again, you might not -- I needed some custom tsearch2 parsing
behavior and struggled with a custom parser based on that for a
couple days before I decided that it was easier to use regular
expression functions within pl/pgsql to pick out what I wanted and
cast it to a tsvector.  This was less code and seemed less fragile
than developing something based on the contrib example. YMMV, of
course.
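
A minimal sketch of the regular-expression approach described above might look like the following. The function name part_numbers_to_tsvector and the pattern (tokens such as "AB-123") are hypothetical, chosen only for illustration; this is not the code that was actually used. It assumes PostgreSQL 8.3 or later, where regexp_matches and the explicit text-to-tsvector cast are available.

```sql
-- Hypothetical example: pull tokens like "AB-123" out of a document
-- with a regular expression and cast the result to a tsvector.
CREATE OR REPLACE FUNCTION part_numbers_to_tsvector(doc text)
RETURNS tsvector
LANGUAGE plpgsql
IMMUTABLE
AS $$
DECLARE
    tokens text;
BEGIN
    -- Collect every match, lower-cased, into one space-separated string.
    -- With no capture groups, each regexp_matches row is a one-element
    -- array holding the whole match.
    tokens := array_to_string(
        ARRAY(SELECT lower(m.parts[1])
              FROM regexp_matches(doc, '[A-Za-z]+-[0-9]+', 'g') AS m(parts)),
        ' ');

    -- The pattern admits only letters, digits and hyphens, so no token
    -- contains characters that need quoting in tsvector input syntax.
    RETURN coalesce(tokens, '')::tsvector;
END;
$$;

-- Usage: the result can be stored in a tsvector column or indexed.
SELECT part_numbers_to_tsvector('Order AB-123 ships with cd-45');
```

Note that a plain cast like this records no position information, so ranking functions that depend on word positions have less to work with than they would with to_tsvector output.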

This motivated me to add a rewrite of the current tsearch2 parser,
based on regular expressions, to my personal PostgreSQL TODO list.
(No guarantees on when I might get to it, though.)

-Kevin

Re: CREATE CUSTOM TEXT SEARCH PARSER

From
Katharina kuhn
Date:
Thank you Kevin!
I'll look at the contrib/test_parser directory.
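
For reference, the SQL side of wiring a parser into a configuration looks roughly like the sketch below. It assumes the test_parser module is already installed, so that its C functions (testprs_start, testprs_getlexeme, testprs_end, testprs_lextype) exist in the database; the names myparser and mycfg are hypothetical.

```sql
-- Register a parser from the C support functions shipped with test_parser.
-- START, GETTOKEN and END are the start, gettoken and end functions;
-- LEXTYPES describes the token types the parser can emit.
CREATE TEXT SEARCH PARSER myparser (
    START    = testprs_start,
    GETTOKEN = testprs_getlexeme,
    END      = testprs_end,
    HEADLINE = pg_catalog.prsd_headline,
    LEXTYPES = testprs_lextype
);

-- Build a configuration on top of the parser and map its "word"
-- token type to a dictionary.
CREATE TEXT SEARCH CONFIGURATION mycfg ( PARSER = myparser );
ALTER TEXT SEARCH CONFIGURATION mycfg
    ADD MAPPING FOR word WITH english_stem;

-- Usage: parse a document with the new configuration.
SELECT to_tsvector('mycfg', 'That''s my first own parser');
```

The parsing logic itself lives in the C functions; the SQL statements only register them and attach dictionaries to the token types they produce.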
Anyway, I agree with you. I actually wrote a pl/pgsql function that pre-parses documents
according to my own needs and then casts the results to a tsvector as usual. It works well enough!
Katharina

On Tue, Nov 2, 2010 at 2:58 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> You might want to look at the contrib/test_parser directory.
> [...]