Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords
Date
Msg-id Pine.LNX.4.64.0711141950040.7787@sn.sai.msu.ru
In response to Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords
List pgsql-hackers
In principle, the right way is to allow any dictionary to have an option
like 'PassThrough', plus an internal function get_dict_options(dict, option)
to check whether the PassThrough option is true.
Let's consider one example: removing accents.
In the past I always recommended that people use regex functions before
the to_tsvector conversion to remove accents, but recently I noticed that
this trick doesn't work with headline(). So the only way is to put a
special dictionary, dict_remove_accent, ahead of the others in the stack,
where it works as a filter.
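
To make the filter idea concrete, here is a rough sketch in Python rather than in the C of the real dictionary API. Everything in it is an assumption made for illustration: the function names, the stopword list, and the pass_through flag attached to each stack entry are all hypothetical. It only models the usual convention that a dictionary returns nothing for an unrecognized token, an empty list for a stopword, and a list of lexemes otherwise.

```python
import unicodedata

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "of"}

def remove_accent(token):
    """Filter-style dictionary: strip combining accent marks from the token."""
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [stripped]

def simple(token):
    """Terminal dictionary: lowercase, drop stopwords, accept everything else."""
    word = token.lower()
    return [] if word in STOPWORDS else [word]

# Each entry is (dictionary, pass_through). The flag is the assumed
# 'PassThrough' option: when set, the output is fed to the next dictionary
# instead of ending the lookup.
STACK = [(remove_accent, True), (simple, False)]

def lexize(token, stack=STACK):
    current = token
    for dictionary, pass_through in stack:
        result = dictionary(current)
        if result is None:
            continue              # unrecognized: try the next dictionary
        if pass_through and result:
            current = result[0]   # filter: hand the rewritten token onward
            continue
        return result             # recognized (an empty list means stopword)
    return []                     # no dictionary recognized the token

def to_lexemes(text):
    lexemes = []
    for token in text.split():
        lexemes.extend(lexize(token))
    return lexemes

print(to_lexemes("the H\u00f4tel of Z\u00fcrich"))
```

Because the accent removal happens inside the stack, anything that runs tokens through the same stack sees the same normalization, which is the property the regex-before-to_tsvector trick lacks.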

I don't remember why we left this for future releases, though.

Oleg

On Wed, 14 Nov 2007, Tom Lane wrote:

> This patch:
> http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
> seems simple and useful enough that I think we ought to slip it into
> 8.3, even though we are far past feature freeze.
>
> As the "simple" dictionary type stands in CVS HEAD, it is only useful as
> the last dictionary in a stack, since it never passes anything on as
> unrecognized.  With the proposed AcceptAll = false option, it could be
> used to filter out some stopwords before feeding tokens to another
> dictionary.  While most dictionary types have their own stopword support,
> some of them match stopwords after their own normalization processing,
> and so there's no way to filter on pre-normalized words.  That seems
> like a good improvement, even without the specific need-example that
> Jan provided at the start of the thread.
>
> Normally we'd never consider adding a new feature so late in the
> development cycle, but this seems small enough and useful enough
> to make an exception.  Comments?
>
>             regards, tom lane
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
