Re: Include Lists for Text Search - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Include Lists for Text Search
Date
Msg-id Pine.LNX.4.64.0709101758520.2767@sn.sai.msu.ru
In response to Re: Include Lists for Text Search  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Mon, 10 Sep 2007, Simon Riggs wrote:

> On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote:
>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>
>>> On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
>>>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>>>
>>>>> It seems possible to write your own functions to support various
>>>>> possibilities with text search.
>>>>>
>>>>> One of the more common thoughts is to have a list of words that you
>>>>> would like to include, i.e. the opposite of a stop word list.
>>>>>
>>>>> There are clear indications that indexing too many words is a problem
>>>>> for both GIN and GIST. If people already know what they'll be looking
>>>>> for and what they will never be looking for, it seems easier to supply
>>>>> that list up front, rather than hide it behind lots of hand-crafted
>>>>> code.
>>>>>
>>>>> Can we include that functionality now?
>>>>
>>>> This could be realized very easily using dict_strict, which returns
>>>> only known words, with a mapping that contains only this dictionary.
>>>> So, feel free to write it and submit it.
>>>
>>> So there isn't one yet, but you think it will be easy to write and that
>>> we should call it dict_strict?
>>
>> We have dict_synonym already, and if your list is not big you'll be happy.
>
> So I need to do something like
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>    template = snowball,
>    synonym = include_only_these_words
> );
>
> which will then look for a file called include_only_these_words.syn?
>
> I would prefer to be able to do something like this
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>    template = snowball,
>    include = justthese
> );
> ...which makes more sense to anyone reading it
> and I also want to make the comparison case insensitive.
>
> Would it be better to
> 1. include a new dictionary file (dict_strict, as you suggest)
> 2. a) allow case sensitivity as another option in dictionaries
>   b) allow "include" as another word for "stoplist", but with the
> meaning reversed?
>
> e.g.
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>    template = snowball,
>    include = justthese,
>    case_sensitive = true
> );

No, you need to write a new template that works efficiently with
big lists and supports case-insensitive comparison:

 CREATE TEXT SEARCH TEMPLATE biglist (  ..... );
 CREATE TEXT SEARCH DICTIONARY my_diction (
    TEMPLATE = biglist,
    DictFile = words,
    case_sensitive = true
 );
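
For a small list, the dict_synonym route mentioned earlier in the thread
can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the proposed biglist template: it uses the syntax
and token type names (asciiword, word) of released PostgreSQL versions,
which may differ from the development tree discussed here. The names
include_only and include_cfg are made up for the example, and the file
include_only_these_words.syn is assumed to exist in the tsearch_data
directory, with each wanted word mapped to itself.

```sql
-- Assumed file: $SHAREDIR/tsearch_data/include_only_these_words.syn
-- Each line maps a wanted word to itself, for example:
--   postgres postgres
--   index    index

-- Dictionary built on the existing synonym template (hypothetical name).
CREATE TEXT SEARCH DICTIONARY include_only (
    TEMPLATE = synonym,
    SYNONYMS = include_only_these_words
);

-- Start from the bare default parser, so no token type is mapped yet.
CREATE TEXT SEARCH CONFIGURATION include_cfg (
    PARSER = pg_catalog."default"
);

-- Map word tokens to this dictionary only. A token that no mapped
-- dictionary recognizes is discarded rather than indexed.
ALTER TEXT SEARCH CONFIGURATION include_cfg
    ADD MAPPING FOR asciiword, word
    WITH include_only;

-- Quick check: only words present in the .syn file should survive.
SELECT to_tsvector('include_cfg', 'Postgres builds an Index');
```

Because the mapping contains nothing but this one dictionary, it behaves
like an include list. The synonym template is not designed for very
large lists, which is why a dedicated template is suggested above.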


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

