Re: Include Lists for Text Search - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Include Lists for Text Search |
Date | |
Msg-id | Pine.LNX.4.64.0709101758520.2767@sn.sai.msu.ru |
In response to | Re: Include Lists for Text Search (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-hackers |
On Mon, 10 Sep 2007, Simon Riggs wrote:

> On Mon, 2007-09-10 at 16:35 +0400, Oleg Bartunov wrote:
>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>
>>> On Mon, 2007-09-10 at 16:10 +0400, Oleg Bartunov wrote:
>>>> On Mon, 10 Sep 2007, Simon Riggs wrote:
>>>>
>>>>> It seems possible to write your own functions to support various
>>>>> possibilities with text search.
>>>>>
>>>>> One of the more common thoughts is to have a list of words that you
>>>>> would like to include, i.e. the opposite of a stop word list.
>>>>>
>>>>> There are clear indications that indexing too many words is a problem
>>>>> for both GIN and GiST. If people already know what they'll be looking
>>>>> for and what they will never be looking for, it seems easier to supply
>>>>> that list up front, rather than hide it behind lots of hand-crafted
>>>>> code.
>>>>>
>>>>> Can we include that functionality now?
>>>>
>>>> This could be realized very easily using dict_strict, which returns
>>>> only known words, with a mapping that contains only this dictionary.
>>>> So, feel free to write it and submit.
>>>
>>> So there isn't one yet, but you think it will be easy to write and that
>>> we should call it dict_strict?
>>
>> We have dict_synonym already, and if your list is not big you'll be happy.
>
> So I need to do something like
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>     template = snowball,
>     synonym = include_only_these_words
> );
>
> which will then look for a file called include_only_these_words.syn?
>
> I would prefer to be able to do something like this
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>     template = snowball,
>     include = justthese
> );
>
> ...which makes more sense to anyone reading it, and I also want to make
> the comparison case insensitive.
>
> Would it be better to
> 1. include a new dictionary file (dict_strict, as you suggest), or
> 2. a) allow case sensitivity as another option in dictionaries, and
>    b) allow "include" as another word for "stoplist", but with the
>       meaning reversed?
>
> e.g.
>
> CREATE TEXT SEARCH DICTIONARY my_diction (
>     template = snowball,
>     include = justthese,
>     case_sensitive = true
> );

No, you need to write a new template that works efficiently with big
lists and supports case-insensitive comparison.

CREATE TEXT SEARCH TEMPLATE biglist ( ..... );

CREATE TEXT SEARCH DICTIONARY my_diction (
    TEMPLATE = biglist,
    DictFile = words,
    case_sensitive = true
);

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
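For a small list, the existing dict_synonym approach mentioned earlier in the thread (a dictionary that is the only one in the mapping) might look like the minimal, untested sketch below. It assumes 8.3-style text search DDL. The names include_only, include_cfg and the file include_only_these_words.syn are hypothetical. Each line of the file maps a word to itself. Words the dictionary does not recognize are discarded, and token types with no mapping are ignored. It also assumes the synonym template folds case when matching. Check the documentation for your version before relying on that for the case-insensitive requirement.

```sql
-- Hypothetical file $SHAREDIR/tsearch_data/include_only_these_words.syn,
-- one entry per line, each word mapped to itself:
--   postgresql postgresql
--   tsearch    tsearch

-- Include-list dictionary built on the existing synonym template.
CREATE TEXT SEARCH DICTIONARY include_only (
    TEMPLATE = synonym,
    SYNONYMS = include_only_these_words
);

-- Start from the parser alone, so no token type has a dictionary yet.
CREATE TEXT SEARCH CONFIGURATION include_cfg (
    PARSER = pg_catalog."default"
);

-- Word tokens are consulted only against the include list; a word that
-- the dictionary does not recognize is dropped, and all unmapped token
-- types (numbers, URLs, ...) are ignored.
ALTER TEXT SEARCH CONFIGURATION include_cfg
    ADD MAPPING FOR asciiword, word WITH include_only;

-- Only words present in the .syn file should appear in the result.
SELECT to_tsvector('include_cfg', 'PostgreSQL tsearch and other words');
```

Unlike the snowball-based examples above, this sketch does no stemming. As noted in the reply, a large list would call for a dedicated template rather than a synonym file.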