Re: Clarification of the "simple" dictionary - Mailing list pgsql-general
From | Andreas Joseph Krogh |
---|---|
Subject | Re: Clarification of the "simple" dictionary |
Date | |
Msg-id | 4C488648.3000602@officenet.no |
In response to | Re: Clarification of the "simple" dictionary (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: Clarification of the "simple" dictionary (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-general |
On 07/22/2010 07:44 PM, Oleg Bartunov wrote:
> Don't guess, but read docs
> http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
>
> 12.6.2. Simple Dictionary
>
> The simple dictionary template operates by converting the input token
> to lower case and checking it against a file of stop words. If it is
> found in the file then an empty array is returned, causing the token
> to be discarded. If not, the lower-cased form of the word is returned
> as the normalized lexeme. Alternatively, the dictionary can be
> configured to report non-stop-words as unrecognized, allowing them to
> be passed on to the next dictionary in the list.
>
> d=# \dFd+ simple
>                          List of text search dictionaries
>    Schema   |  Name  |     Template      | Init options | Description
> ------------+--------+-------------------+--------------+-----------------------------------------------------------
>  pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
>
> By default it has no Init options, so it doesn't check for stopwords.

Guess what - I *have* read the docs, which say "...and checking it against a file of stop words". What was unclear to me was whether "simple" is configured with a stop-words file by default, which I understand from your reply is not the case. Very good, fits my needs like a glove :-)

It might be worth considering updating the docs to make this clearer?

So - can we rely on "simple" remaining this way forever (no Init options), or is it better to make a copy of it with the same properties as today?

It seems "simple" + the unaccent dictionary available in 9.0 saves my day. Thanks, Mr. Bartunov.
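
For reference, the "make a copy" option could look roughly like the sketch below. The dictionary names my_simple and my_simple_stop are hypothetical, and the second dictionary assumes the english stop-word file that ships with a standard PostgreSQL installation; it is included only to contrast with the no-options behaviour.

```sql
-- Hypothetical private copy of "simple": same template, no Init options,
-- so it does not depend on how the built-in dictionary is configured.
CREATE TEXT SEARCH DICTIONARY my_simple (
    TEMPLATE = pg_catalog.simple
);

-- With no StopWords option nothing is discarded; the token is only lower-cased.
SELECT ts_lexize('my_simple', 'The');        -- {the}

-- For contrast (assumes the stock english.stop file is installed):
CREATE TEXT SEARCH DICTIONARY my_simple_stop (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

-- Stop words now come back as an empty array, i.e. the token is dropped.
SELECT ts_lexize('my_simple_stop', 'The');   -- {}
SELECT ts_lexize('my_simple_stop', 'YeS');   -- {yes}
```

Referencing a copy like my_simple in a text search configuration keeps the lower-case-only behaviour explicit, whatever happens to the options of the built-in "simple" dictionary.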
--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                        |                                             |
Tlf: +47 24 15 38 90    |                                             |
Fax: +47 24 15 38 91    |                                             |
Mobile: +47 909 56 963  |                                             |
------------------------+---------------------------------------------+