Re: Clarification of the "simple" dictionary - Mailing list pgsql-general

From Andreas Joseph Krogh
Subject Re: Clarification of the "simple" dictionary
Msg-id 4C488648.3000602@officenet.no
In response to Re: Clarification of the "simple" dictionary  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Clarification of the "simple" dictionary  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
On 07/22/2010 07:44 PM, Oleg Bartunov wrote:
> Don't guess, but read docs
> http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY
>
>
> 12.6.2. Simple Dictionary
>
> The simple dictionary template operates by converting the input token
> to lower case and checking it against a file of stop words. If it is
> found in the file then an empty array is returned, causing the token
> to be discarded. If not, the lower-cased form of the word is returned
> as the normalized lexeme. Alternatively, the dictionary can be
> configured to report non-stop-words as unrecognized, allowing them to
> be passed on to the next dictionary in the list.
>
> d=# \dFd+ simple
>                                   List of text search dictionaries
>    Schema   |  Name  |     Template      | Init options |                        Description
> ------------+--------+-------------------+--------------+-----------------------------------------------------------
>  pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword
>
> By default it has no Init options, so it doesn't check for stopwords.

Guess what - I *have* read the docs, which say "...and checking it
against a file of stop words". What was unclear to me was whether it
is configured with a stop-word file by default, which I understand
from your reply is not the case. Very good, that fits my needs like a
glove :-) It might be worth updating the docs to make this
clearer?

So - can we rely on "simple" to remain this way forever (no Init
options) or is it better to make a copy of it with the same properties
as today?
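
For reference, making such a copy would look roughly like the following.
This is only a minimal sketch: the name my_simple is hypothetical, and it
assumes the built-in pg_catalog.simple template with no options, which is
what the \dFd+ output above shows for the stock dictionary.

```sql
-- Hypothetical name "my_simple"; same template as the built-in "simple"
-- dictionary and, like it, no Init options (so no stop-word file).
CREATE TEXT SEARCH DICTIONARY my_simple (
    TEMPLATE = pg_catalog.simple
);

-- Quick check: with no stop words configured, every token should come
-- back lower-cased, including common words such as "The".
SELECT ts_lexize('my_simple', 'The');
```

A private copy like this would not be affected if the options of the
built-in dictionary were ever changed.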

It seems "simple" + the unaccent dict. available in 9.0 saves my day,
thanks Mr. Bartunov.
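
A rough sketch of how the two could be combined, assuming the unaccent
contrib module is already installed in the database and provides a
dictionary named unaccent; the configuration name my_cfg is hypothetical:

```sql
-- Hypothetical configuration "my_cfg", copied from the built-in
-- "simple" text search configuration.
CREATE TEXT SEARCH CONFIGURATION my_cfg ( COPY = pg_catalog.simple );

-- unaccent is a filtering dictionary: it strips accents and hands the
-- result to the next dictionary in the list, here "simple".
ALTER TEXT SEARCH CONFIGURATION my_cfg
    ALTER MAPPING FOR hword, hword_part, word
    WITH unaccent, simple;

-- Try it out on a string containing an accented word.
SELECT to_tsvector('my_cfg', 'Hôtel de la Mer');
```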

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                         |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+

