Indexing unknown words with Tsearch2 - Mailing list pgsql-general

From Greg Maitrallain
Subject Indexing unknown words with Tsearch2
Msg-id 49D36E3F.1080207@evodia.fr
List pgsql-general

Hi,

First of all, excuse my poor English :)

I'm working on a full-text database with tsearch2 that contains French
historical writings.
I'm using the fr_ispell dictionary, which can be found here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
(ispell-french.tar.gz
<http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/dicts/ispell/ispell-french.tar.gz>
- submitted by Max Jacob)
The database encoding is LATIN1.

The problem is that the writings contain many names of well-known
figures, for example Churchill (the database covers WWII). When I
search for these names, nothing is found.

I tried many things, including following this introduction:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
I think the root of the problem is that no lexeme is found (one could
even say that an empty lexeme is found).

With the default en_stem dictionary, I get this:

SELECT lexize('en_stem', 'churchill');
"{churchil}"

Then I add the French dictionary:

INSERT INTO pg_ts_dict
               (SELECT 'fr_ispell',
                       dict_init,
                       'DictFile="/home/.../french.dict",'
                       'AffFile="/home/.../french.aff",'
                       'StopFile="/home/.../french.stop"',
                       dict_lexize
                FROM pg_ts_dict
                WHERE dict_name = 'ispell_template');

And the result is:

SELECT lexize('fr_ispell', 'churchill');
""
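
For comparison, here is a minimal sketch of the same check against the
simple dictionary. It assumes that the stock 'simple' entry which a
default tsearch2 installation places in pg_ts_dict is still present;
that dictionary is not part of the setup described above.

```sql
-- Assumption: the stock 'simple' dictionary from the default tsearch2
-- install exists in pg_ts_dict (it is not part of the setup above).
-- 'simple' lowercases its input and drops stop words only if a stop
-- file is configured, so it shows whether a token survives when no
-- morphological dictionary recognizes it.
SELECT lexize('simple', 'churchill');
```

If this query returns the token while fr_ispell returns nothing, that
would suggest the empty result comes from the ispell dictionary itself
rather than from the parser or the database encoding.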

My questions are:
- Is it OK for lexize to return an empty string for a word that is
neither in the dictionary nor in the stop words?
- Is there a way to get the word itself as the result when the word is
neither in the dictionary nor in the stop words?
- If yes, how?

I'm also interested in any other information you could give me.
Many thanks!

Greg Maitrallain.
