Re: tsearch2 problem - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: tsearch2 problem
Date
Msg-id Pine.LNX.4.64.0810311425570.15810@sn.sai.msu.ru
Whole thread Raw
In response to Re: tsearch2 problem  ("Jodok Batlogg" <jodok@lovelysystems.com>)
List pgsql-general
On Fri, 31 Oct 2008, Jodok Batlogg wrote:

> hi oleg,
>
> thanks for your quick response,
>
> 2008/10/31 Oleg Bartunov <oleg@sai.msu.su>:
>> Jodok,
>>
>> you got what's you defined. Please, read documentation.
>> In short, word doesn't indexed if it is not recognized by any
>> dictionaried from stack of dictionaries. Put stemming dictionary at the end,
>> which recognizes everything.
>
> can you point me to "the" documentation where i could find that? i
> think i tried hard :)

well, it's not really hard
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html

"A text search configuration binds a parser together with a set of
dictionaries to process the parser's output tokens. For each token type that
the parser can return, a separate list of dictionaries is specified by the
configuration. When a token of that type is found by the parser, each
dictionary in the list is consulted in turn, until some dictionary recognizes
it as a known word. If it is identified as a stop word, or if no dictionary
recognizes the token, it will be discarded and not indexed or searched for.
The general rule for configuring a list of dictionaries is to place first
the most narrow, most specific dictionary, then the more general dictionaries,
finishing with a very general dictionary, like a Snowball stemmer or simple,
which recognizes everything."

>
> however - problem a) is fixed. thanks :)
> nevertheless i still have the problem that words with '/' are beeing
> interpreted as file paths instead of words. any idea how i could tweak
> this?

several ways:
1. use your own parser
2. use encode/decode functions, which cheat default parser. For example,
    encodeslash('aa/bb') -> aaxxxxxxbb. But then you should understand, that
    dictionary like ispell will not be able to recognize it.


>
> thanks
>
> jodok
>
>>
>> Oleg
>> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>>
>>> we're using tsearch2 with the german dictionary
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>> for fulltext search.
>>>
>>> the indexing is configured as follows:
>>>
>>> CREATE TEXT SEARCH DICTIONARY public.german (
>>>   TEMPLATE = ispell,
>>>   DictFile = german,
>>>   AffFile = german,
>>>   StopWords = german
>>> );
>>>
>>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german
>>> );
>>>
>>> ALTER TEXT SEARCH CONFIGURATION public.default
>>>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>>>                     word, hword, hword_part
>>>   WITH public.german;
>>>
>>> -------------------------
>>>
>>> select * from ts_debug('default', 'hundshЪЪtte');
>>> works as expected: creates the two lexemes: "{hund,hЪЪtte}"
>>>
>>> BUT
>>>
>>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>>> looses a lot of stuff:
>>> "'bauarbeiter/in':2"
>>>
>>> some more debugging shows:
>>>
>>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>>
>>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "file";"File or path
>>> name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>>
>>> a) unknown words are just beeing dropped
>>> b) words with slashes are interpreted as file paths and the first path
>>> is beeing dropped.
>>>
>>> any idea how we can fix this?
>>>
>>> jodok
>>>
>>>
>>
>>        Regards,
>>                Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-general by date:

Previous
From: Patricio Mora
Date:
Subject: Bad behaviour in Sun Cluster
Next
From: Oleg Bartunov
Date:
Subject: Re: tsearch2 problem