Re: tsearch2 problem - Mailing list pgsql-general

From Jodok Batlogg
Subject Re: tsearch2 problem
Msg-id 47b22fd00810310330n7fc6ca61i15964f7de32038e0@mail.gmail.com
In response to Re: tsearch2 problem  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: tsearch2 problem  (Oleg Bartunov <oleg@sai.msu.su>)
Re: tsearch2 problem  (John DeSoi <desoi@pgedit.com>)
List pgsql-general
hi oleg,

thanks for your quick response,

2008/10/31 Oleg Bartunov <oleg@sai.msu.su>:
> Jodok,
>
> you got what you defined. Please read the documentation.
> In short, a word isn't indexed if it is not recognized by any
> dictionary in the stack of dictionaries. Put a stemming dictionary,
> which recognizes everything, at the end.
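
for the archives, a minimal sketch of what this advice amounts to for the configuration quoted below. it assumes the built-in snowball dictionary pg_catalog.german_stem is the fallback; that choice is an assumption, any dictionary that accepts every token would do, and this is not necessarily the exact statement used here:

```sql
-- Sketch only: keep the ispell dictionary first and append a stemmer
-- as the last entry, so tokens that ispell does not recognize are
-- still handed to a dictionary that accepts them.
-- Assumption: pg_catalog.german_stem (built-in snowball dictionary)
-- is an acceptable fallback for this data.
ALTER TEXT SEARCH CONFIGURATION public.default
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
  WITH public.german, pg_catalog.german_stem;
```

dictionaries are consulted in the order listed, so the stemmer only sees what public.german passes on.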

can you point me to "the" documentation where i could find that? i
think i tried hard :)

however - problem a) is fixed. thanks :)
nevertheless i still have the problem that words with '/' are being
interpreted as file paths instead of words. any idea how i could tweak
this?
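
one possible workaround, sketched under the assumption that it is acceptable to split on '/' before the text reaches the parser (this sidesteps the file token type rather than changing how the parser classifies it, and it would also break up real paths):

```sql
-- Sketch only: replace '/' with a space before indexing, so the
-- parser sees two separate words instead of one file-path token.
-- Assumption: no genuine file paths in the data need to stay intact.
SELECT to_tsvector('default',
                   replace('lovely und bauarbeiter/in', '/', ' '));
```

the same replace() call would have to be applied to the query text as well, otherwise the indexed and the searched forms would not match.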

thanks

jodok

>
> Oleg
> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>
>> we're using tsearch2 with the german dictionary
>>
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>> for fulltext search.
>>
>> the indexing is configured as follows:
>>
>> CREATE TEXT SEARCH DICTIONARY public.german (
>>   TEMPLATE = ispell,
>>   DictFile = german,
>>   AffFile = german,
>>   StopWords = german
>> );
>>
>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german );
>>
>> ALTER TEXT SEARCH CONFIGURATION public.default
>>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>>                     word, hword, hword_part
>>   WITH public.german;
>>
>> -------------------------
>>
>> select * from ts_debug('default', 'hundshütte');
>> works as expected: creates the two lexemes: "{hund,hütte}"
>>
>> BUT
>>
>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>> loses a lot of stuff:
>> "'bauarbeiter/in':2"
>>
>> some more debugging shows:
>>
>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>
>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>> "blank";"Space symbols";" ";"{}";"";""
>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>> "blank";"Space symbols";" ";"{}";"";""
>> "file";"File or path name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>
>> a) unknown words are just being dropped
>> b) words with slashes are interpreted as file paths and the first path
>> is being dropped.
>>
>> any idea how we can fix this?
>>
>> jodok
>>
>>
>
>        Regards,
>                Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83



--
Jodok Batlogg, Vorstand

Lovely Systems AG
Telefon +43 5572 908060, Fax +43 5572 908060-77, Mobil +43 664 9636963
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria

Sitz: Dornbirn, FB: Landesgericht Feldkirch, FN: 208859x, UID: ATU51736705
Aufsichtsratsvorsitzender: Christian Lutz
Vorstand: Jodok Batlogg, Manfred Schwendinger
