Re: tsearch2 problem - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 problem |
Date | |
Msg-id | Pine.LNX.4.64.0810311425570.15810@sn.sai.msu.ru |
In response to | Re: tsearch2 problem ("Jodok Batlogg" <jodok@lovelysystems.com>) |
List | pgsql-general |
On Fri, 31 Oct 2008, Jodok Batlogg wrote:

> hi oleg,
>
> thanks for your quick response,
>
> 2008/10/31 Oleg Bartunov <oleg@sai.msu.su>:
>> Jodok,
>>
>> you got what you defined. Please read the documentation.
>> In short, a word isn't indexed if it is not recognized by any
>> dictionary in the stack of dictionaries. Put a stemming dictionary at
>> the end, which recognizes everything.
>
> can you point me to "the" documentation where i could find that? i
> think i tried hard :)

well, it's not really hard
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html

"A text search configuration binds a parser together with a set of
dictionaries to process the parser's output tokens. For each token type
that the parser can return, a separate list of dictionaries is specified
by the configuration. When a token of that type is found by the parser,
each dictionary in the list is consulted in turn, until some dictionary
recognizes it as a known word. If it is identified as a stop word, or if
no dictionary recognizes the token, it will be discarded and not indexed
or searched for. The general rule for configuring a list of dictionaries
is to place first the most narrow, most specific dictionary, then the more
general dictionaries, finishing with a very general dictionary, like a
Snowball stemmer or simple, which recognizes everything."

>
> however - problem a) is fixed. thanks :)
> nevertheless i still have the problem that words with '/' are being
> interpreted as file paths instead of words. any idea how i could tweak
> this?

several ways:

1. use your own parser
2. use encode/decode functions, which cheat the default parser. For example,
   encodeslash('aa/bb') -> aaxxxxxxbb. But then you should understand
   that a dictionary like ispell will not be able to recognize it.
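A rough sketch of both suggestions, assuming the public.default configuration and the public.german ispell dictionary from the original message. german_stem is the built-in Snowball dictionary for German. The body of encodeslash and the 'xxxxxx' placeholder are only an illustration of the encode idea, not an existing function:

```sql
-- Problem a): finish the dictionary list with a stemmer that accepts
-- every token, so words the ispell dictionary does not know are not dropped.
ALTER TEXT SEARCH CONFIGURATION public.default
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH public.german, german_stem;

-- Problem b), way 2: hide the slash from the default parser.
-- encodeslash is a hypothetical helper; the placeholder must be a string
-- that does not occur in the real text.
CREATE FUNCTION encodeslash(text) RETURNS text AS $$
    SELECT replace($1, '/', 'xxxxxx');
$$ LANGUAGE sql IMMUTABLE;

-- The same encoding has to be applied to documents and to queries.
SELECT to_tsvector('default', encodeslash('lovely und bauarbeiter/in'));
SELECT plainto_tsquery('default', encodeslash('bauarbeiter/in'));
```

The encoded token is no longer a word that ispell can recognize, so it only gets indexed if a catch-all dictionary such as the stemmer comes last in the list.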
>
> thanks
>
> jodok
>
>>
>> Oleg
>> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>>
>>> we're using tsearch2 with the german dictionary
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>> for fulltext search.
>>>
>>> the indexing is configured as follows:
>>>
>>> CREATE TEXT SEARCH DICTIONARY public.german (
>>>     TEMPLATE = ispell,
>>>     DictFile = german,
>>>     AffFile = german,
>>>     StopWords = german
>>> );
>>>
>>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german
>>> );
>>>
>>> ALTER TEXT SEARCH CONFIGURATION public.default
>>>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>>>                       word, hword, hword_part
>>>     WITH public.german;
>>>
>>> -------------------------
>>>
>>> select * from ts_debug('default', 'hundshütte');
>>> works as expected: creates the two lexemes: "{hund,hütte}"
>>>
>>> BUT
>>>
>>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>>> loses a lot of stuff:
>>> "'bauarbeiter/in':2"
>>>
>>> some more debugging shows:
>>>
>>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>>
>>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "file";"File or path name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>>
>>> a) unknown words are just being dropped
>>> b) words with slashes are interpreted as file paths and the first path
>>> is being dropped.
>>>
>>> any idea how we can fix this?
>>>
>>> jodok
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83