Re: fulltext search and hunspell - Mailing list pgsql-general
From | Jens Sauer |
---|---|
Subject | Re: fulltext search and hunspell |
Date | |
Msg-id | 4D540F61.9070701@googlemail.com |
In response to | Re: fulltext search and hunspell (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-general |
Thanks for this tip. The German compound dictionary from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ works fine.

I think the problem was the rudimentary support for hunspell dictionaries.

Thanks for your help and your great software!

On 08.02.2011 11:34, Oleg Bartunov wrote:
> Jens,
>
> have you tried the German compound dictionary from
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ ?
>
> Oleg
> On Tue, 8 Feb 2011, Jens Sauer wrote:
>
>> Hey,
>>
>> thanks for your answer.
>>
>> First I checked the links in the tsearch_data directory:
>> de_de.affix and de_de.dict are symlinks to the corresponding files in
>> /var/cache/postgresql/dicts/.
>> Then I recreated them using pg_updatedicts.
>>
>> This is an extract of the de_de.affix file:
>>
>> # this is the affix file of the de_DE Hunspell dictionary
>> # derived from the igerman98 dictionary
>> #
>> # Version: 20091006 (build 20100127)
>> #
>> # Copyright (C) 1998-2009 Bjoern Jacke <bjoern@j3e.de>
>> #
>> # License: GPLv2, GPLv3 or OASIS distribution license agreement
>> # There should be a copy of both of this licenses included
>> # with every distribution of this dictionary. Modified
>> # versions using the GPL may only include the GPL
>>
>> SET ISO8859-1
>> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>>
>> PFX U Y 1
>> PFX U 0 un .
>>
>> PFX V Y 1
>> PFX V 0 ver .
>>
>> SFX F Y 35
>> [...]
>>
>> I cannot find "compoundwords controlled z" there, so I added it
>> manually:
>>
>> [...]
>> # versions using the GPL may only include the GPL
>>
>> compoundwords controlled z
>>
>> SET ISO8859-1
>> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>> [...]
>>
>> Then I restarted PostgreSQL.
>>
>> Now I get an error:
>> SELECT * FROM ts_debug('Schokoladenfabrik');
>> FEHLER: falsches Affixdateiformat für Flag
>> CONTEXT: Zeile 18 in Konfigurationsdatei »/usr/share/postgresql/8.4/tsearch_data/de_de.affix«: »PFX U Y 1«
>> SQL-Funktion »ts_debug« Anweisung 1
>> SQL-Funktion »ts_debug« Anweisung 1
>>
>> Which means:
>> ERROR: wrong affix file format for flag
>> CONTEXT: line 18 of configuration file "/usr/share/postgresql/8.4/tsearch_data/de_de.affix": "PFX U Y 1"
>> SQL function "ts_debug" statement 1
>> SQL function "ts_debug" statement 1
>>
>> If I add
>> COMPOUNDFLAG Z
>> ONLYINCOMPOUND L
>>
>> instead of "compoundwords controlled z",
>>
>> I don't get an error:
>>
>> SELECT * FROM ts_debug('Schokoladenfabrik');
>>    alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
>> -----------+-----------------+-------------------+-------------------------------+-------------+-------------------
>>  asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
>> (1 row)
>>
>> But it seems that the hunspell dictionary does not work for compound
>> words: the token falls through to german_stem.
>>
>> Maybe pg_updatedicts has a bug and generates affix files in the wrong
>> format?
>>
>> Jens
>>
>> 2011/2/7 Oleg Bartunov <oleg@sai.msu.su>:
>>> Jens,
>>>
>>> could you check the affix file for
>>> compoundwords controlled z
>>>
>>> Also, can you provide a link to the dictionary files, so we can check
>>> whether they are supported, since we have only rudimentary support for
>>> hunspell. Btw, it'd be nice to have the output from ts_debug() to make
>>> sure the dictionaries are actually used.
>>>
>>> Oleg
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
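For readers following the same path, here is a minimal sketch of wiring an ispell-format compound dictionary into a text search configuration and checking it. It assumes the German compound dictionary from the page Oleg linked is in ispell format and has been copied into the server's `tsearch_data` directory; the file names `german_compound.dict`/`german_compound.affix`, the dictionary name `german_compound_ispell` and the configuration name `german_compound` are hypothetical. `german_stem` and the `german` configuration are the built-in ones already visible in the `ts_debug` output above.

```sql
-- Sketch only. Assumes tsearch_data/german_compound.dict and
-- tsearch_data/german_compound.affix exist (hypothetical file names).
CREATE TEXT SEARCH DICTIONARY german_compound_ispell (  -- hypothetical name
    TEMPLATE  = ispell,
    DictFile  = german_compound,   -- reads tsearch_data/german_compound.dict
    AffFile   = german_compound,   -- reads tsearch_data/german_compound.affix
    StopWords = german             -- reads tsearch_data/german.stop
);

-- Test the dictionary on its own: a compound-aware dictionary should
-- return several lexemes; NULL means the word was not recognized.
SELECT ts_lexize('german_compound_ispell', 'schokoladenfabrik');

-- Work on a copy so the built-in german configuration stays untouched.
CREATE TEXT SEARCH CONFIGURATION german_compound (COPY = german);  -- hypothetical name

-- Try the ispell dictionary first and fall back to the Snowball stemmer.
ALTER TEXT SEARCH CONFIGURATION german_compound
    ALTER MAPPING FOR asciiword, word, hword_part, hword_asciipart
    WITH german_compound_ispell, german_stem;

-- The "dictionary" column shows which dictionary produced the lexemes.
SELECT * FROM ts_debug('german_compound', 'Schokoladenfabrik');
```

Keeping the stemmer after the ispell dictionary in the mapping means words unknown to the ispell dictionary still get indexed, as in the `{german_hunspell,german_stem}` mapping shown in the thread.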