Re: fulltext search and hunspell - Mailing list pgsql-general

From Jens Sauer
Subject Re: fulltext search and hunspell
Date
Msg-id 4D540F61.9070701@googlemail.com
In response to Re: fulltext search and hunspell  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
Thanks for this tip,
the German compound dictionary from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ works fine.
I think the problem was the rudimentary support for hunspell dictionaries.

Thanks for your help and your great software!
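
For reference, a minimal sketch of how a compound dictionary like this can be registered and checked. The names german_compound and german_cmp are placeholders chosen for illustration, and the sketch assumes the downloaded files have been copied into the tsearch_data directory as german_compound.dict and german_compound.affix; adjust them to the actual file names.

```sql
-- Assumption: german_compound.dict and german_compound.affix exist in
-- tsearch_data (hypothetical names, not taken from the download itself).
CREATE TEXT SEARCH DICTIONARY german_compound (
    TEMPLATE  = ispell,
    DictFile  = german_compound,
    AffFile   = german_compound,
    StopWords = german
);

-- Hypothetical configuration name; copied from the built-in german config.
CREATE TEXT SEARCH CONFIGURATION german_cmp (COPY = pg_catalog.german);

-- Try the compound dictionary first, fall back to the Snowball stemmer.
ALTER TEXT SEARCH CONFIGURATION german_cmp
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH german_compound, german_stem;

-- Ask the dictionary directly, then check which dictionary the
-- configuration actually uses for the token.
SELECT ts_lexize('german_compound', 'Schokoladenfabrik');
SELECT * FROM ts_debug('german_cmp', 'Schokoladenfabrik');
```

If the dictionary handles the compound, the dictionary column of ts_debug() should name german_compound rather than german_stem.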

On 08.02.2011 11:34, Oleg Bartunov wrote:
> Jens,
>
> have you tried the German compound dictionary from
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
>
> Oleg
> On Tue, 8 Feb 2011, Jens Sauer wrote:
>
>> Hey,
>>
>> thanks for your answer.
>>
>> First I checked the links in the tsearch_data directory:
>> de_de.affix and de_de.dict are symlinks to the corresponding files in
>> /var/cache/postgresql/dicts/
>> Then I recreated them using pg_updatedicts.
>>
>> This is an extract of the de_de.affix file:
>>
>> # this is the affix file of the de_DE Hunspell dictionary
>> # derived from the igerman98 dictionary
>> #
>> # Version: 20091006 (build 20100127)
>> #
>> # Copyright (C) 1998-2009 Bjoern Jacke <bjoern@j3e.de>
>> #
>> # License: GPLv2, GPLv3 or OASIS distribution license agreement
>> # There should be a copy of both of this licenses included
>> # with every distribution of this dictionary. Modified
>> # versions using the GPL may only include the GPL
>>
>> SET ISO8859-1
>> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>>
>> PFX U Y 1
>> PFX U   0     un       .
>>
>> PFX V Y 1
>> PFX V   0     ver      .
>>
>> SFX F Y 35
>> [...]
>>
>> I cannot find "compoundwords controlled z" there, so I manually added
>> it.
>>
>> [...]
>> # versions using the GPL may only include the GPL
>>
>> compoundwords  controlled z
>>
>> SET ISO8859-1
>> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>> [...]
>>
>> Then I restarted PostgreSQL.
>>
>> Now I get an error:
>> SELECT * FROM ts_debug('Schokoladenfabrik');
>> FEHLER:  falsches Affixdateiformat für Flag
>> CONTEXT:  Zeile 18 in Konfigurationsdatei
>> „/usr/share/postgresql/8.4/tsearch_data/de_de.affix“: „PFX U Y 1
>> “
>> SQL-Funktion „ts_debug“ Anweisung 1
>> SQL-Funktion „ts_debug“ Anweisung 1
>>
>> Which means:
>> ERROR: wrong affix file format for flag
>> CONTEXT: line 18 of configuration file ...
>>
>> If I add
>> COMPOUNDFLAG Z
>> ONLYINCOMPOUND L
>>
>> instead of "compoundwords  controlled z"
>>
>> I no longer get an error:
>>
>> SELECT * FROM ts_debug('Schokoladenfabrik');
>>    alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
>> -----------+-----------------+-------------------+-------------------------------+-------------+-------------------
>>  asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
>> (1 row)
>>
>> But it seems that the hunspell dictionary is not working for compound
>> words.
>>
>> Maybe pg_updatedicts has a bug and generates affix files in the wrong
>> format?
>>
>> Jens
>>
>> 2011/2/7 Oleg Bartunov <oleg@sai.msu.su>:
>>> Jens,
>>>
>>> could you check affix file for
>>> compoundwords  controlled z
>>>
>>> also, can you provide a link to the dictionary files, so we can check
>>> whether they are supported, since we have only rudimentary support
>>> for hunspell.
>>> btw, it'd be nice to have the output from ts_debug() to make sure the
>>> dictionaries are actually used.
>>>
>>> Oleg
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83

