Re: fulltext search and hunspell - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: fulltext search and hunspell
Msg-id Pine.LNX.4.64.1102081333380.31836@sn.sai.msu.ru
In response to Re: fulltext search and hunspell  (Jens Sauer <jsauer65@googlemail.com>)
Responses Re: fulltext search and hunspell  (Jens Sauer <jsauer65@googlemail.com>)
List pgsql-general
Jens,

Have you tried the German compound dictionary from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Oleg
On Tue, 8 Feb 2011, Jens Sauer wrote:

> Hey,
>
> Thanks for your answer.
>
> First I checked the links in the tsearch_data directory:
> de_de.affix and de_de.dict are symlinks to the corresponding files in
> /var/cache/postgresql/dicts/.
> Then I recreated them using pg_updatedicts.
>
> This is an extract of the de_de.affix file:
>
> # this is the affix file of the de_DE Hunspell dictionary
> # derived from the igerman98 dictionary
> #
> # Version: 20091006 (build 20100127)
> #
> # Copyright (C) 1998-2009 Bjoern Jacke <bjoern@j3e.de>
> #
> # License: GPLv2, GPLv3 or OASIS distribution license agreement
> # There should be a copy of both of this licenses included
> # with every distribution of this dictionary. Modified
> # versions using the GPL may only include the GPL
>
> SET ISO8859-1
> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>
> PFX U Y 1
> PFX U   0     un       .
>
> PFX V Y 1
> PFX V   0     ver      .
>
> SFX F Y 35
> [...]
>
> I couldn't find "compoundwords controlled z" there, so I added it manually:
>
> [...]
> # versions using the GPL may only include the GPL
>
> compoundwords  controlled z
>
> SET ISO8859-1
> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
> [...]
>
> Then I restarted PostgreSQL.
>
> Now I get an error:
> SELECT * FROM ts_debug('Schokoladenfabrik');
> FEHLER:  falsches Affixdateiformat für Flag
> CONTEXT:  Zeile 18 in Konfigurationsdatei
> »/usr/share/postgresql/8.4/tsearch_data/de_de.affix«: »PFX U Y 1
> «
> SQL-Funktion »ts_debug« Anweisung 1
> SQL-Funktion »ts_debug« Anweisung 1
>
> Which means:
> ERROR:  wrong affix file format for flag
> CONTEXT:  line 18 of configuration file ...
>
> If I add
> COMPOUNDFLAG Z
> ONLYINCOMPOUND L
>
> instead of "compoundwords  controlled z",
>
> I don't get an error:
>
> SELECT * FROM ts_debug('Schokoladenfabrik');
>    alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
> -----------+-----------------+-------------------+-------------------------------+-------------+-------------------
>  asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
> (1 row)
>
> But it seems that the Hunspell dictionary is not working for compound
> words: the token is still handled by german_stem, not german_hunspell.
>
> Maybe pg_updatedicts has a bug and generates affix files in the wrong format?
>
> Jens
>
> 2011/2/7 Oleg Bartunov <oleg@sai.msu.su>:
>> Jens,
>>
>> Could you check the affix file for
>> compoundwords  controlled z
>>
>> Also, can you provide a link to the dictionary files, so we can check
>> whether they are supported? We have only rudimentary support for Hunspell.
>> BTW, it'd be nice to have the output from ts_debug() to make sure the
>> dictionaries are actually used.
>>
>> Oleg
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
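A minimal Python sketch of the affix-file check discussed in this thread (looking for compound directives). It assumes the file is ISO-8859-1 encoded, as the "SET ISO8859-1" line in the de_DE file declares; the helper names scan_affix_lines and scan_affix_file are hypothetical. It only reports what the file contains and does not reproduce how PostgreSQL actually parses affix files. The mixed_formats flag marks the combination that produced the "wrong affix file format for flag" error above: an ispell-style "compoundwords controlled" line added to a file that otherwise uses Hunspell PFX/SFX rules.

```python
import re
import sys
from pathlib import Path

# Compound-related directives seen in this thread.
# ispell style: "compoundwords controlled <flag>" (one-character flag).
ISPELL_COMPOUND = re.compile(r"^\s*compoundwords\s+controlled\s+(\S)", re.IGNORECASE)
# Hunspell style: separate COMPOUNDFLAG / ONLYINCOMPOUND lines.
HUNSPELL_COMPOUND = re.compile(r"^\s*(COMPOUNDFLAG|ONLYINCOMPOUND)\s+(\S+)")
# Hunspell-style prefix/suffix rule lines.
HUNSPELL_AFFIX = re.compile(r"^\s*(PFX|SFX)\s")


def scan_affix_lines(lines):
    """Collect compound directives and note whether PFX/SFX rules are present.

    Hypothetical helper: it inspects text only and does not emulate
    PostgreSQL's affix parser.
    """
    report = {"ispell_compound": [], "hunspell_compound": [], "has_pfx_sfx": False}
    for lineno, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#"):
            continue  # comment line
        m = ISPELL_COMPOUND.match(line)
        if m:
            report["ispell_compound"].append((lineno, m.group(1)))
            continue
        m = HUNSPELL_COMPOUND.match(line)
        if m:
            report["hunspell_compound"].append((lineno, m.group(1), m.group(2)))
            continue
        if HUNSPELL_AFFIX.match(line):
            report["has_pfx_sfx"] = True
    # The combination that failed in this thread: an ispell-style compound
    # line inside a file that otherwise uses Hunspell PFX/SFX rules.
    report["mixed_formats"] = bool(report["ispell_compound"]) and report["has_pfx_sfx"]
    return report


def scan_affix_file(path):
    # Assumption: the file is ISO-8859-1, as its "SET ISO8859-1" line declares.
    text = Path(path).read_text(encoding="iso-8859-1")
    return scan_affix_lines(text.splitlines())


if __name__ == "__main__":
    # Usage: python check_affix.py /usr/share/postgresql/8.4/tsearch_data/de_de.affix
    for name in sys.argv[1:]:
        print(name, scan_affix_file(name))
```

Run against the edited de_de.affix, this would list the added "compoundwords  controlled z" line under ispell_compound and set mixed_formats; with the COMPOUNDFLAG/ONLYINCOMPOUND variant instead, those lines appear under hunspell_compound and mixed_formats stays False.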
