Re: Using a german affix file for compound words - Mailing list pgsql-general

From Artur Zakirov
Subject Re: Using a german affix file for compound words
Msg-id 56AB2F20.6030604@postgrespro.ru
In response to Re: Using a german affix file for compound words  (Wolfgang Winkler <wolfgang.winkler@digital-concepts.com>)
Responses Re: Using a german affix file for compound words  (Wolfgang Winkler <wolfgang.winkler@digital-concepts.com>)
List pgsql-general
On 28.01.2016 20:36, Wolfgang Winkler wrote:
> I'm using 9.4.5 as well and I used exactly the same iconv lines as you
> posted below.
>
> Are there any encoding options that have to be set right? The database
> encoding is set to UTF8.
>
> ww

What does the following command output?

-> SHOW LC_CTYPE;

Did you try a dictionary from
http://extensions.openoffice.org/en/project/german-de-de-frami-dictionaries
?
You need to extract the de_DE_frami.aff and de_DE_frami.dic files from the
downloaded archive, rename them, and convert them to UTF-8.
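
A minimal sketch of the rename-and-convert step, for illustration only. It
assumes the extracted files are encoded in ISO-8859-1 (worth checking against
the SET line in the .aff file) and that the dictionary should be called
"german". The helper names convert_dict and is_utf8 are hypothetical; the
.affix and .dict extensions are the ones PostgreSQL looks for in tsearch_data.

```shell
#!/bin/sh
# convert_dict SRC_BASE DST_BASE [SRC_ENCODING]
# Converts SRC_BASE.aff and SRC_BASE.dic to UTF-8 and writes them as
# DST_BASE.affix and DST_BASE.dict.
# Assumption: the source encoding defaults to ISO-8859-1.
convert_dict() {
    src_base=$1
    dst_base=$2
    src_enc=${3:-ISO-8859-1}
    iconv -f "$src_enc" -t UTF-8 "$src_base.aff" > "$dst_base.affix" &&
    iconv -f "$src_enc" -t UTF-8 "$src_base.dic" > "$dst_base.dict"
}

# is_utf8 FILE
# Succeeds only if FILE decodes cleanly as UTF-8. A failure here points to a
# file that was not converted, or was converted from the wrong encoding.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example usage (not run here):
#   convert_dict de_DE_frami german
#   is_utf8 german.affix && is_utf8 german.dict
#   cp german.affix german.dict "$(pg_config --sharedir)/tsearch_data/"
```

If is_utf8 fails on a converted file, the source encoding passed to iconv is
probably not the one the file was actually written in.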

>
> On 2016-01-28 at 17:34, Artur Zakirov wrote:
>> On 28.01.2016 18:57, Oleg Bartunov wrote:
>>>
>>>
>>> On Thu, Jan 28, 2016 at 6:04 PM, Wolfgang Winkler
>>> <wolfgang.winkler@digital-concepts.com
>>> <mailto:wolfgang.winkler@digital-concepts.com>> wrote:
>>>
>>>     Hi!
>>>
>>>     We have a problem with importing a compound dictionary file for
>>>     German.
>>>
>>>     I downloaded the files here:
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>>
>>>     and converted them to utf-8 with iconv. The affix file seems ok when
>>>     opened with an editor.
>>>
>>>     When I try to create or alter a dictionary to use this affix file, I
>>>     get the following error:
>>>
>>>     alter TEXT SEARCH DICTIONARY german_ispell (
>>>        DictFile = german,
>>>        AffFile = german,
>>>        StopWords = german
>>>     );
>>>     ERROR:  syntax error
>>>     CONTEXT:  line 224 of configuration file
>>>     "/usr/local/pgsql/share/tsearch_data/german.affix": "   ABE >
>>> -ABE,äBIN
>>>     "
>>>
>>>     This is the first occurrence of an umlaut character in the file.
>>>     I've found a few postings where the same file is used, e.g.:
>>>
>>> http://www.postgresql.org/message-id/flat/556C1411.4010608@tbz-pariv.de#556C1411.4010608@tbz-pariv.de
>>>
>>>     This user has been able to import the file. Am I missing something
>>>     obvious?
>>>
>>
>> What version of PostgreSQL do you use?
>>
>> I tested this dictionary on PostgreSQL 9.4.5. I downloaded the files
>> from the link and executed these commands:
>>
>> iconv -f ISO-8859-1 -t UTF-8 german.aff -o german2.affix
>> iconv -f ISO-8859-1 -t UTF-8 german.dict -o german2.dict
>>
>> I renamed them to german.affix and german.dict, moved them to the
>> tsearch_data directory, and executed the following commands without errors:
>>
>> -> create text search dictionary german_ispell (
>>     Template = ispell,
>>     DictFile = german,
>>     AffFile = german,
>>     Stopwords = german
>> );
>> DROP TEXT SEARCH DICTIONARY
>>
>> -> select ts_lexize('german_ispell', 'test');
>>  ts_lexize
>> -----------
>>  {test}
>> (1 row)
>>
>
>
> --
>
> *Wolfgang Winkler*
> Management
> wolfgang.winkler@digital-concepts.com
> mobile +43.699.19971172
>
> dc:*büro*
> digital concepts Novak Winkler OG
> Software & Design
> Landstraße 68, 5. Stock, 4020 Linz
> www.digital-concepts.com <http://www.digital-concepts.com>
> tel +43.732.997117.72
> tel +43.699.1997117.2
>
> Company register number: 192003h
> Commercial register court: Landesgericht Linz
>
>
>


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

