Thread: [TextSearch] syntax error while parsing affix file

[TextSearch] syntax error while parsing affix file

From
Daniel Chiaramello
Date:
Hello everybody.

I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
dictionary (the OpenOffice one) for text search.

I converted the dictionary encoding to UTF-8, and I installed it in the
"tsearch_data" folder.

But when I try to create the dictionary, I get a syntax error:

CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
TEMPLATE = ispell,
DictFile = bulgarian_utf8,
AffFile = bulgarian_utf8,
StopWords = english
);
ERREUR: erreur de syntaxe
CONTEXTE : ligne 24 du fichier de configuration «
/usr/share/pgsql/tsearch_data/bulgarian_utf8.affix » : « . > А
»

(In English: ERROR: syntax error. CONTEXT: line 24 of configuration file
"/usr/share/pgsql/tsearch_data/bulgarian_utf8.affix": ". > А")


Excerpt from the file around that line:

flag *A:
. > А (this is line 24)
. > АТА
. > И
. > ИТЕ

The file has Unix line endings (I suspected a line-ending problem, since
the "CONTEXT" line of the error was split across two lines).

I'm really lost as to how to proceed with the Bulgarian dictionary...
Could you help me, please?

Thanks for your attention!
Daniel Chiaramello

Re: [TextSearch] syntax error while parsing affix file

From
Tom Lane
Date:
Daniel Chiaramello <daniel.chiaramello@golog.net> writes:
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
> dictionary (the OpenOffice one) for text search.

I'm not an expert, but I think our ispell code supports only a subset of
the features that some other implementations have.  So it doesn't
surprise me a lot that some configuration files don't work.  You might
try one of the other sources for ispell files besides openoffice ---
see the links here:
http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

            regards, tom lane

Re: [TextSearch] syntax error while parsing affix file

From
Teodor Sigaev
Date:
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
> dictionary (the OpenOffice one) for text search.

> flag *A:
> . > А (this is line 24)
> . > АТА
> . > И
> . > ИТЕ
Is this the OpenOffice dictionary or the ISpell one? Please provide:
- a download link for the dictionary
- the locale and encoding settings of your database

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: [TextSearch] syntax error while parsing affix file

From
Daniel Chiaramello
Date:
Teodor Sigaev wrote:
>> I am using Postgres 8.3.5, and I am trying to install a Bulgarian
>> ISpell dictionary (the OpenOffice one) for text search.
>
>> flag *A:
>> . > А (this is line 24)
>> . > АТА
>> . > И
>> . > ИТЕ
> Is this the OpenOffice dictionary or the ISpell one? Please provide:
> - a download link for the dictionary
> - the locale and encoding settings of your database
>
The dictionary is the ISpell one. I found it in the list at:
http://wiki.services.openoffice.org/wiki/Dictionaries

Here is a direct link for it:
http://heanet.dl.sourceforge.net/sourceforge/bgoffice/ispell-bg-4.1.tar.gz

I converted its encoding from windows-1251 to UTF-8 before running the
CREATE TEXT SEARCH DICTIONARY:

iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.
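
As a cross-check of the conversion step, here is a minimal Python sketch that re-encodes windows-1251 bytes as UTF-8 and lists the lines that contain non-ASCII characters. The function names and the sample data are hypothetical, chosen only for illustration; this only confirms the file is valid UTF-8 and shows where the Cyrillic rules are, it does not reproduce the PostgreSQL affix parser.

```python
def convert_cp1251_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1251 bytes as UTF-8 (the same job as the iconv calls)."""
    return raw.decode("cp1251").encode("utf-8")


def non_ascii_lines(utf8_bytes: bytes):
    """Return (line_number, text) for each line containing a non-ASCII character."""
    # decode() raises UnicodeDecodeError if the input is not valid UTF-8
    text = utf8_bytes.decode("utf-8")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ord(ch) > 127 for ch in line)
    ]


# Hypothetical two-line sample shaped like the affix excerpt above;
# "\u0410" is the Cyrillic capital letter A.
sample = "flag *A:\n    .    >    \u0410\n".encode("cp1251")
converted = convert_cp1251_to_utf8(sample)
print(non_ascii_lines(converted))
```

For a real file, the bytes would be read with open(path, "rb").read() before being passed to these functions.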

Thanks!
Daniel

Re: [TextSearch] syntax error while parsing affix file

From
Teodor Sigaev
Date:
> iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
> iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix
>
> The locale of the database is fr_FR, and its encoding is UTF8.
I believe that the characters 'И' and 'А' (non-ASCII), like the other Cyrillic
ones, are not accepted under a French locale :(


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: [TextSearch] syntax error while parsing affix file

From
Daniel Chiaramello
Date:
Teodor Sigaev wrote:
>> iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
>> iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix
>>
>> The locale of the database is fr_FR, and its encoding is UTF8.
> I believe that the characters 'И' and 'А' (non-ASCII), like the other
> Cyrillic ones, are not accepted under a French locale :(
>
I was able to install a Thai dictionary. Why would that dictionary be
OK and not a Bulgarian one?
Which locale should I use to make my database compatible with multiple
languages?

I would never have suspected a locale problem... Ouch!

Daniel