Thread: [TextSearch] syntax error while parsing affix file
Hello everybody.

I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell dictionary (the OpenOffice one) for text search. I converted the dictionary encoding to UTF-8 and installed it in the "tsearch_data" folder. But when I try to create the dictionary, I get a syntax error:

    CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
        TEMPLATE = ispell,
        DictFile = bulgarian_utf8,
        AffFile = bulgarian_utf8,
        StopWords = english
    );

    ERROR:  syntax error
    CONTEXT:  line 24 of configuration file "/usr/share/pgsql/tsearch_data/bulgarian_utf8.affix": ". > А"

(translated from the French messages the server printed)

Extract of the file around that line:

    flag *A:
        . > А        (this is line 24)
        . > АТА
        . > И
        . > ИТЕ

The file has Unix line endings (I suspected something like that, since the "CONTEXT" error line was split over 2 lines). I'm really lost on how to go further with the Bulgarian dictionary... Could you help me, please?

Thanks for your attention!
Daniel Chiaramello
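For readers unfamiliar with the ispell affix format: each rule under a "flag *A:" header has the form "condition > [-strip,]add", so ". > А" should mean "for any word (the condition "." matches any final character), append А". The sketch below is a hypothetical Python helper (parse_suffix_rule and apply_suffix_rule are illustrative names, not PostgreSQL's parser) that applies such suffix rules mechanically. It assumes the simple condition syntax (".", literal letters, "[...]", "[^...]") and only shows what the failing line is meant to express; it says nothing about why PostgreSQL 8.3 rejects it.

```python
import re
from typing import Optional, Tuple


def parse_suffix_rule(line: str) -> Tuple[str, str, str]:
    """Split an ispell-style suffix rule "COND > [-STRIP,]ADD" into (cond, strip, add)."""
    line = line.split("#", 1)[0]            # drop a trailing comment, if any
    cond, action = line.split(">", 1)
    cond = "".join(cond.split())            # "[^AEIOU] Y" -> "[^AEIOU]Y"
    action = action.strip()
    if action.startswith("-"):              # "-Y,IES": remove Y, then add IES
        strip, add = action[1:].split(",", 1)
    else:                                   # "АТА": just add the suffix
        strip, add = "", action
    strip, add = strip.strip(), add.strip()
    if add == "-":                          # "-Y,-" means remove Y and add nothing
        add = ""
    return cond, strip, add


def apply_suffix_rule(word: str, line: str) -> Optional[str]:
    """Return the inflected form, or None if the rule does not apply to word."""
    cond, strip, add = parse_suffix_rule(line)
    # The condition is matched against the end of the word.
    if not re.search(cond + "$", word):
        return None
    if strip:
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + add


if __name__ == "__main__":
    rules = [". > А", ". > АТА", ". > И", ". > ИТЕ"]  # the rules from the extract
    print([apply_suffix_rule("ГРАД", r) for r in rules])
    # ['ГРАДА', 'ГРАДАТА', 'ГРАДИ', 'ГРАДИТЕ']
```

The outputs are purely mechanical concatenations on a sample stem; whether each form is a valid Bulgarian word depends on which stems the .dict file tags with flag A.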
Daniel Chiaramello <daniel.chiaramello@golog.net> writes:
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
> dictionary (the OpenOffice one) for Textsearch features.

I'm not an expert, but I think our ispell code supports only a subset of the features that some other implementations have. So it doesn't surprise me a lot that some configuration files don't work. You might try one of the other sources for ispell files besides OpenOffice; see the links here:
http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

regards, tom lane
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
> dictionary (the OpenOffice one) for Textsearch features.
> flag *A:
> . > А (this is line 24)
> . > АТА
> . > И
> . > ИТЕ

OpenOffice or ISpell? Please provide:
- a link to download the dictionary
- the locale and encoding settings of your db

--
Teodor Sigaev                       E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
>> I am using Postgres 8.3.5, and I am trying to install a Bulgarian
>> ISpell dictionary (the OpenOffice one) for Textsearch features.
>
>> flag *A:
>> . > А (this is line 24)
>> . > АТА
>> . > И
>> . > ИТЕ
> OpenOffice or ISpell? Please provide:
> - a link to download the dictionary
> - the locale and encoding settings of your db

The dictionary is the ISpell one I got from the http://wiki.services.openoffice.org/wiki/Dictionaries list. Here is a direct link to it:
http://heanet.dl.sourceforge.net/sourceforge/bgoffice/ispell-bg-4.1.tar.gz

I converted its encoding from windows-1251 to UTF-8 before running CREATE TEXT SEARCH DICTIONARY:

    iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
    iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.

Thanks!
Daniel
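If iconv is not at hand, the same conversion can be sketched in Python. The helpers below are hypothetical (the names are illustrative): the first decodes the file as windows-1251 ("cp1251" in Python's codec registry) and writes UTF-8, working on raw bytes so line endings pass through unchanged, as they do with iconv. The second looks for the things Daniel mentions checking by hand, plus encoding damage: a UTF-8 byte-order mark, CR characters (Windows line endings), and bytes that are not valid UTF-8. Whether PostgreSQL 8.3 actually trips over a byte-order mark is not established in this thread; it is listed only as something worth ruling out.

```python
from pathlib import Path
from typing import List


def convert_cp1251_to_utf8(src: str, dst: str) -> None:
    """Re-encode src (windows-1251) as UTF-8 in dst, like iconv -f windows-1251 -t utf-8."""
    # Work on bytes, not text mode, so CR/LF line endings are left exactly as they were.
    text = Path(src).read_bytes().decode("cp1251")  # raises UnicodeDecodeError on undefined bytes
    Path(dst).write_bytes(text.encode("utf-8"))


def check_dictionary_file(path: str) -> List[str]:
    """Return a list of suspicious properties of a converted .dict/.affix file."""
    data = Path(path).read_bytes()
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte-order mark")
    if b"\r" in data:
        problems.append("contains CR characters (Windows line endings)")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte offset {exc.start}")
    return problems
```

Usage mirrors the iconv commands: convert_cp1251_to_utf8("bulgarian.aff", "bulgarian_utf8.affix"), after which check_dictionary_file("bulgarian_utf8.affix") returning an empty list means none of these three problems is present.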
> iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
> iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix
>
> The locale of the database is fr_FR, and its encoding is UTF8.

I believe that the characters 'И', 'А' (non-ASCII) and the other Cyrillic ones are not acceptable for the French locale :(

--
Teodor Sigaev                       E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
>> iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
>> iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix
>>
>> The locale of the database is fr_FR, and its encoding is UTF8.
> I believe that the characters 'И', 'А' (non-ASCII) and the other Cyrillic ones
> are not acceptable for the French locale :(

I was able to install a Thai dictionary; why would that dictionary be OK and not a Bulgarian one? Which locale should I use to make my database multi-language compatible?

I would never have suspected a locale problem... Ouch!

Daniel