Thread: Postgresql8.1.3 tsearch2 with UTF8

From: "Raphael Bolfing"
Date:

Hi,

My task is to upgrade our web server from SuSE 8.2 with PostgreSQL 7.4.1 and
tsearch2 to SuSE 9.3 with PostgreSQL 8.1.3 and tsearch2.
The services are running, but I have some problems with the tsearch2
configuration.



-------------------------------------------------------------------------------------------------------------------------------
Old system:
SuSE 8.2
PostgreSQL 7.4.1
tsearch2 (guide: the tsearch2 reference at
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html)
From this guide we followed the chapters "Configuration" and "Parser".

New system:
SuSE 9.3
PostgreSQL 8.1.3
tsearch2 (set up using two guides on tsearch2 with UTF-8)

-------------------------------------------------------------------------------------------------------------------------------


My steps:
1. I downloaded the new tsearch2.8.2.tar.gz for UTF-8 and replaced the
   tsearch2 folder with it.
2. I installed tsearch2 with make && make install, without problems.
3. The locale is de_DE.UTF-8.
4. I downloaded the *.med, *.aff and *.stop files from sai.msu.su/
   (tsearch2_german_utf8.zip, the German ispell dictionary in UTF-8)
   and extracted them into /var/lib/ispell/.
5. I compiled the German Snowball stemmer with stem.c and stem.h (make &&
   make install) in /dict_de/..
6. After that I restored our database with psql -d codasdb -f dump.sql
   and psql -d codasdb -f tsearch2.sql
   and psql -d codasdb -f dict_de.sql
7. In pg_ts_dict I set dict_initoption='/var/lib/ispell/german.stop' where
   dict_name='de'. I am not sure this is right.
8. INSERT INTO pg_ts_cfg (ts_name, prs_name, locale) values
('default_german', 'default', 'de_DE.UTF-8');
   INSERT INTO pg_ts_dict (select 'de_ispell',
                                dict_init,
                                'DictFile="/var/lib/ispell/german.med",'
                                'AffFile="/var/lib/ispell/german.aff",'
                                'StopFile="/var/lib/ispell/german.stop"',
                                dict_lexize
                                FROM pg_ts_dict
                                where dict_name ='ispell_template');
9. SELECT set_curdict('de_ispell'); <- this does not work with de_ispell, so
   I set it to ('de') instead. I am not sure this is right.

select 'Our first string used today'::tsvector; <-- runs


Now the problem is:
codasdb=# select to_tsvector('PostgreSQL ist weitgehend konform mit dem
SQL92/SQL99-Standard, d.h. alle in dem Standard geforderten Funktionen
stehen zur Verfuegung und verhalten sich so, wie vom Standard gefordert;
dies ist bei manchen kommerziellen sowie nichtkommerziellen SQL-Datenbanken
bisweilen nicht gegeben.');
ERROR:  invalid UTF-8 byte sequence detected near byte 0xe4
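
A possible direction for narrowing this down, offered only as a guess: the query text above contains no non-ASCII characters, so the byte 0xe4 (which is "ä" in ISO-8859-1) may come from one of the dictionary or stop-word files rather than from the input string. The following is a minimal Python sketch, not part of tsearch2, that checks whether the files named in the steps above decode as UTF-8. The helper names `first_invalid_utf8_byte` and `report` are hypothetical, and the paths are simply the ones mentioned in this post.

```python
from pathlib import Path

# Paths taken from the steps in this post; adjust them if the files live elsewhere.
DICT_FILES = [
    "/var/lib/ispell/german.med",
    "/var/lib/ispell/german.aff",
    "/var/lib/ispell/german.stop",
]


def first_invalid_utf8_byte(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def report(paths):
    """Print one line per file saying whether it decodes as UTF-8."""
    for name in paths:
        path = Path(name)
        if not path.is_file():
            print(f"{name}: not found")
            continue
        problem = first_invalid_utf8_byte(path.read_bytes())
        if problem is None:
            print(f"{name}: valid UTF-8")
        else:
            offset, value = problem
            print(f"{name}: invalid UTF-8 at offset {offset} (byte 0x{value:02x})")


if __name__ == "__main__":
    report(DICT_FILES)
```

If one of the files reports byte 0xe4, it is probably still in ISO-8859-1, and converting it (for example with iconv -f ISO-8859-1 -t UTF-8) might be worth trying before running to_tsvector again.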


I have tested with two guides:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2_german_utf8.html
http://www.tauceti.net/roller/page/cetixx/20060401 (german)



Can anyone help?


Raphi






----------------------------------------------------------------------------------------------------------------------------------------------------------
Configuration:

codasdb=# select * from pg_ts_cfg;
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default         | default  | C
 default_russian | default  | ru_RU.KOI8-R
 utf8_russian    | default  | ru_RU.UTF-8
 simple          | default  |
 default_german  | default  | de_DE.UTF-8


codasdb=# \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 codasdb   | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8



codasdb=# select * from pg_ts_dict;
    dict_name    |         dict_init          |                                                  dict_initoption                                                  |               dict_lexize               |                   dict_comment
-----------------+----------------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------------------------+--------------------------------------------------
 simple          | dex_init(internal)         |                                                                                                                   | dex_lexize(internal,internal,integer)   | Simple example of dictionary.
 en_stem         | snb_en_init(internal)      | contrib/english.stop                                                                                              | snb_lexize(internal,internal,integer)   | English Stemmer. Snowball.
 ru_stem_koi8    | snb_ru_init_koi8(internal) | contrib/russian.stop                                                                                              | snb_lexize(internal,internal,integer)   | Russian Stemmer. Snowball. KOI8 Encoding
 ru_stem_utf8    | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8                                                                                         | snb_lexize(internal,internal,integer)   | Russian Stemmer. Snowball. UTF8 Encoding
 ispell_template | spell_init(internal)       |                                                                                                                   | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
 synonym         | syn_init(internal)         |                                                                                                                   | syn_lexize(internal,internal,integer)   | Example of synonym dictionary
 de              | dinit_de(internal)         | /var/lib/ispell/german.stop                                                                                       | snb_lexize(internal,internal,integer)   | Snowball stemmer for German
 de_ispell       | spell_init(internal)       | DictFile="/var/lib/ispell/german.med",AffFile="/var/lib/ispell/german.aff",StopFile="/var/lib/ispell/german.stop" | spell_lexize(internal,internal,integer) |
(8 rows)
