Postgresql8.1.3 tsearch2 with UTF8 - Mailing list pgsql-admin
From | Raphael Bolfing |
---|---|
Subject | Postgresql8.1.3 tsearch2 with UTF8 |
Date | |
Msg-id | 6399.1147249739@www051.gmx.net |
List | pgsql-admin |
Hi,

My task is to upgrade our web server running SuSE 8.2 with PostgreSQL 7.4.1 and tsearch2 to SuSE 9.3 with PostgreSQL 8.1.3 and tsearch2. The services are running, but I have some problems with the tsearch2 configuration.

-------------------------------------------------------------------------------
Old system: SuSE 8.2, PostgreSQL 7.4.1, tsearch2
(guide: the tsearch2 reference at http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html ; from it we followed the "Configuration" and "Parser" chapters)

New system: SuSE 9.3, PostgreSQL 8.1.3, tsearch2
(two guides for tsearch2 with UTF-8, see below)
-------------------------------------------------------------------------------

My steps:

1. Downloaded the new tsearch2.8.2.tar.gz for UTF-8 and replaced the tsearch2 folder with it.
2. Installed tsearch2 with make && make install, without problems.
3. Locale: de_DE.UTF-8.
4. Downloaded the *.med, *.aff and *.stop files of the German ispell dictionary (UTF-8) from sai.msu.su/ tsearch2_german_utf8.zip and extracted them into /var/lib/ispell/.
5. Compiled the German Snowball stemmer from stem.c and stem.h (make && make install) in /dict_de/..
6. Then I restored our database and loaded tsearch2 and the German dictionary:

       psql -d codasdb -f dump.sql
       psql -d codasdb -f tsearch2.sql
       psql -d codasdb -f dict_de.sql

7. I set dict_initoption='/var/lib/ispell/german.stop' where dict_name='de'; (is this right?)
8. Added a German configuration and an ispell dictionary:

       INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
           values ('default_german', 'default', 'de_DE.UTF-8');
       INSERT INTO pg_ts_dict
           (select 'de_ispell',
                   dict_init,
                   'DictFile="/var/lib/ispell/german.med",'
                   'AffFile="/var/lib/ispell/german.aff",'
                   'StopFile="/var/lib/ispell/german.stop"',
                   dict_lexize
            FROM pg_ts_dict
            where dict_name = 'ispell_template');

9. SELECT set_curdict('de_ispell'); does not work, so I set it to 'de' instead with SELECT set_curdict('de'); (is this right?)
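Since the error further down mentions byte 0xe4, one cheap check on steps 4, 7 and 8 is whether the dictionary and stop-word files really are UTF-8, as the download promised. What follows is a minimal Python 3 sketch, not something from the original post: it reads each file, tries to decode it as UTF-8, and reports the first offending byte. The paths are the ones from the steps above; the helper names find_invalid_utf8 and check_files are hypothetical.

```python
import sys
from pathlib import Path

# File names as used in the post (steps 4, 7 and 8); adjust if yours differ.
DICT_FILES = [
    "/var/lib/ispell/german.med",
    "/var/lib/ispell/german.aff",
    "/var/lib/ispell/german.stop",
]


def find_invalid_utf8(data):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte of the invalid sequence.
        return err.start, data[err.start]
    return None


def check_files(paths):
    """Print a one-line verdict per file; return True only if every file is valid UTF-8."""
    all_ok = True
    for path in paths:
        try:
            data = Path(path).read_bytes()
        except OSError as err:
            print(f"{path}: cannot read ({err.strerror})")
            all_ok = False
            continue
        bad = find_invalid_utf8(data)
        if bad is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, byte = bad
            print(f"{path}: invalid UTF-8 byte 0x{byte:02x} at offset {offset}")
            all_ok = False
    return all_ok


if __name__ == "__main__":
    # Pass file names on the command line, or fall back to the paths above.
    check_files(sys.argv[1:] or DICT_FILES)
```

Run it, for instance, as python3 check_utf8.py /var/lib/ispell/german.stop (the script name is just an example); with no arguments it checks the three files listed in DICT_FILES.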
A simple test runs fine:

    select 'Our first string used today'::tsvector;

Now the problem:

    codasdb=# select to_tsvector('PostgreSQL ist weitgehend konform mit dem SQL92/SQL99-Standard, d.h. alle in dem Standard geforderten Funktionen stehen zur Verfuegung und verhalten sich so, wie vom Standard gefordert; dies ist bei manchen kommerziellen sowie nichtkommerziellen SQL-Datenbanken bisweilen nicht gegeben.');
    ERROR: invalid UTF-8 byte sequence detected near byte 0xe4

I have tested with two guides:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2_german_utf8.html
http://www.tauceti.net/roller/page/cetixx/20060401 (in German)

Can anyone help?

Raphi

-------------------------------------------------------------------------------
Configuration:

codasdb=# select * from pg_ts_cfg;

ts_name | prs_name | locale |
---|---|---|
default | default | C |
default_russian | default | ru_RU.KOI8-R |
utf8_russian | default | ru_RU.UTF-8 |
simple | default | |
default_german | default | de_DE.UTF-8 |

codasdb=# \l

List of databases

Name | Owner | Encoding |
---|---|---|
codasdb | postgres | UTF8 |
postgres | postgres | UTF8 |
template0 | postgres | UTF8 |
template1 | postgres | UTF8 |

codasdb=# select * from pg_ts_dict;

dict_name | dict_init | dict_initoption | dict_lexize | dict_comment |
---|---|---|---|---|
simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary. |
en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball. |
ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding |
ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding |
ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files |
synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary |
de | dinit_de(internal) | /var/lib/ispell/german.stop | snb_lexize(internal,internal,integer) | Snowball stemmer for German |
de_ispell | spell_init(internal) | DictFile="/var/lib/ispell/german.med",AffFile="/var/lib/ispell/german.aff",StopFile="/var/lib/ispell/german.stop" | spell_lexize(internal,internal,integer) | |

(8 rows)
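Some background on that error byte: 0xe4 is "ä" in ISO-8859-1 (Latin-1), whereas UTF-8 encodes the same character as the two bytes 0xc3 0xa4, so a lone 0xe4 followed by an ASCII letter is an invalid UTF-8 sequence. The sentence passed to to_tsvector() above is plain ASCII ("Verfuegung" is spelled with "ue"), which suggests, though the post does not confirm it, that the byte comes from a file a dictionary reads when it is first used, such as a stop-word or ispell file saved as Latin-1. The minimal Python 3 sketch below shows the byte difference and re-encodes Latin-1 data as UTF-8. It assumes the input really is Latin-1; the helpers is_utf8, latin1_to_utf8 and convert_file are hypothetical.

```python
from pathlib import Path

# "ä" (U+00E4) is one byte in Latin-1 but two bytes in UTF-8.
print("ä".encode("latin-1"))  # b'\xe4'
print("ä".encode("utf-8"))    # b'\xc3\xa4'


def is_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8; bytes that already decode as UTF-8 are returned unchanged."""
    if is_utf8(data):
        return data
    # Assumption: the input really is ISO-8859-1. Every byte is valid Latin-1,
    # so this step cannot fail, but it would garble text in any other encoding.
    return data.decode("latin-1").encode("utf-8")


def convert_file(src, dst):
    """Hypothetical helper: write a UTF-8 copy of src to dst; src itself is not modified."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))
```

For example, convert_file("/var/lib/ispell/german.stop", "/var/lib/ispell/german.stop.utf8") would write a UTF-8 copy (the .utf8 name is hypothetical, mirroring contrib/russian.stop.utf8 in the table above) that dict_initoption and StopFile could then point to. Skipping input that already decodes as UTF-8 is only a heuristic against double conversion; a short Latin-1 file can occasionally happen to be valid UTF-8 as well.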