Hi!
Oleg, what exactly do you mean by "tsearch2 doesn't support unicode yet"?
It seems to work fine in my database.
./pg_controldata [mycluster] gives me:
pg_control version number: 72
[...]
LC_COLLATE: de_DE.UTF-8
LC_CTYPE: de_DE.UTF-8
community_unicode=# SELECT pg_encoding_to_char(encoding) AS encoding FROM pg_database WHERE
datname='community_unicode';
encoding
----------
UNICODE
(1 row)
community_unicode=# select to_tsvector('default_german', 'Ich fände, daß das Fehlen von Umlauten ein Ärgernis wäre.');
to_tsvector
------------------------------------------------------------------
'daß':3 'wäre':10 'fehlen':5 'fände':2 'umlauten':7 'Ärgernis':9
(1 row)
community_unicode=# SELECT message_id
community_unicode-# , rank(idxfti, to_tsquery('default_german', 'Könige|Söldner'),0) as rank
community_unicode-# FROM ct_com_board_message
community_unicode-# WHERE idxfti @@ to_tsquery('default_german', 'Könige|Söldner')
community_unicode-# order by rank desc
community_unicode-# limit 10;
message_id | rank
------------+----------
3191632 | 0.686189
2803233 | 0.686189
2935325 | 0.686189
2882337 | 0.686189
2842006 | 0.686189
2854329 | 0.686189
2841962 | 0.686189
2999851 | 0.651322
2869839 | 0.651322
2999799 | 0.61258
(10 rows)
These results look all right to me, so I cannot reproduce the problem of special characters disappearing in a
Unicode database.
Dawid, are you sure you ran initdb on your cluster with the correct locale settings?
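If it helps, the locale and encoding can also be checked from within psql rather than with pg_controldata. This is only a sketch: it assumes your server version exposes lc_ctype and lc_collate as read-only settings, which may not hold for every release.

```sql
-- Locale the cluster was initialized with (fixed at initdb time);
-- assumes these read-only settings exist in your server version
SHOW lc_ctype;
SHOW lc_collate;

-- Encoding of the database you are currently connected to
SHOW server_encoding;
```

For a setup like mine, I would expect the first two to report a UTF-8 locale and the last one to report the Unicode encoding.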
Kind regards
Markus
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Oleg Bartunov
> Sent: Wednesday, 17 November 2004 17:32
> To: Dawid Kuroczko
> Cc: Pgsql General
> Subject: Re: [GENERAL] Tsearch2 and Unicode?
>
> Dawid,
>
> unfortunately, tsearch2 doesn't support unicode yet.
> If you keep tsvector separately from data then you'll need
> one more join.
>
> Oleg
>