Re: full text search, utf8 - Mailing list pgsql-ru-general

From alexander lunyov
Subject Re: full text search, utf8
Msg-id 4A26650B.400@zato.ru
In response to full text search, utf8  (alexander lunyov <lan@zato.ru>)
Responses Re: full text search, utf8  (eshkinkot@gmail.com (Сергей Бурладян))
List pgsql-ru-general
I can answer in English if you like.

The same error also happens when I try to CREATE TEXT SEARCH DICTIONARY:

ports=# CREATE TEXT SEARCH DICTIONARY ruispell (
ports(#     TEMPLATE = ispell,
ports(#     DictFile = russian,
ports(#     AffFile = russian,
ports(#     StopWords = russian
ports(# );
ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
HINT:  This error can also happen if the byte sequence does not
match the encoding expected by the server, which is controlled by
"client_encoding".
ports=#
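One way to test the illegal-byte theory against the dictionary itself: the Ispell template reads the files named by DictFile, AffFile and StopWords, here presumably russian.dict, russian.affix and russian.stop in the tsearch_data subdirectory of `pg_config --sharedir` (the exact location depends on the install, so treat that path as an assumption). Assuming the server expects these files to be UTF-8, a minimal Python sketch that reports the first byte in each file that fails strict UTF-8 decoding:

```python
import sys
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (offset, byte) for the first byte that breaks strict UTF-8
    decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict mode raises at the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def check_files(paths):
    """Print a verdict for each file and return the paths that are not UTF-8."""
    bad = []
    for path in paths:
        result = find_invalid_utf8(Path(path).read_bytes())
        if result is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, byte = result
            print(f"{path}: invalid byte 0x{byte:02x} at offset {offset}")
            bad.append(path)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit status 1 if any file named on the command line is not valid UTF-8.
    sys.exit(1 if check_files(sys.argv[1:]) else 0)
```

For example, `python check_utf8.py "$(pg_config --sharedir)"/tsearch_data/russian.*` (the script name is hypothetical). A non-zero exit status means at least one of the files contains a byte sequence that is not valid UTF-8.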

All data in the table was loaded by a Perl script that reads a UTF-8 text
file and issues INSERTs, so I think that if there were an illegal
character, the error would have appeared at INSERT time.
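About the byte itself: in UTF-8, 0xd1 (binary 11010001) can only begin a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF; the pairs 0xd1 0x80 through 0xd1 0xbf encode U+0440 to U+047F, which includes lowercase р through я. So valid UTF-8 Russian text contains 0xd1 often, but never on its own. The same byte is a complete letter in single-byte Cyrillic encodings such as KOI8-R and Windows-1251, which is one possible (unconfirmed) way a stray 0xd1 can reach code that expects UTF-8. A small Python sketch, using a hypothetical helper name, that shows this:

```python
# Hypothetical helper: True if `data` is well-formed UTF-8, False otherwise.
def decodes_as_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")  # strict: any malformed sequence raises
    except UnicodeDecodeError:
        return False
    return True


lone = bytes([0xD1])          # lead byte with no continuation byte
paired = bytes([0xD1, 0x8F])  # lead byte followed by a continuation byte

print(decodes_as_utf8(lone))      # False
print(decodes_as_utf8(paired))    # True
print(paired.decode("utf-8"))     # я (U+044F)

# In 8-bit Cyrillic code pages the same single byte is a whole letter:
print(lone.decode("koi8_r"))      # я (U+044F)
print(lone.decode("cp1251"))      # С (Cyrillic capital Es, U+0421)
```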


Andrew Boag wrote:
> Sorry for the English response (I don't have a Russian keyboard here).
>
> 0xd1 may be an illegal UTF-8 character that was mistakenly allowed into
> the table. Not all libraries (or all versions of Postgres) prevent
> illegal UTF-8 characters from getting into the DB.
>
> We saw similar issues when migrating data from Postgres 7.4 to 8.1.
>
> However, I don't fully understand your SELECT query, so there may be
> another cause.
>
> alexander lunyov wrote:
>> Hello.
>>
>> I have FreeBSD 6.2 with postgresql-8.3.1.
>>
>> In the environment:
>>
>> % env | grep UTF
>> LANG=ru_RU.UTF-8
>> MM_CHARSET=UTF-8
>>
>> % psql ports -U pgsql
>> Welcome to psql 8.3.1, the PostgreSQL interactive terminal.
>>
>> Type:  \copyright for distribution terms
>>        \h for help with SQL commands
>>        \? for help with psql commands
>>        \g or terminate with semicolon to execute query
>>        \q to quit
>>
>> ports=# \encoding
>> UTF8
>> ports=# \l
>>         List of databases
>>    Name    |  Owner   | Encoding
>> -----------+----------+-----------
>>  ports     | pgsql    | UTF8
>>  postgres  | pgsql    | UTF8
>>  template0 | pgsql    | UTF8
>>  template1 | pgsql    | UTF8
>> (4 rows)
>>
>> I tried searching the table, and here is the result:
>>
>> ports=# select name from abonents where to_tsvector(name) @@
>> to_tsquery('s');
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
>> HINT:  This error can also happen if the byte sequence does not
>> match the encoding expected by the server, which is controlled by
>> "client_encoding".
>>
>> Meanwhile, with the english configuration it works fine:
>>
>> # select count(name) from abonents where to_tsvector('english',name)
>> @@ to_tsquery('some');
>>  count
>> -------
>>      6
>> (1 row)
>>
>> Why?
>>
>
>


--
Best regards,
Александр Лунев
ОАО РТК
