BUG #4622: xpath only work in utf-8 server encoding - Mailing list pgsql-bugs

From Sergey Burladyan
Subject BUG #4622: xpath only work in utf-8 server encoding
Date
Msg-id 200901221339.n0MDd0dE033542@wwwmaster.postgresql.org
Responses Re: BUG #4622: xpath only work in utf-8 server encoding
List pgsql-bugs
The following bug has been logged online:

Bug reference:      4622
Logged by:          Sergey Burladyan
Email address:      eshkinkot@gmail.com
PostgreSQL version: 8.3.5
Operating system:   Debian testing
Description:        xpath only work in utf-8 server encoding
Details:

Hello, all!

As a test, I am trying to parse an XML string in an encoding other than
UTF-8. It loads correctly, but xpath(text, xml) can't handle it:

seb@seb:~/tmp/pg$ echo $LANG
ru_RU.CP1251
seb@seb:~/tmp/pg$ /usr/lib/postgresql/8.3/bin/postgres -p 5433 -k s -s -D .
LOG:  database system was shut down at 2009-01-22 16:30:07 MSK
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

seb@seb:~$ echo $LANG
ru_RU.CP1251
seb@seb:~$ psql -h localhost -p 5433
Welcome to psql 8.3.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

seb=# select * from (select
xml('<русский>язык</русский>')) as x(v);
            v
-------------------------
 <русский>язык</русский>
(1 row)

seb=# select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
ERROR:  could not parse XML data
DETAIL:  Entity: line 1: parser error : Input is not proper UTF-8, indicate
encoding !
Bytes: 0xF0 0xF3 0xF1 0xF1
<x><русский>язык</русский></x>
    ^
seb=# select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+--------------
 client_encoding | WIN1251
 lc_collate      | ru_RU.CP1251
 lc_ctype        | ru_RU.CP1251
 lc_messages     | ru_RU.CP1251
 lc_monetary     | ru_RU.CP1251
 lc_numeric      | ru_RU.CP1251
 lc_time         | ru_RU.CP1251
 server_encoding | WIN1251
(8 rows)

With a UTF-8 server encoding it works correctly:

seb=> select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
 xpath
--------
 {язык}
(1 row)

seb=> select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+-------------
 client_encoding | UTF8
 lc_collate      | ru_RU.UTF-8
 lc_ctype        | ru_RU.UTF-8
 lc_messages     | ru_RU.UTF-8
 lc_monetary     | ru_RU.UTF-8
 lc_numeric      | ru_RU.UTF-8
 lc_time         | ru_RU.UTF-8
 server_encoding | UTF8
(8 rows)

I think something is wrong here: the string is parsed correctly by xml(text),
but its result can't be passed to the xpath function.
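
The bytes in the error message (0xF0 0xF3 0xF1 0xF1) look like the CP1251
encoding of "русс", which suggests the document reaches the XML parser still
in the server encoding. The small Python sketch below only illustrates that
byte-level reading; it is not PostgreSQL or libxml2 code, and the helper name
first_invalid_utf8_offset is made up for this example. It assumes the
"<x>...</x>" wrapper shown in the error output.

```python
# Illustrative sketch only: reproduces the byte-level symptom from the error
# message outside the database. Not PostgreSQL or libxml2 code.

# The document as it appears in the error output, including the <x> wrapper.
DOC = "<x><русский>язык</русский></x>"

# The same text as it would be stored under a WIN1251 (CP1251) encoding.
raw = DOC.encode("cp1251")


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical helper for this example.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


offset = first_invalid_utf8_offset(raw)
bad_bytes = " ".join(f"0x{b:02X}" for b in raw[offset:offset + 4])
print(offset, bad_bytes)  # 4 0xF0 0xF3 0xF1 0xF1

# Transcoding to UTF-8 first gives input that a UTF-8 decoder accepts.
utf8 = raw.decode("cp1251").encode("utf-8")
print(first_invalid_utf8_offset(utf8))  # None
```

The offset 4 matches the caret position under "<x><" in the error output, and
the four bytes match the "Bytes:" line.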
