Re: BUG #4622: xpath only work in utf-8 server encoding - Mailing list pgsql-bugs

From eshkinkot
Subject Re: BUG #4622: xpath only work in utf-8 server encoding
Msg-id 9ea8622b0902072142u76c86c30q8b433182e8cb0800@mail.gmail.com
In response to Re: BUG #4622: xpath only work in utf-8 server encoding  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-bugs
On 23 January 2009 at 0:58, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Thursday 22 January 2009 15:39:00 Sergey Burladyan wrote:
>> seb=# select xpath('/русский/text()', v::xml) from (select
>> xml('<русский>язык</русский>')) as x(v);
>> ERROR:  could not parse XML data
>> DETAIL:  Entity: line 1: parser error : Input is not proper UTF-8, indicate
>> encoding !
>> Bytes: 0xF0 0xF3 0xF1 0xF1
>> <x><русский>язык</русский></x>
>>     ^

> This raises the question: What are the rules about encoding the characters in
> XPath expressions themselves?  I haven't found anything about that in the
> standard.  Anyone know?

PostgreSQL does not use libxml2's internal encoding support and strips
the encoding declaration from the XML body, so I think there is no
choice: libxml2 expects the data to be in its internal encoding, UTF-8,
anyway.

I am not sure about the XML standard, but maybe the libxml2
documentation can help resolve this issue? See
http://xmlsoft.org/encoding.html

"What does this mean in practice for the libxml2 user:
* xmlChar, the libxml2 data type is a byte, those bytes must be
assembled as UTF-8 valid strings. The proper way to terminate an
xmlChar * string is simply to append 0 byte, as usual.
* One just need to make sure that when using chars outside the ASCII
set, the values has been properly converted to UTF-8"

I understand this as: all xmlChar strings must be in UTF-8, no matter
what the encoding of the XML body is.
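
To illustrate the point, here is a minimal sketch in Python. It is not
the attached patch and not PostgreSQL code. It assumes the server
encoding is WIN1251 (cp1251), which is my reading of the bytes
0xF0 0xF3 0xF1 0xF1 in the error above: they are "русс" in that
encoding. The helper names to_utf8 and is_valid_utf8 are made up for
this example.

```python
# Illustration only: not the attached patch and not PostgreSQL code.
# Assumption: the server encoding is Windows-1251 (cp1251).

def to_utf8(raw: bytes, server_encoding: str = "cp1251") -> bytes:
    """Re-encode bytes from the (assumed) server encoding into UTF-8."""
    return raw.decode(server_encoding).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The four bytes reported in the "Input is not proper UTF-8" error.
reported = bytes([0xF0, 0xF3, 0xF1, 0xF1])

# The document and the XPath expression as stored in a cp1251 server.
doc = "<русский>язык</русский>".encode("cp1251")
xpath_expr = "/русский/text()".encode("cp1251")

# Raw server-encoded bytes are rejected by a strict UTF-8 check,
# but pass once both strings are converted before being handed over.
print(is_valid_utf8(doc), is_valid_utf8(xpath_expr))
print(is_valid_utf8(to_utf8(doc)), is_valid_utf8(to_utf8(xpath_expr)))
```

The same conversion has to be applied to both the XML body and the
XPath expression, since both end up as xmlChar strings.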

I tried to fix this issue for the xpath function; see the attached
patch.

By the way, contrib/xml2 also has this issue.
