Re: XPATH vs. server_encoding != UTF-8 - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: XPATH vs. server_encoding != UTF-8
Date
Msg-id 95BAA09D-E242-44D5-89F5-A2D8350A364F@phlo.org
In response to Re: XPATH vs. server_encoding != UTF-8  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: XPATH vs. server_encoding != UTF-8
List pgsql-hackers
On Jul23, 2011, at 22:49 , Peter Eisentraut wrote:

> On lör, 2011-07-23 at 17:49 +0200, Florian Pflug wrote:
>> The current thread about JSON and the ensuing discussion about the
>> XML types' behaviour in non-UTF8 databases made me try out how well
>> XPATH() copes with that situation. The code, at least, looks
>> suspicious - XPATH neither verifies that the server encoding is UTF-8,
>> nor does it pass the server encoding on to libxml's xpath functions.
>
> This issue is on the Todo list, and there are some archive links there.

Thanks for the pointer, but I think the discussion there doesn't
really apply here.

First, I didn't suggest (or implement) full support for XPATH() together
with server encodings other than UTF-8. My suggested patch simply
closes a hole in the implementation of the current behaviour. Instead of
relying on libxml to detect that the encoding isn't UTF-8, the patched code
relies on libxml only to detect that the encoding isn't ASCII. Since all
supported server encodings are supersets of ASCII, the latter is trivial.
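
To illustrate why the ASCII case is the easy one, here is a minimal sketch in Python. It is not the patch and not code from xml.c. The function names are hypothetical, and the encoding names are Python codec names rather than PostgreSQL's. It only shows that a byte string containing no byte above 0x7F reads the same in any ASCII-superset encoding, so a check for non-ASCII bytes needs no knowledge of the encoding:

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. plain ASCII."""
    return all(byte < 0x80 for byte in data)


def safe_to_pass_as_utf8(data: bytes, server_encoding: str) -> bool:
    """Hypothetical check: may these bytes go to a consumer that assumes UTF-8?

    Assumption: server_encoding names an ASCII-superset encoding.
    """
    if server_encoding.lower() in ("utf8", "utf-8"):
        return True
    # In any other ASCII superset, only pure-ASCII data is byte-identical
    # to its UTF-8 form.
    return is_pure_ascii(data)


ascii_doc = b"<a>x</a>"
latin1_doc = "<a>\u00e9</a>".encode("latin-1")  # contains the byte 0xE9
```

With these definitions, `ascii_doc` passes the check under any server encoding, while `latin1_doc` passes only when the server encoding is UTF-8.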

xml.c also seems to have changed quite a bit since this was last
discussed. Tom Lane argued against the proposed patch on the grounds
that there are many more places in xml.c which pass strings to libxml
without charset conversion. However, looking at it now, it seems that
all XML validation goes through xml_parse(), which actually converts
the XML to UTF-8. Only XPATH contains a separate code path, and chooses
to ignore encoding issues altogether.
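
As a rough illustration of the difference between the two code paths, the following Python sketch contrasts converting to UTF-8 before parsing with handing over the raw bytes unchanged. It is not how xml_parse() or XPATH is actually implemented. The helper names are hypothetical, and Python's "latin-1" codec stands in for a non-UTF-8 server encoding:

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A document stored in a non-UTF-8 encoding, containing one non-ASCII byte.
raw = "<a>\u00e9</a>".encode("latin-1")

# Path that converts first: the consumer sees well-formed UTF-8.
converted = to_utf8(raw, "latin-1")

# Path that skips conversion: the consumer sees bytes that are not UTF-8.
unconverted = raw
```

Here `converted` is well-formed UTF-8, whereas `unconverted` is not, which is the kind of input a UTF-8-only consumer would either reject or misread.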

best regards,
Florian Pflug




