Re: [HACKERS] possible encoding issues with libxml2 functions - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [HACKERS] possible encoding issues with libxml2 functions
Msg-id 20170317032327.GA1993326@tornado.leadboat.com
In response to Re: [HACKERS] possible encoding issues with libxml2 functions  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: possible encoding issues with libxml2 functions  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Sun, Mar 12, 2017 at 10:26:33PM +0100, Pavel Stehule wrote:
> 2017-03-12 21:57 GMT+01:00 Noah Misch <noah@leadboat.com>:
> > On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > > 2017-03-12 0:56 GMT+01:00 Noah Misch <noah@leadboat.com>:
> > Please add a test case.
> 
> It needs an application - currently there is no way to import an XML
> document via the recv API :(

I think xml_in() can create every value that xml_recv() can create; xml_recv()
is just more convenient given diverse source encodings.  If you make your
application store the value into a table, does "pg_dump --inserts" emit code
that reproduces the same value?  If so, you can use that in your test case.
If not, please provide precise instructions (code, SQL commands) for
reproducing the bug manually.

> > Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
> > The answer is probably in the archives, because someone understood the
> > problem enough to document "Some XML-related functions may not work at all
> > on non-ASCII data when the server encoding is not UTF-8. This is known to
> > be an issue for xpath() in particular."
> 
> 
> Probably there are two possible issues

Would you research in the archives to confirm?

> 1. what I touched - the recv function converts to the database encoding -
> but the document encoding is not updated.

Using xml_parse() would fix that, right?
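
Issue 1 above can be illustrated outside PostgreSQL. The following is a
minimal Python sketch, not the server's code: Python codecs stand in for the
server's encoding conversion, and the helper names (transcode_payload_only,
text_of) are hypothetical. It shows what a declaration-honoring parser sees
when the bytes are converted but the encoding declaration is left stale.

```python
import xml.etree.ElementTree as ET

# A document declared and stored as ISO-8859-1; U+00E9 is one byte (0xE9).
original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><a>\u00e9</a>'
).encode("iso-8859-1")


def transcode_payload_only(doc: bytes, src: str, dst: str) -> bytes:
    """Convert the bytes to another encoding without touching the declaration."""
    return doc.decode(src).encode(dst)


def text_of(doc: bytes) -> str:
    """Parse the bytes, honoring the in-document declaration, and return the root text."""
    return ET.fromstring(doc).text


# Bytes are now UTF-8, but the declaration still claims ISO-8859-1.
stale = transcode_payload_only(original, "iso-8859-1", "utf-8")

# Updating the declaration to match the bytes restores the value.
fixed = stale.replace(b'encoding="ISO-8859-1"', b'encoding="UTF-8"')

print(repr(text_of(original)))
print(repr(text_of(stale)))
print(repr(text_of(fixed)))
```

In the stale case the two UTF-8 bytes of the character are each decoded as a
separate ISO-8859-1 character, so the parsed text no longer matches the
original value.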

> 2. there is no way to convert from the document encoding to the database
> encoding.

Both xml_in() and xml_recv() require the value to be representable in the
database encoding, so I don't think this particular problem can remain by the
time we reach an xpath_internal() call.
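
As a rough illustration of what "representable in the database encoding"
means, here is a small Python sketch. It uses Python codecs as a stand-in for
the server's conversion routines; the function name representable is
hypothetical, and this is not how xml_in() or xml_recv() are implemented.

```python
def representable(value: str, db_encoding: str) -> bool:
    """Return True if every character of value can be encoded in db_encoding."""
    try:
        value.encode(db_encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+0159 (r with caron) exists in ISO-8859-2 and UTF-8 but not in ISO-8859-1.
sample = "<a>\u0159</a>"
for enc in ("utf-8", "iso-8859-2", "iso-8859-1"):
    print(enc, representable(sample, enc))
```

A value that fails this kind of check would be rejected on input, which is why
it could not still be present by the time an XPath function runs.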
