Re: XML with invalid chars - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: XML with invalid chars
Date
Msg-id 4DC71857.5070902@dunslane.net
In response to Re: XML with invalid chars  (Noah Misch <noah@leadboat.com>)
Responses Re: XML with invalid chars  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers

On 04/27/2011 11:41 PM, Noah Misch wrote:
> On Wed, Apr 27, 2011 at 11:22:37PM -0400, Andrew Dunstan wrote:
>> On 04/27/2011 05:30 PM, Noah Misch wrote:
>>> To make things worse, the dump/reload problem seems to depend on your version
>>> of libxml2, or something.  With git master, a CentOS 5 system with
>>> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
>>> 2.6.31.dfsg-2ubuntu rejects it.  Even with a patch like this, systems with a
>>> lenient libxml2 will be liable to store XML data that won't restore on a system
>>> with a strict libxml2.  Perhaps we should emit a build-time warning if the local
>>> libxml2 is lenient?
>> No, I think we need to be strict ourselves.
> Then I suppose we'd also scan for invalid characters in xml_parse()?  Or, at
> least, do so when linked to a libxml2 that neglects to do so itself?

Yep.

>>> Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
>>> and "xmlpi" still let the invalid byte through.  You can also still inject the
>>> byte into an attribute value via "xmlelement".  I wonder if it wouldn't make
>>> more sense to just pass any XML that we generate from scratch through libxml2.
>>> There are a lot of holes to plug, otherwise.
>> Maybe there are, but I'd want lots of convincing that we should do that
>> at this stage. Maybe for 9.2. I think we can plug the holes fairly
>> simply for xmlpi and xmlcomment, and catch the attributes by moving this
>> check up into map_sql_value_to_xml_value().
> I don't have much convincing to offer -- hunting down the holes seems fine, too.
>
>

I think I've done that. Here's the patch I have now. It looks like we
can catch pretty much everything by putting checks in four places, which
isn't too bad.
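
For readers without the attachment, the kind of check under discussion can be sketched roughly as follows. This is not the patch itself: the function names is_valid_xml_char() and xml_text_is_valid() are hypothetical, and the sketch assumes the input is UTF-8. It tests each code point against the Char production of XML 1.0, which is what causes a ^A (0x01) byte to be rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * XML 1.0 "Char" production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * (hypothetical helper, not a PostgreSQL or libxml2 function)
 */
static bool
is_valid_xml_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
        (c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD) ||
        (c >= 0x10000 && c <= 0x10FFFF);
}

/*
 * Return true if every code point in the UTF-8 buffer is a legal XML 1.0
 * character.  Truncated or malformed sequences are rejected.  Overlong
 * encodings are not detected here; the sketch assumes the caller has
 * already verified the encoding.
 */
static bool
xml_text_is_valid(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    while (i < len)
    {
        uint32_t    c = s[i];
        size_t      n;          /* number of continuation bytes expected */

        if (c < 0x80)
            n = 0;
        else if ((c & 0xE0) == 0xC0)
        {
            c &= 0x1F;
            n = 1;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            c &= 0x0F;
            n = 2;
        }
        else if ((c & 0xF8) == 0xF0)
        {
            c &= 0x07;
            n = 3;
        }
        else
            return false;       /* stray continuation or invalid lead byte */

        if (n > len - i - 1)
            return false;       /* sequence runs past the end of the buffer */

        i++;
        while (n-- > 0)
        {
            if ((s[i] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            c = (c << 6) | (s[i] & 0x3F);
            i++;
        }

        if (!is_valid_xml_char(c))
            return false;
    }
    return true;
}
```

A check of this shape only covers the character range; it would have to be called from every code path that builds XML from SQL values for it to close the holes mentioned above.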

Please review and try to break.

cheers

andrew



Attachment
