Re: Another issue with invalid XML values - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Another issue with invalid XML values
Date
Msg-id 6D5AD292-69E3-478E-B41A-0B2728CAB0CB@phlo.org
In response to Another issue with invalid XML values  (Florian Pflug <fgp@phlo.org>)
Responses Re: Another issue with invalid XML values
List pgsql-hackers
On Jun1, 2011, at 03:17 , Florian Pflug wrote:
> My nagging suspicion is that libxml reports errors like these via some callback function, and only returns a non-zero
> result if there are structural errors in the XML. But my experience with libxml is pretty limited, so maybe someone with
> more experience in this area can shed some light on this...

As it turns out, this is actually the case.

libxml reports some errors (like invalid xmlns attributes) via the error handler set using xmlSetGenericErrorFunc() but
still indicates success from xmlCtxtReadDoc() (a non-NULL document) and xmlParseBalancedChunkMemory() (a zero return code).

If I modify xml_parse() to complain not only if one of these functions reports failure, but also if xml_err_buf has
non-zero length, invalid xmlns attributes are reported correctly.

However, the error function set using xmlSetGenericErrorFunc() cannot distinguish between errors and warnings, so doing
this causes XMLPARSE() to also complain about things like non-absolute namespace URIs (which are allowed but deprecated,
as far as I understand).
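
To illustrate the check and its drawback, here is a minimal self-contained sketch. It does not use libxml or the
backend code: err_buf is a hypothetical stand-in for xml_err_buf, and collect_error() and parse_failed() are made-up
names. The handler only has the same printf-style shape as a generic error function, which is why it has no way to
tell a warning from an error.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for xml_err_buf (not the real backend buffer). */
char err_buf[1024];

/*
 * Same shape as a generic error function: a context pointer plus a
 * printf-style message. No severity is passed in, so warnings and
 * errors end up in the same buffer.
 */
void
collect_error(void *ctx, const char *fmt, ...)
{
    size_t  used = strlen(err_buf);
    va_list ap;

    (void) ctx;                 /* unused in this sketch */
    va_start(ap, fmt);
    vsnprintf(err_buf + used, sizeof(err_buf) - used, fmt, ap);
    va_end(ap);
}

/*
 * Hypothetical check: the parse counts as failed if the parser returned
 * a non-zero code OR if anything at all was written to the buffer.
 */
int
parse_failed(int res_code)
{
    return res_code != 0 || err_buf[0] != '\0';
}
```

With this check, a parse that returns zero but triggers a handler call is treated as a failure, whether the message
was an error or merely a deprecation warning.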

To fix that, xmlSetGenericErrorFunc() would probably have to be replaced by xmlSetStructuredErrorFunc(). Structured
error functions receive a pointer to an xmlError structure which, amongst other things, contains an xmlErrorLevel (NONE,
WARNING, ERROR, FATAL).
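
A rough sketch of how a severity-aware handler could filter messages, written without libxml so that it compiles on
its own. The types sketch_level, sketch_error and sketch_state and the function structured_handler() are hypothetical
stand-ins; only the ordering of the levels (NONE, WARNING, ERROR, FATAL) is taken from xmlErrorLevel. A real handler
would take libxml's own xmlError pointer instead.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in severity levels, in the same order as xmlErrorLevel. */
typedef enum { LVL_NONE, LVL_WARNING, LVL_ERROR, LVL_FATAL } sketch_level;

/* Stand-in for the structured error record (hypothetical, reduced). */
typedef struct
{
    sketch_level level;
    const char  *message;
} sketch_error;

/* Hypothetical per-parse state handed to the handler as user data. */
typedef struct
{
    char buf[1024];     /* collected error text */
    int  nerrors;       /* number of messages at ERROR or above */
} sketch_state;

/*
 * Record only messages of level ERROR or FATAL. Warnings are dropped
 * here; a real implementation might report them at a lower level.
 */
void
structured_handler(void *user_data, const sketch_error *err)
{
    sketch_state *st = user_data;
    size_t        used;

    if (err->level < LVL_ERROR)
        return;

    used = strlen(st->buf);
    snprintf(st->buf + used, sizeof(st->buf) - used, "%s", err->message);
    st->nerrors++;
}
```

The caller would then treat the parse as failed only if nerrors is non-zero, so a deprecation warning alone no
longer makes XMLPARSE() complain.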

While digging through the code in src/backend/utils/adt/xml.c, I also noticed that we set a global error handler
instead of a per-context one. I guess this is because xmlParseBalancedChunkMemory(), which we use to parse XML
fragments, doesn't provide a way to pass in a context but rather creates it itself. Still, I wonder whether there is
some other API we could use that does allow us to specify a context. Again, it'd be nice if someone more familiar with
this code could explain the reasons behind the current design.

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

best regards,
Florian Pflug


