On 03/17/19 15:06, Tom Lane wrote:
> The error message issue is indeed a concern, but I don't see why it's
> complicated if you agree that
>
>> If the query asked for CONTENT, any error result should be one you could get
>> when parsing as CONTENT.
>
> That just requires us to save the first error message and be sure to issue
> that one not the DOCUMENT one, no?
I confess I haven't looked hard yet at how to do that. The way errors come
out of libxml is more involved than "here's a message", so there's a choice
of (a) try to copy off that struct in a way that's sure to survive
re-executing the parser, and then use the copy, or (b) generate a message
right away from the structured information and save that, and I guess b
might not be too bad; a might not be too bad, or it might, and slide right
back into the kind of libxml-behavior-assumptions you're wanting to avoid.
Meanwhile, here is a patch on the lines I proposed earlier, with a
pre-check. Any performance hit that it could entail (which I'd really
expect to be de minimis, though I haven't benchmarked) ought to be
compensated by the strlen I changed to strnlen in parse_xml_decl (as
there's really no need to run off and count the whole rest of the input
just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).
... and, yes, I know that could be an independent patch, and then the
performance effect here should be measured from there. But it was near
what I was doing anyway, so I included it here.
Attaching both still-outstanding patches (this one and docfix) so the
CF app doesn't lose one.
Regards,
-Chap