Re: Fix XML handling with DOCTYPE - Mailing list pgsql-hackers

From Chapman Flack
Subject Re: Fix XML handling with DOCTYPE
Date
Msg-id 5C8E8E2F.9050600@anastigmatix.net
Whole thread Raw
In response to Re: Fix XML handling with DOCTYPE  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fix XML handling with DOCTYPE
List pgsql-hackers
On 03/17/19 13:16, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> What I was doing in the patch is the reverse: parsing with the expectation
>> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
>> element to precede a DTD, so that approach can be expected to fail fast
>> if the other branch needs to be taken.
> 
> Ah, right.  I don't have any problem with trying the CONTENT approach
> before the DOCUMENT approach rather than vice-versa.  What I was concerned
> about was adding a lot of assumptions about exactly how libxml would
> report the failure.  IMO a maximally-safe patch would just rearrange
> things we're already doing without adding new things.
> 
>> But a quick pre-scan for the same thing would have the same property,
>> without the libxml dependencies that bother you here. Watch this space.
> 
> Do we need a pre-scan at all?

Without it, we double the time to a failure result in every case that
should actually fail, as well as in this one corner case that we want to
see succeed, and the question you posed earlier about which error message
to return becomes thornier.

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT. If we switch and try parsing as DOCUMENT _because
the input is claiming to have the form of a DOCUMENT_, then it's defensible
to return errors explaining why it's not a DOCUMENT ... but not in the
general case of just throwing DOCUMENT at it any time CONTENT parse fails.

Regards,
-Chap


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: Offline enabling/disabling of data checksums
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: jsonpath