Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function
Date
Msg-id AANLkTimwfCMIxIUXKV4sYnaQAbWIoGg0hH41xhuquwI-@mail.gmail.com
In response to Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function  (Mike Fowler <mike@mlfowler.com>)
Responses Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function
List pgsql-hackers
On Thu, Jul 1, 2010 at 12:25 PM, Mike Fowler <mike@mlfowler.com> wrote:
> Quoting Mike Fowler <mike@mlfowler.com>:
>
>> Should the IS DOCUMENT predicate support this? At the moment you get
>> the following:
>>
>> template1=# SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>' IS DOCUMENT;
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> template1=# SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns' IS DOCUMENT;
>> ERROR:  invalid XML content
>> LINE 1: SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</to...
>>              ^
>> DETAIL:  Entity: line 1: parser error : expected '>'
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>      ^
>> Entity: line 1: parser error : chunk is not well balanced
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>      ^
>>
>> I would've hoped the second would've returned 'f' rather than failing.
>> I've had a glance at the XML/SQL standard and I don't see anything in
>> the detail of the predicate (8.2) that would specifically prohibit us
>> from changing this behavior, unless the common rule 'Parsing a string
>> as an XML value' (10.16) must always be in force. I'm no standards
>> expert, but IMHO this would be an acceptable change to improve
>> usability. What do others think?
>
> Right, I've answered my own question whilst sitting in the open source
> coding session at CHAR(10). Yes, IS DOCUMENT should return false for a
> non-well-formed document, and indeed it is coded to do so. However, the
> conversion to the xml type, which happens before the underlying
> xml_is_document function is even called, fails and raises an error. I'll
> work on a patch to resolve this so that IS DOCUMENT can stand in for the
> missing 'xml_is_well_formed' function.

I think the point of "IS DOCUMENT" is to distinguish a document:

<foo>some stuff<bar/><baz/></foo>

from a document fragment:

<bar/><baz/>

A document is allowed only one top-level element.

It'd be nice, I think, to have a function that tells you whether
something is legal XML without throwing an error if it isn't, but I
suspect that should be a separate function, rather than trying to jam
it into "IS DOCUMENT".

http://developer.postgresql.org/pgdocs/postgres/functions-xml.html#AEN15187
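
To make the two separate questions concrete ("is this a document?" versus
"is this legal XML at all?"), here is a rough sketch outside PostgreSQL. It
uses Python's standard xml.etree.ElementTree parser as a stand-in for the
backend's libxml2 path, and the function names are hypothetical, not an
existing or proposed PostgreSQL API. The fragment check wraps the input in a
synthetic root element, which is an assumption of this sketch rather than how
the server parses content.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text: str) -> bool:
    """True if text parses as XML with exactly one top-level element.

    Returns False instead of raising when the input does not parse.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_well_formed_content(text: str) -> bool:
    """True if text is a document or a well-formed fragment.

    A fragment such as '<bar/><baz/>' has several top-level elements, so it
    is checked by wrapping it in a synthetic root (hypothetical name).
    """
    if is_well_formed_document(text):
        return True
    return is_well_formed_document(f"<wrapper>{text}</wrapper>")


if __name__ == "__main__":
    samples = [
        "<foo>some stuff<bar/><baz/></foo>",     # document
        "<bar/><baz/>",                          # fragment, not a document
        "<towns><town>Bristol</town></towns",    # missing '>', not legal XML
    ]
    for sample in samples:
        print(sample, is_well_formed_document(sample),
              is_well_formed_content(sample))
```

Neither function raises on bad input. The first answers the question "IS
DOCUMENT" is meant to answer, the second is the kind of separate
"is this legal XML" test I have in mind.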

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

