Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date
Msg-id 1471EE3B-B08E-418B-8D9F-54116812F6D5@phlo.org
Whole thread Raw
In response to Patch Review: Bugfix for XPATH() if text or attribute nodes are selected  (Radosław Smogura <rsmogura@softperience.eu>)
Responses Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
List pgsql-hackers
On Jun29, 2011, at 19:34 , Radosław Smogura wrote:
> B. 6. Current behaviour _is intended_ (there is "if"  to check node type) and _"natural"_. In this particular case
userask for text content of some node, and this content is actually "<". 

I don't buy that. The check for the node type is there because
two different libxml functions are used to convert nodes to
strings. The if has absolutely *zero* to do with escaping, expect
for that missing escape_xml() call in the "else" case.

Secondly, there is little point in having an type XML if we
don't actually ensure that values of that type can only contain
well-formed XML.

> B. 7. Similar behaviour may be observer e. g. in SQL Server(tm)
> SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
> Produced "<"

Whats the *type* of that value? I'm not familiar with the XPATH
support in SQL Server, but since you pass 'varchar(256)' as
the second parameter, I assume that this is how you tell it
the return type you want. Since you chose a textual type, not
quoting the value is perfectly fine. I suggest that you try
this again, but this time evaluate the XPATH expression so
that you get a value of type XML (Assuming such a type exists
in SQL Server)

> B. 8. Even if current behaviour may be treated as wrong it was exposed and other may depends on not escaped content.

Granted, there's the possibility of breaking existing applications
here. But the same argument could have been applied when we made
check against invalid characters (e.g. invalid UTF-8 byte sequences)
tighter for textual types.

When it comes to data integrity (and non-well-formed values in
XML columns are a data integrity issue), I believe that the advantages
of tighter checks out-weight potential compatibility problems.

> B. 9. I think, correct approach should go to create new function (basing on one existing) that will be able to escape
above.In this situation 
> call should look like (for example only!):
> SELECT xml_escape((XPATH('/*/text()', '<root><</root>')))[1]
> or
> SELECT xml_escape((XPATH('/*/text()', '<root><</root>'))[1])
> One method to operate on array one to operate on single XML datum.
> Or to think about something like xmltext(). (Compare current status of xml.c)

-1. Again, value of type XML should always be well-formed XML.
Requiring every future user of posgres XML support to remember
to surround XPATH() calls with XML_ESCAPE() will undoubtedly lead
to bugs.

I'm all for having escaping and un-escaping functions though,
but these *need* to have the signatures
 XML_ESCAPE(text) RETURNS xml XML_UNESCAPE(XML) RETURNS text

best regards,
Florian Pflug

PS: Next time, please post your Review as a follow-up of the mail
that contains the patch. That makes sure that people who already
weighted in on the issue don't overlook your review. Or, the
patch author, for that matter - I nearly missed your Review between
the larger number of mail in my postgres folder.



pgsql-hackers by date:

Previous
From: Florian Pflug
Date:
Subject: Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
Next
From: Alvaro Herrera
Date:
Subject: Re: default privileges wording