Re: Review of patch Bugfix for XPATH() if expression returns a scalar value - Mailing list pgsql-hackers

From Radosław Smogura
Subject Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
Date
Msg-id fc947491f738a2c6c2232ec1eecfa51a@mail.softperience.eu
Whole thread Raw
In response to Re: Review of patch Bugfix for XPATH() if expression returns a scalar value  (Florian Pflug <fgp@phlo.org>)
Responses Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
List pgsql-hackers
On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:
> On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
>> This is review of patch
>> https://commitfest.postgresql.org/action/patch_view?id=565
>> "Bugfix for XPATH() if expression returns a scalar value"
>>
>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) 
>> FROM (SELECT
>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES 
>> (XMLELEMENT(name
>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as 
>> foo;
>>
>>       xmlelement
>> -------------------------
>> <root sth="&lt;n"/>
>>
>>     It's clearly visible that value from attribute is "<n", not "<". 
>> Every
>> parser will read this as "<n" which is not-natural and will 
>> require form
>> consumer/developer to de-escape this on his side - roughly speaking 
>> this will
>> be reported as serious bug.
>
> There's a problem there, no doubt, but your proposed solution just 
> moves the
> problem around.
>
> Here's your query, reformatted to be more legible
>
> SELECT
>   XMLELEMENT(name root,
>              XMLATTRIBUTES(foo.namespace AS sth)
>   )
> FROM (
>   SELECT
>     (XPATH('namespace-uri(/*)', x))[1] AS namespace
>   FROM (VALUES (XMLELEMENT(name "root",
>                 XMLATTRIBUTES('<n' AS xmlns,
>                               '<v' AS value),'<t')
>   )
> ) v(x)) as foo;
>
> What happens is that the XPATH expression selects the xmlns
> attribute with the value '<n'. To be well-formed xml, that value
> must be escaped, so what is actually returned by the XPATH
> call is '<n'. The XMLATTRIBUTES() function, however, doesn't
> distinguish between input of type XML and input of type TEXT,
> so it figures it has to represent the *textual* value '<n'
> in xml, and produces '&lt;n'.
>
> Not escaping textual values returned by XPATH isn't a solution,
> though. For example, assume someone inserts the value returned
> by the XPATH() call in your query into a column of type XML.
> If XPATH() returned '<n' literally instead of escaping it,
> the column would contain non-well-formed XML in a column of type
> XML. Not pretty, and pretty fatal during your next dump/reload
> cycle.
>
> Or, if you want another example, simply remove the XMLATTRIBUTES
> call and use the value returned by XPATH as a child node directly.
>
> SELECT
>   XMLELEMENT(name root,
>              foo.namespace
>   )
> FROM (
>   SELECT
>     (XPATH('namespace-uri(/*)', x))[1] AS namespace
>   FROM (VALUES (XMLELEMENT(name "root",
>                 XMLATTRIBUTES('<n' AS xmlns,
>                               '<v' AS value),'<t')
>   )
> ) v(x)) as foo;
>
>      xmlelement
> --------------------
>  <root><n</root>
>
> Note that this correct! The <root> node contains a text node
> representing the textual value '<n'. If XPATH() hadn't return
> the value escaped, the query above would have produces
>
>   <root><n</root>
>
> which is obviously wrong. It works in this case because XMLELEMENT
> is smart enough to distinguish between child nodes gives as TEXT
> values (those are escaped) and child nodes provided as XML values
> (those are inserted unchanged).
>
> XMLATTRIBUTES, however, doesn't make that distinction, and simply
> escaped all attributes values independent of their type. Now, the
> reason it does that is probably that *not* all XML values are
> well-formed attribute values without additional escaping. For 
> example,
> you cannot have literal '<' and '>' in attribute values. So if
> XMLATTRIBUTES, like XMLELEMENT never escaped values which are
> already of type XML, the following piece of code
>
>   XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
>
> would produce
>
>   <a v="<node/>"/>
>
> which, alas, is not well-formed :-(
>
> The correct solution, I believe, is to teach XMLATTRIBUTES to
> only escape those characters which are invalid in attribute
> values but valid in XML (Probably only '<' and '>' but I'll
> check). If there are no objects, I'll prepare a separate patch
> for that. I don't want to include it in this patch because it's
> really a distinct issue (even modifies a different function).
>
> best regards,
> Florian Pflug
You may manually enter invalid xml too, You don't need xpath for this. Much moreIn PostgreSQL 9.0.1, compiled by Visual
C++build 1500, 64-bitI executedSELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))and it's looks like I got
correctoutput "<a v="<node/>"/>"when I looked with text editor into table files I saw same value.
 
I will check on last master if I can reproduce this.
But indeed, now PostgreSQL is not type safe against XML type. See SELECT XMLELEMENT(name "root", '<',
(xpath('//text()','<root><</root>'))[1])).
 
You found some problem with escaping, but solution is bad, I think problem lies with XML type, which may be used to
holdstrings and proper xml. For example above query can't distinguish if (xpath('//text()', '<root><</root>'))[1] is
xmltext, or xml element.
 
For me adding support for scalar is really good and needed, but escaping is not.
Regards,Radosław Smogura



pgsql-hackers by date:

Previous
From: Yeb Havinga
Date:
Subject: Re: Parameterized aggregate subquery
Next
From: Hitoshi Harada
Date:
Subject: Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)