Re: BUG #16046: xpath returns CDATA tag along with the value inpostgres 12 - Mailing list pgsql-bugs

From Chapman Flack
Subject Re: BUG #16046: xpath returns CDATA tag along with the value inpostgres 12
Date
Msg-id 5DB23068.3080601@anastigmatix.net
Whole thread Raw
In response to Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16046: xpath returns CDATA tag along with the value inpostgres 12
Re: BUG #16046: xpath returns CDATA tag along with the value inpostgres 12
List pgsql-bugs
On 10/24/19 17:38, Tom Lane wrote:
>> Query: SELECT unnest(xpath('//cname/aname/text()','<cname><aname><![CDATA[select 5]]></aname></cname>'::xml))
>>
>> Output - pg11: select 5 
>>
>> Output - pg12: <![CDATA[select 5]]>
> 
> ... What's not entirely clear to me is whether it's an intentional
> effect, or a bug.  Authors, any comments?

Hmm. I would say the pg12 behavior is "not wrong". But it's unexpected.
xpath's return type is xml (well, array of), so the result must have a
form that can escape any characters mistakable for markup. In this example,
there aren't any, but once tweaked so there are:

SELECT
 unnest(xpath('//cname/aname/text()',
  '<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));

pg12: <![CDATA[select 5 & 6 <yahoo!>]]>

the necessity is clear.

The other valid option would be to return, not CDATA, but a regular
text node, which would look like straight text if there were no special
characters in it, and would otherwise have every such character individually
escaped. That's what I get from pg11:

SELECT
 unnest(xpath('//cname/aname/text()',
  '<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));

pg11: select 5 & 6 <yahoo!>

So what pg11 is doing is also "not wrong" (in this respect, anyway).
And looks "more natural", in the case where the value has no characters
that need escaping.

Which may or may not be a good thing. Perhaps it could lead the unwary
in some cases to think such a query is giving a directly usable
text string back, which will be harmless until the one time a value
with escaping comes back. (The no-surprises way to get back a directly
usable text string, if that's what's wanted, would be with XMLTABLE
and an output column of text type.)

Oddly, what pg12 is doing seems to be influenced by the form of escaping
used in the input:

SELECT
 unnest(xpath('//cname/aname/text()',
  '<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));
              unnest
-----------------------------------
 <![CDATA[select 5 & 6 <yahoo!>]]>

SELECT
 unnest(xpath('//cname/aname/text()',
  '<cname><aname>select 5 & 6 <yahoo!></aname></cname>'::xml));
             unnest
---------------------------------
 select 5 & 6 <yahoo!>


Either form of result is correct, and having it respect the form that was
used in the input might even be delightfully smart.

I haven't looked in the code just now to see if it is intentionally being
delightfully smart, or more simplistic-and-lucky.

Regards,
-Chap



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12
Next
From: Chapman Flack
Date:
Subject: Re: BUG #16046: xpath returns CDATA tag along with the value inpostgres 12