<![CDATA[select 5 & 6 <yahoo!>]]>
select 5 &amp; 6 &lt;yahoo!&gt;
Either form of result is correct, and having it respect the form that was used in the input might even be delightfully smart.
I haven't looked in the code just now to see if it is intentionally being delightfully smart, or more simplistic-and-lucky.
It is probably unintentional-but-ok: libxml tags a CDATA section with a different node type (XML_CDATA_SECTION_NODE) than a text node (XML_TEXT_NODE), so a CDATA node falls into the catch-all branch of
if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
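To make the effect of that test concrete, here is a minimal standalone sketch. It does not include libxml2: the enum is a local stand-in (the DEMO_ names are hypothetical) whose values mirror the DOM-style node-type numbering that libxml2's xmlElementType uses, and the helper name is made up for illustration. It only shows which node types satisfy the quoted condition, not what the patch does inside the branch.

```c
#include <stdbool.h>

/*
 * Local stand-ins for libxml2 node-type constants (assumption: real code
 * would take XML_ELEMENT_NODE etc. from <libxml/tree.h> instead).
 */
typedef enum
{
    DEMO_ELEMENT_NODE = 1,
    DEMO_ATTRIBUTE_NODE = 2,
    DEMO_TEXT_NODE = 3,
    DEMO_CDATA_SECTION_NODE = 4
} demo_node_type;

/*
 * Hypothetical helper mirroring the quoted condition: true for every node
 * type except attribute and text nodes, i.e. the catch-all branch.
 */
static bool
takes_catch_all_branch(demo_node_type type)
{
    return type != DEMO_ATTRIBUTE_NODE && type != DEMO_TEXT_NODE;
}
```

With the test written this way, takes_catch_all_branch(DEMO_CDATA_SECTION_NODE) is true just as it is for an element node, while text and attribute nodes are the only ones excluded. Listing the excluded types rather than the included ones is what lets CDATA through without naming it.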
This was intentional. Earlier versions of the patch had CDATA explicitly listed [0]. I later suggested reversing the logic [1].
My concern at that time was about xmltable, but I think the current behavior is fine for xpath too. If you pick fragments out of an XML document, the most reasonable thing to do is to return those parts of the XML as they actually appear in the input.
We could choose to say that's what we meant it to do all along.