extract text from XML - Mailing list pgsql-hackers

From Chris Pacejo
Subject extract text from XML
Msg-id 1470634239.713742.688783561.46752730@webmail.messagingengine.com
List pgsql-hackers
Hi, I have found a basic use case which is supported by the xml2 module,
but is unsupported by the new XML API.

It is not possible to correctly extract text (either from text nodes or
attribute values) which contains the characters '<', '&', or '>'. 
xpath() (correctly) returns XML text nodes for queries targeting these
node types, and there is no inverse to xmlelement().  For example:

=> select (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text;
     xpath     
---------------
 &lt;&amp;&gt;
(1 row)

Again, this is not a bug, but there is no way to ask for the unescaped
text instead.  The xml2 module does provide such a function, xpath_string:

=> select xpath_string(xmlelement(name a, '<&>')::text, '/a/text()');
 xpath_string 
--------------
 <&>
(1 row)

One workaround is to return the node's text value by serializing the XML
value, and textually replacing those three entities with the characters
they represent, but this relies on the xpath() function not generating
other entities.
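
As a rough sketch of that workaround (my own illustration, not
something the core XML API or xml2 provides), the replacement can be
written with nested replace() calls.  It assumes the serialized node
contains only the three entities mentioned above.  '&amp;' is decoded
last so that text which was originally a literal "&lt;" is not decoded
twice.

```sql
-- Sketch only: undo the three entities xpath() is assumed to emit.
-- '&amp;' is replaced last so '&amp;lt;' becomes '&lt;', not '<'.
select replace(replace(replace(
         (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text,
         '&lt;', '<'),
         '&gt;', '>'),
         '&amp;', '&') as unescaped;
```

Any other entity or character reference in the serialized output is
left untouched, which is exactly the fragility noted above.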

(My use case is importing data in XML format, and processing with
Postgres into a relational format.)

Perhaps a function xpath_value(text, xml) -> text[] would close the gap?
(I did search and no such function seems to exist currently, outside
xml2.)
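
To make the proposed signature concrete, here is a user-level
approximation built on the entity-replacement workaround described
earlier.  It is only a sketch: the parameter names p_path and p_doc are
made up for illustration, the function is not part of core PostgreSQL or
xml2, and it inherits the assumption that only the three entities
appear.  A real implementation would presumably read the node's string
value directly instead of post-processing serialized XML.

```sql
-- Hypothetical stand-in for the proposed xpath_value(text, xml).
-- Needs PostgreSQL 9.4 or later for WITH ORDINALITY.
create function xpath_value(p_path text, p_doc xml) returns text[]
language sql as $$
  select array_agg(
           replace(replace(replace(node::text,
             '&lt;', '<'), '&gt;', '>'), '&amp;', '&')
           order by ord)                 -- keep document order
  from unnest(xpath(p_path, p_doc)) with ordinality as t(node, ord)
$$;

-- Usage: one array element per matched node.
select xpath_value('/a/text()', xmlelement(name a, '<&>'));
```

Note that this version returns NULL rather than an empty array when the
path matches nothing, because array_agg() over zero rows yields NULL.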

Thanks,
Chris


