Thread: Proposal: XML helper functions

Proposal: XML helper functions

From
Scott Bailey
Date:
One of the problem with shredding XML is that it is very kludgy to get a 
scalar value back from xpath. The xpath function always returns an array 
of XML. So for example, to extract a numeric value you need to:
1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric

So I wrote the following function:

CREATE OR REPLACE FUNCTION xmlvalue(   VARCHAR,   XML
) RETURNS TEXT AS
$$   SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'   THEN (xpath($1, $2))[1]   WHEN $1 ~* '/text()$'   THEN (xpath($1,
$2))[1]  WHEN $1 LIKE '%/'   THEN (xpath($1 || 'text()', $2))[1]   ELSE (xpath($1 || '/text()', $2))[1]   END::text;
 
$$ LANGUAGE 'sql' IMMUTABLE;

It's pretty simple. It just does a check to see if you are extracting an 
attribute or an element and if element, it makes sure to get the text value.

So query that used to look like:

SELECT CAST(  CAST(    (xpath('/foo/bar/text()', myxml))[1]  AS varchar)
AS numeric) AS bar

now becomes:

SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar


Second function just checks that the xpath expression finds at least one 
node.

CREATE OR REPLACE FUNCTION xmlexists( VARCHAR, XML
) RETURNS BOOLEAN AS
$$ SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 THEN true ELSE false END;
$$ LANGUAGE 'sql' IMMUTABLE;

On naming, SQL/XML specifies xmlexists and xmlcast. Latest db2 provides 
xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses 
xml.value(). The xmlvalue does only part of what is required by xmlcast 
(it won't cast scalar to xml).

So would these functions need to be rewritten in c in order to be accepted?

Regards,

Scott Bailey


Further reading:

http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm


Re: Proposal: XML helper functions

From
Merlin Moncure
Date:
On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus@comcast.net> wrote:
> One of the problem with shredding XML is that it is very kludgy to get a
> scalar value back from xpath. The xpath function always returns an array of
> XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric

I just happen to be dealing with XML right now as well and my initial
thought is that your suggestion doesn't buy you a whole lot: the root
problem IMO is not dealing with what xpath gives you but that there is
no DOMish representation of the xml document for you to query.  You
have to continually feed the entire document to xpath which is
absolutely not scalable (if it works the way I think it does --
haven't looked at the code).

xpath is great for simple things but it's too texty and you need a
more robust API to handle documents for serious parsing on the
backend.  In the short term i'd advise doing work in another pl like
perl.

merlin


Re: Proposal: XML helper functions

From
Pavel Stehule
Date:
2010/1/5 Scott Bailey <artacus@comcast.net>:
> One of the problem with shredding XML is that it is very kludgy to get a
> scalar value back from xpath. The xpath function always returns an array of
> XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric
>
> So I wrote the following function:
>
> CREATE OR REPLACE FUNCTION xmlvalue(
>   VARCHAR,
>   XML
> ) RETURNS TEXT AS
> $$
>   SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
>   THEN (xpath($1, $2))[1]
>   WHEN $1 ~* '/text()$'
>   THEN (xpath($1, $2))[1]
>   WHEN $1 LIKE '%/'
>   THEN (xpath($1 || 'text()', $2))[1]
>   ELSE (xpath($1 || '/text()', $2))[1]
>   END::text;
> $$ LANGUAGE 'sql' IMMUTABLE;
>
> It's pretty simple. It just does a check to see if you are extracting an
> attribute or an element and if element, it makes sure to get the text value.
>
> So query that used to look like:
>
> SELECT CAST(
>  CAST(
>    (xpath('/foo/bar/text()', myxml))[1]
>  AS varchar)
> AS numeric) AS bar
>
> now becomes:
>
> SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar
>
>
> Second function just checks that the xpath expression finds at least one
> node.
>
> CREATE OR REPLACE FUNCTION xmlexists(
>  VARCHAR,
>  XML
> ) RETURNS BOOLEAN AS
> $$
>  SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
>  THEN true ELSE false END;
> $$ LANGUAGE 'sql' IMMUTABLE;
>
> On naming, SQL/XML specifies xmlexists and xmlcast.

I am for SQL/XML naming convention.

Regards
Pavel Stehule

Latest db2 provides
> xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses
> xml.value(). The xmlvalue does only part of what is required by xmlcast (it
> won't cast scalar to xml).
>
> So would these functions need to be rewritten in c in order to be accepted?
>
> Regards,
>
> Scott Bailey
>
>
> Further reading:
>
> http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
> http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042
> http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Proposal: XML helper functions

From
Scott Bailey
Date:
Merlin Moncure wrote:
> On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus@comcast.net> wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
> 
> I just happen to be dealing with XML right now as well and my initial
> thought is that your suggestion doesn't buy you a whole lot: the root
> problem IMO is not dealing with what xpath gives you but that there is
> no DOMish representation of the xml document for you to query.  You
> have to continually feed the entire document to xpath which is
> absolutely not scalable (if it works the way I think it does --
> haven't looked at the code).

No typically you'll only be passing the xml for a single "row" so what 
we end up doing in Postgres typically looks something like this:

SELECT xmlvalue('/row/@id', bitesizexml)::int AS id,  xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
xmlvalue('/row/@lon',bitesizexml)::numeric,  xmlvalue('/row/comment', bitesizexml) AS cmt
 
FROM (  SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub

So only the one call has to work with the entire document. All the calls 
to xmlvalue are passed a much smaller node to work with.

> xpath is great for simple things but it's too texty and you need a> more robust API to handle documents for serious
parsingon the> backend.  In the short term i'd advise doing work in another pl like> perl.
 

This is basically the method used for Oracle too until they provided 
XMLTable functionality. They had a function xmlsequence that basically 
did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.

Scott


Re: Proposal: XML helper functions

From
Peter Eisentraut
Date:
On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote:
> One of the problem with shredding XML is that it is very kludgy to get a 
> scalar value back from xpath. The xpath function always returns an array 
> of XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric

There has been talk about adding something like xpath_string,
xpath_number, xpath_boolean for fetching xpath expressions that don't
return nodesets.  I think that would fit your use case.




Re: Proposal: XML helper functions

From
Scott Bailey
Date:
Pavel Stehule wrote:
> 2010/1/5 Scott Bailey <artacus@comcast.net>:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>>
>> So I wrote the following function:
>>
>> CREATE OR REPLACE FUNCTION xmlvalue(
>>   VARCHAR,
>>   XML
>> ) RETURNS TEXT AS
>> $$
>>   SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
>>   THEN (xpath($1, $2))[1]
>>   WHEN $1 ~* '/text()$'
>>   THEN (xpath($1, $2))[1]
>>   WHEN $1 LIKE '%/'
>>   THEN (xpath($1 || 'text()', $2))[1]
>>   ELSE (xpath($1 || '/text()', $2))[1]
>>   END::text;
>> $$ LANGUAGE 'sql' IMMUTABLE;
>>
>> It's pretty simple. It just does a check to see if you are extracting an
>> attribute or an element and if element, it makes sure to get the text value.
>>
>> So query that used to look like:
>>
>> SELECT CAST(
>>  CAST(
>>    (xpath('/foo/bar/text()', myxml))[1]
>>  AS varchar)
>> AS numeric) AS bar
>>
>> now becomes:
>>
>> SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar
>>
>>
>> Second function just checks that the xpath expression finds at least one
>> node.
>>
>> CREATE OR REPLACE FUNCTION xmlexists(
>>  VARCHAR,
>>  XML
>> ) RETURNS BOOLEAN AS
>> $$
>>  SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
>>  THEN true ELSE false END;
>> $$ LANGUAGE 'sql' IMMUTABLE;
>>
>> On naming, SQL/XML specifies xmlexists and xmlcast.
> 
> I am for SQL/XML naming convention.

Well I'm shying away from the name xmlcast because it is supposed to 
cast xml to scalar, scalar to xml and xml to xml. For instance these 
would all work on db2.

SELECT xmlcast(null AS XML),
xmlcast(bar AS XML),
xmlcast(xmlquery('$x/baz/bar' PASSING foo.barxml AS x) AS VARCHAR(30))
FROM foo

But I just found that Oracle added xmlcast in 11g and it only does xml 
to scalar (and only number, varchar2 and date/time types). So maybe 
you're right.

Scott


Re: Proposal: XML helper functions

From
Scott Bailey
Date:
Peter Eisentraut wrote:
> On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a 
>> scalar value back from xpath. The xpath function always returns an array 
>> of XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
> 
> There has been talk about adding something like xpath_string,
> xpath_number, xpath_boolean for fetching xpath expressions that don't
> return nodesets.  I think that would fit your use case.

The first two sound very much like what I'm looking for. I'm unsure 
about the third. Is it's purpose to extract the scalar value of an 
expression and cast to bool as the other two do, or is it to identify if 
the xpath expression returned any nodes like xmlexists?

Scott


Re: Proposal: XML helper functions

From
Peter Eisentraut
Date:
On tis, 2010-01-05 at 11:50 -0800, Scott Bailey wrote:
> > There has been talk about adding something like xpath_string,
> > xpath_number, xpath_boolean for fetching xpath expressions that
> don't
> > return nodesets.  I think that would fit your use case.
> 
> The first two sound very much like what I'm looking for. I'm unsure 
> about the third. Is it's purpose to extract the scalar value of an 
> expression and cast to bool as the other two do, or is it to identify
> if the xpath expression returned any nodes like xmlexists?

XPath as a language has four data types: nodeset, string, number,
boolean.  So an XPath expression could return any of those types.  Then,
I suppose, the xpath_foo() function would evaluate the expression and
return the result as type foo, possibly raising an error if the types
don't match.

Details to be determined.



Re: Proposal: XML helper functions

From
Larry
Date:
But by using the above code: how do we deal with multiple matching values?

For example:

<question name=“my_question"> <tag> java </tag> <tag> c++ </tag>
</question>

In this case, perhaps I would want something like

---------------+-----
my_question  | java
my_question  | c++



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Proposal-XML-helper-functions-tp2018975p5497064.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.