Thread: Proposal: XML helper functions
One of the problem with shredding XML is that it is very kludgy to get a scalar value back from xpath. The xpath function always returns an array of XML. So for example, to extract a numeric value you need to: 1) use xpath to get the node 2) get the first element of the XML array 3) cast that to varchar 4) cast that to numeric So I wrote the following function: CREATE OR REPLACE FUNCTION xmlvalue( VARCHAR, XML ) RETURNS TEXT AS $$ SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$' THEN (xpath($1, $2))[1] WHEN $1 ~* '/text()$' THEN (xpath($1, $2))[1] WHEN $1 LIKE '%/' THEN (xpath($1 || 'text()', $2))[1] ELSE (xpath($1 || '/text()', $2))[1] END::text; $$ LANGUAGE 'sql' IMMUTABLE; It's pretty simple. It just does a check to see if you are extracting an attribute or an element and if element, it makes sure to get the text value. So query that used to look like: SELECT CAST( CAST( (xpath('/foo/bar/text()', myxml))[1] AS varchar) AS numeric) AS bar now becomes: SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar Second function just checks that the xpath expression finds at least one node. CREATE OR REPLACE FUNCTION xmlexists( VARCHAR, XML ) RETURNS BOOLEAN AS $$ SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 THEN true ELSE false END; $$ LANGUAGE 'sql' IMMUTABLE; On naming, SQL/XML specifies xmlexists and xmlcast. Latest db2 provides xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses xml.value(). The xmlvalue does only part of what is required by xmlcast (it won't cast scalar to xml). So would these functions need to be rewritten in c in order to be accepted? Regards, Scott Bailey Further reading: http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/ http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042 http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm
On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus@comcast.net> wrote: > One of the problem with shredding XML is that it is very kludgy to get a > scalar value back from xpath. The xpath function always returns an array of > XML. So for example, to extract a numeric value you need to: > 1) use xpath to get the node > 2) get the first element of the XML array > 3) cast that to varchar > 4) cast that to numeric I just happen to be dealing with XML right now as well and my initial thought is that your suggestion doesn't buy you a whole lot: the root problem IMO is not dealing with what xpath gives you but that there is no DOMish representation of the xml document for you to query. You have to continually feed the entire document to xpath which is absolutely not scalable (if it works the way I think it does -- haven't looked at the code). xpath is great for simple things but it's too texty and you need a more robust API to handle documents for serious parsing on the backend. In the short term i'd advise doing work in another pl like perl. merlin
2010/1/5 Scott Bailey <artacus@comcast.net>: > One of the problem with shredding XML is that it is very kludgy to get a > scalar value back from xpath. The xpath function always returns an array of > XML. So for example, to extract a numeric value you need to: > 1) use xpath to get the node > 2) get the first element of the XML array > 3) cast that to varchar > 4) cast that to numeric > > So I wrote the following function: > > CREATE OR REPLACE FUNCTION xmlvalue( > VARCHAR, > XML > ) RETURNS TEXT AS > $$ > SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$' > THEN (xpath($1, $2))[1] > WHEN $1 ~* '/text()$' > THEN (xpath($1, $2))[1] > WHEN $1 LIKE '%/' > THEN (xpath($1 || 'text()', $2))[1] > ELSE (xpath($1 || '/text()', $2))[1] > END::text; > $$ LANGUAGE 'sql' IMMUTABLE; > > It's pretty simple. It just does a check to see if you are extracting an > attribute or an element and if element, it makes sure to get the text value. > > So query that used to look like: > > SELECT CAST( > CAST( > (xpath('/foo/bar/text()', myxml))[1] > AS varchar) > AS numeric) AS bar > > now becomes: > > SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar > > > Second function just checks that the xpath expression finds at least one > node. > > CREATE OR REPLACE FUNCTION xmlexists( > VARCHAR, > XML > ) RETURNS BOOLEAN AS > $$ > SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 > THEN true ELSE false END; > $$ LANGUAGE 'sql' IMMUTABLE; > > On naming, SQL/XML specifies xmlexists and xmlcast. I am for SQL/XML naming convention. Regards Pavel Stehule Latest db2 provides > xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses > xml.value(). The xmlvalue does only part of what is required by xmlcast (it > won't cast scalar to xml). > > So would these functions need to be rewritten in c in order to be accepted? > > Regards, > > Scott Bailey > > > Further reading: > > http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/ > http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML > http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042 > http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Merlin Moncure wrote: > On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus@comcast.net> wrote: >> One of the problem with shredding XML is that it is very kludgy to get a >> scalar value back from xpath. The xpath function always returns an array of >> XML. So for example, to extract a numeric value you need to: >> 1) use xpath to get the node >> 2) get the first element of the XML array >> 3) cast that to varchar >> 4) cast that to numeric > > I just happen to be dealing with XML right now as well and my initial > thought is that your suggestion doesn't buy you a whole lot: the root > problem IMO is not dealing with what xpath gives you but that there is > no DOMish representation of the xml document for you to query. You > have to continually feed the entire document to xpath which is > absolutely not scalable (if it works the way I think it does -- > haven't looked at the code). No typically you'll only be passing the xml for a single "row" so what we end up doing in Postgres typically looks something like this: SELECT xmlvalue('/row/@id', bitesizexml)::int AS id, xmlvalue('/row/@lat', bitesizexml)::numeric AS lat, xmlvalue('/row/@lon',bitesizexml)::numeric, xmlvalue('/row/comment', bitesizexml) AS cmt FROM ( SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml ) sub So only the one call has to work with the entire document. All the calls to xmlvalue are passed a much smaller node to work with. > xpath is great for simple things but it's too texty and you need a> more robust API to handle documents for serious parsingon the> backend. In the short term i'd advise doing work in another pl like> perl. This is basically the method used for Oracle too until they provided XMLTable functionality. They had a function xmlsequence that basically did the unnest(xpath()) part. Hopefully we'll get xmltable support soon. Scott
On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote: > One of the problem with shredding XML is that it is very kludgy to get a > scalar value back from xpath. The xpath function always returns an array > of XML. So for example, to extract a numeric value you need to: > 1) use xpath to get the node > 2) get the first element of the XML array > 3) cast that to varchar > 4) cast that to numeric There has been talk about adding something like xpath_string, xpath_number, xpath_boolean for fetching xpath expressions that don't return nodesets. I think that would fit your use case.
Pavel Stehule wrote: > 2010/1/5 Scott Bailey <artacus@comcast.net>: >> One of the problem with shredding XML is that it is very kludgy to get a >> scalar value back from xpath. The xpath function always returns an array of >> XML. So for example, to extract a numeric value you need to: >> 1) use xpath to get the node >> 2) get the first element of the XML array >> 3) cast that to varchar >> 4) cast that to numeric >> >> So I wrote the following function: >> >> CREATE OR REPLACE FUNCTION xmlvalue( >> VARCHAR, >> XML >> ) RETURNS TEXT AS >> $$ >> SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$' >> THEN (xpath($1, $2))[1] >> WHEN $1 ~* '/text()$' >> THEN (xpath($1, $2))[1] >> WHEN $1 LIKE '%/' >> THEN (xpath($1 || 'text()', $2))[1] >> ELSE (xpath($1 || '/text()', $2))[1] >> END::text; >> $$ LANGUAGE 'sql' IMMUTABLE; >> >> It's pretty simple. It just does a check to see if you are extracting an >> attribute or an element and if element, it makes sure to get the text value. >> >> So query that used to look like: >> >> SELECT CAST( >> CAST( >> (xpath('/foo/bar/text()', myxml))[1] >> AS varchar) >> AS numeric) AS bar >> >> now becomes: >> >> SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar >> >> >> Second function just checks that the xpath expression finds at least one >> node. >> >> CREATE OR REPLACE FUNCTION xmlexists( >> VARCHAR, >> XML >> ) RETURNS BOOLEAN AS >> $$ >> SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0 >> THEN true ELSE false END; >> $$ LANGUAGE 'sql' IMMUTABLE; >> >> On naming, SQL/XML specifies xmlexists and xmlcast. > > I am for SQL/XML naming convention. Well I'm shying away from the name xmlcast because it is supposed to cast xml to scalar, scalar to xml and xml to xml. For instance these would all work on db2. SELECT xmlcast(null AS XML), xmlcast(bar AS XML), xmlcast(xmlquery('$x/baz/bar' PASSING foo.barxml AS x) AS VARCHAR(30)) FROM foo But I just found that Oracle added xmlcast in 11g and it only does xml to scalar (and only number, varchar2 and date/time types). So maybe you're right. Scott
Peter Eisentraut wrote: > On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote: >> One of the problem with shredding XML is that it is very kludgy to get a >> scalar value back from xpath. The xpath function always returns an array >> of XML. So for example, to extract a numeric value you need to: >> 1) use xpath to get the node >> 2) get the first element of the XML array >> 3) cast that to varchar >> 4) cast that to numeric > > There has been talk about adding something like xpath_string, > xpath_number, xpath_boolean for fetching xpath expressions that don't > return nodesets. I think that would fit your use case. The first two sound very much like what I'm looking for. I'm unsure about the third. Is it's purpose to extract the scalar value of an expression and cast to bool as the other two do, or is it to identify if the xpath expression returned any nodes like xmlexists? Scott
On tis, 2010-01-05 at 11:50 -0800, Scott Bailey wrote: > > There has been talk about adding something like xpath_string, > > xpath_number, xpath_boolean for fetching xpath expressions that > don't > > return nodesets. I think that would fit your use case. > > The first two sound very much like what I'm looking for. I'm unsure > about the third. Is it's purpose to extract the scalar value of an > expression and cast to bool as the other two do, or is it to identify > if the xpath expression returned any nodes like xmlexists? XPath as a language has four data types: nodeset, string, number, boolean. So an XPath expression could return any of those types. Then, I suppose, the xpath_foo() function would evaluate the expression and return the result as type foo, possibly raising an error if the types don't match. Details to be determined.
But by using the above code: how do we deal with multiple matching values? For example: <question name=“my_question"> <tag> java </tag> <tag> c++ </tag> </question> In this case, perhaps I would want something like ---------------+----- my_question | java my_question | c++ -- View this message in context: http://postgresql.1045698.n5.nabble.com/Proposal-XML-helper-functions-tp2018975p5497064.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.