Thread: xpath improvement suggestion
Hi all, Well I had to burn some midnight oil trying to figure out why a construct like SELECT xpath('name()','<a/>'); doesn't give the expected result. Kept getting an empty array: xpath ------------- {} instead of the expected "{a}" BugID 4294 and the TODO item "better handling of XPath data types" pointed in the right direction. whithin src/backend/utils/adt/xml.c in the function xpath the result of the call to xmlXPathCompiledEval is not handled optimally. In fact, the result is assumed to be a nodeset without consulting the ->type member of the result. I've made some minor changes to xml.c to handle some non-nodeset results of xmlXPathCompiledEval. Essentially, the revised code makes an array of all the nodes in the xpathobj result in case this is a nodeset, or an array with a single element in case the reult is a number/string/boolean. The problem cases mentioned in http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work as expected. Revision of the code involves: - A switch statement to handle the result type of xmlXPathCompiledEval. - an additional function xmlpathobjtoxmltype. diff of the revisioned code with respect to original is in attached file. kind regards, Arie Bikker 114d113 < static text *xml_xmlpathobjtoxmltype(xmlXPathObjectPtr cur); 3269,3309d3267 < < /* < * Convert XML pathobject to text for non-nodeset objects < */ < static text * < xml_xmlpathobjtoxmltype(xmlXPathObjectPtr cur) < { < xmltype *result; < < if (cur->type == XPATH_BOOLEAN) < { < PG_TRY(); < { < result = cstring_to_text((char *)(xmlXPathCastToBoolean(cur)?"t":"f")); < } < PG_CATCH(); < { < PG_RE_THROW(); < } < PG_END_TRY(); < } < else < { < xmlChar *str; < < str = xmlXPathCastToString(cur); < PG_TRY(); < { < result = (xmltype *) cstring_to_text((char *) str); < } < PG_CATCH(); < { < xmlFree(str); < PG_RE_THROW(); < } < PG_END_TRY(); < xmlFree(str); < } < < return result; < } 3463,3471c3421,3429 < switch (xpathobj->type) { < case XPATH_NODESET: { < /* return empty array in cases when nothing is found */ < if (xpathobj->nodesetval == NULL) < res_nitems = 0; < else < res_nitems = xpathobj->nodesetval->nodeNr; < < if (res_nitems) --- > /* return empty array in cases when nothing is found */ > if (xpathobj->nodesetval == NULL) > res_nitems = 0; > else > res_nitems = xpathobj->nodesetval->nodeNr; > > if (res_nitems) > { > for (i = 0; i < xpathobj->nodesetval->nodeNr; i++) 3473,3482c3431,3437 < for (i = 0; i < xpathobj->nodesetval->nodeNr; i++) < { < Datum elem; < bool elemisnull = false; < < elem = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i])); < astate = accumArrayResult(astate, elem, < elemisnull, XMLOID, < CurrentMemoryContext); < } --- > Datum elem; > bool elemisnull = false; > > elem = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i])); > astate = accumArrayResult(astate, elem, > elemisnull, XMLOID, > CurrentMemoryContext); 3484d3438 < break; 3486,3500d3439 < case XPATH_BOOLEAN: < case XPATH_NUMBER: < case XPATH_STRING: { < Datum elem; < bool elemisnull = false; < < elem = PointerGetDatum(xml_xmlpathobjtoxmltype(xpathobj)); < astate = accumArrayResult(astate, elem, < elemisnull, XMLOID, < CurrentMemoryContext); < break; < } < default: { < } < } 3524c3463 < if (astate== NULL) --- > if (res_nitems == 0)
Arie Bikker wrote: > Hi all, > > Well I had to burn some midnight oil trying to figure out why a > construct like > SELECT xpath('name()','<a/>'); > doesn't give the expected result. Kept getting an empty array: > xpath > ------------- > {} > instead of the expected "{a}" > BugID 4294 and the TODO item "better handling of XPath data types" > pointed in the right direction. > whithin src/backend/utils/adt/xml.c in the function xpath the result of > the call to xmlXPathCompiledEval is not handled optimally. In fact, the > result is assumed to be a nodeset without consulting the ->type member > of the result. I've made some minor changes to xml.c to handle some > non-nodeset results of xmlXPathCompiledEval. > Essentially, the revised code makes an array of all the nodes in the > xpathobj result in case this is a nodeset, or an array with a single > element in case the reult is a number/string/boolean. The problem cases > mentioned in > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now > work as expected. > Revision of the code involves: > - A switch statement to handle the result type of xmlXPathCompiledEval. > - an additional function xmlpathobjtoxmltype. > > diff of the revisioned code with respect to original is in attached file. > > kind regards, Arie Bikker Well that's interesting. I was getting ready to dig into libxml2 to see what it would take to return scalar values from xpath expressions. Thanks. Scott
On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker <arie@abikker.nl> wrote: > Hi all, > > Well I had to burn some midnight oil trying to figure out why a construct > like > SELECT xpath('name()','<a/>'); > doesn't give the expected result. Kept getting an empty array: > xpath > ------------- > {} > instead of the expected "{a}" > BugID 4294 and the TODO item "better handling of XPath data types" pointed > in the right direction. > whithin src/backend/utils/adt/xml.c in the function xpath the result of the > call to xmlXPathCompiledEval is not handled optimally. In fact, the result > is assumed to be a nodeset without consulting the ->type member of the > result. I've made some minor changes to xml.c to handle some non-nodeset > results of xmlXPathCompiledEval. > Essentially, the revised code makes an array of all the nodes in the > xpathobj result in case this is a nodeset, or an array with a single element > in case the reult is a number/string/boolean. The problem cases mentioned in > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work > as expected. > Revision of the code involves: > - A switch statement to handle the result type of xmlXPathCompiledEval. > - an additional function xmlpathobjtoxmltype. > > diff of the revisioned code with respect to original is in attached file. > > kind regards, Arie Bikker Hi, Could you please resend this as a context diff and add it to our patch management application? http://wiki.postgresql.org/wiki/Submitting_a_Patch https://commitfest.postgresql.org/action/commitfest_view/open Thanks! ...Robert
Robert Haas wrote:
BTW. here a some nice examples:
- Get the number of attributes of the first childnode:
select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" g="j"/></a>'))[1]))[1];
- an alternative for xpath_exist('/a/d')
select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1];
- fixes bug 4206
select xpath('//text()',xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>'));
- fixes bug 4294
select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', ARRAY[ARRAY['my','http://myns.com/ns']]);
kind regards, Arie Bikker
Hope this is the right attachement type (I'm new at this)On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker <arie@abikker.nl> wrote:Hi all, Well I had to burn some midnight oil trying to figure out why a construct like SELECT xpath('name()','<a/>'); doesn't give the expected result. Kept getting an empty array: xpath ------------- {} instead of the expected "{a}" BugID 4294 and the TODO item "better handling of XPath data types" pointed in the right direction. whithin src/backend/utils/adt/xml.c in the function xpath the result of the call to xmlXPathCompiledEval is not handled optimally. In fact, the result is assumed to be a nodeset without consulting the ->type member of the result. I've made some minor changes to xml.c to handle some non-nodeset results of xmlXPathCompiledEval. Essentially, the revised code makes an array of all the nodes in the xpathobj result in case this is a nodeset, or an array with a single element in case the reult is a number/string/boolean. The problem cases mentioned in http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work as expected. Revision of the code involves: - A switch statement to handle the result type of xmlXPathCompiledEval. - an additional function xmlpathobjtoxmltype. diff of the revisioned code with respect to original is in attached file. kind regards, Arie BikkerHi, Could you please resend this as a context diff and add it to our patch management application? http://wiki.postgresql.org/wiki/Submitting_a_Patch https://commitfest.postgresql.org/action/commitfest_view/open Thanks! ...Robert
BTW. here a some nice examples:
- Get the number of attributes of the first childnode:
select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" g="j"/></a>'))[1]))[1];
- an alternative for xpath_exist('/a/d')
select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1];
- fixes bug 4206
select xpath('//text()',xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>'));
- fixes bug 4294
select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', ARRAY[ARRAY['my','http://myns.com/ns']]);
kind regards, Arie Bikker
Sorry for the previous NUUUB post, didn't now the mailing list doesn't support html ;( Robert Haas wrote: > On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker <arie@abikker.nl> wrote: > >> Hi all, >> >> Well I had to burn some midnight oil trying to figure out why a construct >> like >> SELECT xpath('name()','<a/>'); >> doesn't give the expected result. Kept getting an empty array: >> xpath >> ------------- >> {} >> instead of the expected "{a}" >> BugID 4294 and the TODO item "better handling of XPath data types" pointed >> in the right direction. >> whithin src/backend/utils/adt/xml.c in the function xpath the result of the >> call to xmlXPathCompiledEval is not handled optimally. In fact, the result >> is assumed to be a nodeset without consulting the ->type member of the >> result. I've made some minor changes to xml.c to handle some non-nodeset >> results of xmlXPathCompiledEval. >> Essentially, the revised code makes an array of all the nodes in the >> xpathobj result in case this is a nodeset, or an array with a single element >> in case the reult is a number/string/boolean. The problem cases mentioned in >> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now work >> as expected. >> Revision of the code involves: >> - A switch statement to handle the result type of xmlXPathCompiledEval. >> - an additional function xmlpathobjtoxmltype. >> >> diff of the revisioned code with respect to original is in attached file. >> >> kind regards, Arie Bikker >> > > Hi, > > Could you please resend this as a context diff and add it to our patch > management application? > > http://wiki.postgresql.org/wiki/Submitting_a_Patch > https://commitfest.postgresql.org/action/commitfest_view/open > > Thanks! > > ...Robert > Hope this is the right attachement type (I'm new at this) BTW. here a some nice examples: - Get the number of attributes of the first childnode: select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" g="j"/></a>'))[1]))[1]; - an alternative for xpath_exist('/a/d') select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; - fixes bug 4206 select xpath('//text()',xmlparse(document '<?xml version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>')); - fixes bug 4294 select xpath('name(/my:a/*[last()])', '<a xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', ARRAY[ARRAY['my','http://myns.com/ns']]); kind regards, Arie Bikker *** src/backend/utils/adt/xml.c.old Fri Sep 4 12:49:43 2009 --- src/backend/utils/adt/xml.c Wed Jan 6 21:32:22 2010 *************** *** 111,116 **** --- 111,117 ---- static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); static text *xml_xmlnodetoxmltype(xmlNodePtr cur); + static text *xml_xmlpathobjtoxmltype(xmlXPathObjectPtr cur); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *************** *** 3265,3270 **** --- 3266,3312 ---- return result; } + + /* + * Convert XML pathobject to text for non-nodeset objects + */ + static text * + xml_xmlpathobjtoxmltype(xmlXPathObjectPtr cur) + { + xmltype *result; + + if (cur->type == XPATH_BOOLEAN) + { + PG_TRY(); + { + result = cstring_to_text((char *)(xmlXPathCastToBoolean(cur)?"t":"f")); + } + PG_CATCH(); + { + PG_RE_THROW(); + } + PG_END_TRY(); + } + else + { + xmlChar *str; + + str = xmlXPathCastToString(cur); + PG_TRY(); + { + result = (xmltype *) cstring_to_text((char *) str); + } + PG_CATCH(); + { + xmlFree(str); + PG_RE_THROW(); + } + PG_END_TRY(); + xmlFree(str); + } + + return result; + } #endif *************** *** 3418,3442 **** xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, "could not create XPath object"); ! /* return empty array in cases when nothing is found */ ! if (xpathobj->nodesetval == NULL) ! res_nitems = 0; ! else ! res_nitems = xpathobj->nodesetval->nodeNr; ! ! if (res_nitems) ! { ! for (i = 0; i < xpathobj->nodesetval->nodeNr; i++) { ! Datum elem; ! bool elemisnull = false; ! ! elem = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i])); ! astate = accumArrayResult(astate, elem, ! elemisnull, XMLOID, ! CurrentMemoryContext); } } } PG_CATCH(); { --- 3460,3504 ---- xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, "could not create XPath object"); ! switch (xpathobj->type) { ! case XPATH_NODESET: { ! /* return empty array in cases when nothing is found */ ! if (xpathobj->nodesetval == NULL) ! res_nitems = 0; ! else ! res_nitems = xpathobj->nodesetval->nodeNr; ! ! if (res_nitems) { ! for (i = 0; i < xpathobj->nodesetval->nodeNr; i++) ! { ! Datum elem; ! bool elemisnull = false; ! ! elem = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i])); ! astate = accumArrayResult(astate, elem, ! elemisnull, XMLOID, ! CurrentMemoryContext); ! } } + break; } + case XPATH_BOOLEAN: + case XPATH_NUMBER: + case XPATH_STRING: { + Datum elem; + bool elemisnull = false; + + elem = PointerGetDatum(xml_xmlpathobjtoxmltype(xpathobj)); + astate = accumArrayResult(astate, elem, + elemisnull, XMLOID, + CurrentMemoryContext); + break; + } + default: { + ereport(WARNING, (errmsg("Unknown PathObjectType (%d)", xpathobj->type))); + } + } } PG_CATCH(); { *************** *** 3460,3466 **** xmlFreeDoc(doc); xmlFreeParserCtxt(ctxt); ! if (res_nitems == 0) PG_RETURN_ARRAYTYPE_P(construct_empty_array(XMLOID)); else PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext)); --- 3522,3528 ---- xmlFreeDoc(doc); xmlFreeParserCtxt(ctxt); ! if (astate == NULL) PG_RETURN_ARRAYTYPE_P(construct_empty_array(XMLOID)); else PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext));
Arie Bikker wrote: > Sorry for the previous NUUUB post, didn't now the mailing list doesn't > support html ;( > Robert Haas wrote: >> On Tue, Jan 5, 2010 at 6:09 PM, Arie Bikker <arie@abikker.nl> wrote: >> >>> Hi all, >>> >>> Well I had to burn some midnight oil trying to figure out why a >>> construct >>> like >>> SELECT xpath('name()','<a/>'); >>> doesn't give the expected result. Kept getting an empty array: >>> xpath >>> ------------- >>> {} >>> instead of the expected "{a}" >>> BugID 4294 and the TODO item "better handling of XPath data types" >>> pointed >>> in the right direction. >>> whithin src/backend/utils/adt/xml.c in the function xpath the result >>> of the >>> call to xmlXPathCompiledEval is not handled optimally. In fact, the >>> result >>> is assumed to be a nodeset without consulting the ->type member of the >>> result. I've made some minor changes to xml.c to handle some non-nodeset >>> results of xmlXPathCompiledEval. >>> Essentially, the revised code makes an array of all the nodes in the >>> xpathobj result in case this is a nodeset, or an array with a single >>> element >>> in case the reult is a number/string/boolean. The problem cases >>> mentioned in >>> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00616.php now >>> work >>> as expected. >>> Revision of the code involves: >>> - A switch statement to handle the result type of xmlXPathCompiledEval. >>> - an additional function xmlpathobjtoxmltype. >>> >>> diff of the revisioned code with respect to original is in attached >>> file. >>> >>> kind regards, Arie Bikker >>> >> >> Hi, >> >> Could you please resend this as a context diff and add it to our patch >> management application? >> >> http://wiki.postgresql.org/wiki/Submitting_a_Patch >> https://commitfest.postgresql.org/action/commitfest_view/open >> >> Thanks! >> >> ...Robert >> > Hope this is the right attachement type (I'm new at this) > BTW. here a some nice examples: > > - Get the number of attributes of the first childnode: > > select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" > g="j"/></a>'))[1]))[1]; > > - an alternative for xpath_exist('/a/d') > select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; > > - fixes bug 4206 > > select xpath('//text()',xmlparse(document '<?xml > version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>')); > > > - fixes bug 4294 > > select xpath('name(/my:a/*[last()])', '<a > xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', > ARRAY[ARRAY['my','http://myns.com/ns']]); Awesome! This really helps. Scott
On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote: > Hope this is the right attachement type (I'm new at this) > BTW. here a some nice examples: > > - Get the number of attributes of the first childnode: > > select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" > g="j"/></a>'))[1]))[1]; > > - an alternative for xpath_exist('/a/d') > select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; > > - fixes bug 4206 > > select xpath('//text()',xmlparse(document '<?xml > version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>')); > > - fixes bug 4294 > > select xpath('name(/my:a/*[last()])', '<a > xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', > ARRAY[ARRAY['my','http://myns.com/ns']]); Instead of converting everything to text, there have been previous suggestions to add functionx like xpath_string, xpath_number, xpath_boolean that return the appropriate types from xpath. This could provide for better type safety and probably also more clarity. In any case, please consider adding test cases like the above to the regression tests in whatever patch comes out at the end.
Peter Eisentraut wrote: > On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote: > >> Hope this is the right attachement type (I'm new at this) >> BTW. here a some nice examples: >> >> - Get the number of attributes of the first childnode: >> >> select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" >> g="j"/></a>'))[1]))[1]; >> >> - an alternative for xpath_exist('/a/d') >> select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; >> >> - fixes bug 4206 >> >> select xpath('//text()',xmlparse(document '<?xml >> version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>')); >> >> - fixes bug 4294 >> >> select xpath('name(/my:a/*[last()])', '<a >> xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', >> ARRAY[ARRAY['my','http://myns.com/ns']]); >> > > Instead of converting everything to text, there have been previous > suggestions to add functionx like xpath_string, xpath_number, > xpath_boolean that return the appropriate types from xpath. This could > provide for better type safety and probably also more clarity. > > In any case, please consider adding test cases like the above to the > regression tests in whatever patch comes out at the end. > > As an addition these xpath_sometype functions have been mentioned and can be handy. But, considering that the xpath function itself is a generalized function, the user of this function might not have beforehand knowledge of the type of the result; the first argument of the call could be used in a dynamic fashion. Comming back to the xpath_sometype functions - would these definitions be suitable? boolean xpath_boolean(xpath, xml [, nsarray]) text xpath_string(xpath, xml [, nsarray]) int xpath_number(xpath, xml [,nsarray]) implementation can be done via an xpath_nonnode function defined as: text xpath_nonnode(xpath, xml [,nsarray]) where each of the xpath_sometype functions simply interpret the text as its target type. Is this the way to go? kind regards, Arie Bikker
Arie Bikker wrote: > Peter Eisentraut wrote: >> On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote: >> >>> Hope this is the right attachement type (I'm new at this) >>> BTW. here a some nice examples: >>> >>> - Get the number of attributes of the first childnode: >>> >>> select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" >>> g="j"/></a>'))[1]))[1]; >>> >>> - an alternative for xpath_exist('/a/d') >>> select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; >>> >>> - fixes bug 4206 >>> >>> select xpath('//text()',xmlparse(document '<?xml >>> version="1.0"?><elem1><elem2>one</elem2><elem2>two</elem2><elem2>three</elem2><elem3att="2"/></elem1>')); >>> >>> >>> - fixes bug 4294 >>> >>> select xpath('name(/my:a/*[last()])', '<a >>> xmlns="http://myns.com/ns"><b>text1</b><c>text2</c></a>', >>> ARRAY[ARRAY['my','http://myns.com/ns']]); >> >> Instead of converting everything to text, there have been previous >> suggestions to add functionx like xpath_string, xpath_number, >> xpath_boolean that return the appropriate types from xpath. This could >> provide for better type safety and probably also more clarity. >> >> In any case, please consider adding test cases like the above to the >> regression tests in whatever patch comes out at the end. >> >> > As an addition these xpath_sometype functions have been mentioned and > can be handy. But, considering that the xpath function itself is a > generalized function, the user of this function might not have > beforehand knowledge of the type of the result; the first argument of > the call could be used in a dynamic fashion. > Comming back to the xpath_sometype functions - would these definitions > be suitable? > > boolean xpath_boolean(xpath, xml [, nsarray]) > text xpath_string(xpath, xml [, nsarray]) > int xpath_number(xpath, xml [,nsarray]) > > implementation can be done via an xpath_nonnode function defined as: > text xpath_nonnode(xpath, xml [,nsarray]) > where each of the xpath_sometype functions simply interpret the text as > its target type. > Is this the way to go? > > kind regards, Arie Bikker Postgres' type system is MUCH more robust than anything in XPath/XML. And folks who use XML on a regular basis expect most XPath expressions to return a string any way. For instance how many confused users do you think you'll get with something like: SELECT xpath_boolean('boolean(/root/@bar)', '<root bar="false"/>) -- evaluates to true or SELECT xpath_number('/root/@foo', '<root foo="42"/>') --xpath will return the string '42' not a number unless you do something like: SELECT xpath_number('number(/root/@foo)', '<root foo="42"/>') I think we'd be much better of having a function like xpath_nonnode() or xpath_value() that returns text and let the user handle the casting. Scott Bailey
Hi, here's a review of the patch: It applies with offsets, but worked fine for me. It works as advertised, and I believe it is a solid step forward from the current situation. As far as the coding goes, the PG_TRY/CATCH in xml_xmlpathobjtoxmltype seems unnecessary in the XPATH_BOOLEAN branch, as the CATCH branch only rethrows the elog. Additional whitespace in cstring_to_text((char *)(xmlXPathCastToBoolean(cur)?"t":"f")); would help also. Idle though: why go through xmlXPathCastToBoolean at all? Couldn't you just user xmlXPathCastToString always? The documentation suggests that you will get "true" or "false", which is as good as "t" or "f" and simlifies the code of xml_xmlpathobjtoxmltype quite a bit. The default: branch of the switch in the xpath() function should not be a elog(WARNING) but an elog(ERROR). Surely getting a bogus values from XPath is an error that should prevent us from returning a possibly wrong answer. The switch statement should have curly braces on separate lines and the case statements should be indented and written without curly braces. The patch really needs a couple of regression tests that demonstrate what was happening previously and what will happen now, i.e. string/number/boolean results from XPath expressions are not ignored. See http://wiki.postgresql.org/wiki/Regression_test_authoring for info on how to write a unit test for PostgreSQL. The change in xpah() behaviour should be reflected in the documentation, so the patch really should include also doc changes. Specifically, I think it should be explicitly documented that xpath() always returns an array of XML nodes, even in cases where according to the XPath spec it should return a boolean or a number. It's clearly a step forward, though, because previously we were returning an empty resultset instead of something that can at least be cast to a PG type with good hope of resulting in what the programmer expected. Scott Bailey wrote: > Arie Bikker wrote: >> Peter Eisentraut wrote: >>> On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote: >>> >>>> Hope this is the right attachement type (I'm new at this) >>>> BTW. here a some nice examples: >>>> >>>> - Get the number of attributes of the first childnode: >>>> >>>> select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f" >>>> g="j"/></a>'))[1]))[1]; >>>> >>>> - an alternative for xpath_exist('/a/d') >>>> select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1]; >>>> >>>> - fixes bug 4206 I could not reproduce that in HEAD, probably got fixed when that terrible "add <x> </x>" hack has been taken out. >>>> - fixes bug 4294 Yes, it does fix that one. Additional moral from these two is: they should be part of the regression test suite. >>> Instead of converting everything to text, there have been previous >>> suggestions to add functionx like xpath_string, xpath_number, >>> xpath_boolean that return the appropriate types from xpath. This could >>> provide for better type safety and probably also more clarity. Possibly, but this patch is useful even without those. >>> In any case, please consider adding test cases like the above to the >>> regression tests in whatever patch comes out at the end. +1 > Postgres' type system is MUCH more robust than anything in XPath/XML. > And folks who use XML on a regular basis expect most XPath expressions > to return a string any way. > For instance how many confused users do you think you'll get with > something like: > SELECT xpath_boolean('boolean(/root/@bar)', '<root bar="false"/>) > -- evaluates to true The users' confusion would come from the fact that XPath has different rules for string->boolean casts than PG. In XPath (as in Python, Perl I guess and some other languages) a string when coerced to boolean is false only if it is empty. PosgtreSQL considers 'no'::boolean, 'fal'::boolean and 'n'::boolean as false and the empty string as something uncoercable to boolean. Both PostgreSQL casting rules and XPath casting rules cannot be changed, so they will always be confusing in mixed contexts. > I think we'd be much better of having a function like xpath_nonnode() or > xpath_value() that returns text and let the user handle the casting. I'm not sure about this, TBH. At first I thought it's a good idea, but after some thinking I beliefe xpath_value() would be a simple wrapper around (xpath())[1] that could also do some additional validation, like checking if the result from XPath is not XPATH_NODESET. The merit of xpath_boolean and xpath_number would be in using the XPath casting conventions, not the PG ones, and in translating the XPath return type to PG's type. In short, you can always do (xpath())[1] and get a string, which you can then instruct PG to cast using PG casting rules. My proposal is to accept Arie's patch (barring objections already raised) and consider adding xpath_{boolean,number,string} that would return the respective PG datatype and would internally check if the result from XPath is XPATH_{BOOLEAN,NUMBER,STRING} respectively and would do the cast for you or throw an error. This way if you'd do xpath_number('/root/@num', '<root num="40"/>') you would get an error, because the result from XPath would be XPATH_STRING, forcing you do explicitly use the XPath function number() (and which is a very good thing, because of the different casting semantics I already mentioned). I'm only afraid about the subtleties around what XPath considers a number and what PG considers a number (see http://www.w3.org/TR/xpath#numbers), string encoding etc. but that's probably just a matter of thinking it out. I'll mark this patch as Waiting on Author to give Arie the chance to consider my remarks and add regression tests, after which this patch looks committable. If no one steps up I could then come up with a follow-up patch that adds xpath_number and friends. Cheers, Jan
On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański <wulczer@wulczer.org> wrote: > [ detailed review ] Arie, Are you planning to submit an updated patch? If so, please do so soon. Thanks, ...Robert
Robert Haas wrote: > On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański <wulczer@wulczer.org> wrote: >> [ detailed review ] > > Arie, > > Are you planning to submit an updated patch? If so, please do so soon. > > Thanks, > > ...Robert What is the time limit on this? I've been testing Arie's patch and I want to see it get in. I can make the changes Jan requested if Arie doesn't. How long should I give him? Scott
On Thu, Jan 28, 2010 at 11:03 PM, Scott Bailey <artacus@comcast.net> wrote: > Robert Haas wrote: >> On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański <wulczer@wulczer.org> >> wrote: >>> [ detailed review ] >> >> Arie, >> >> Are you planning to submit an updated patch? If so, please do so soon. >> > What is the time limit on this? I've been testing Arie's patch and I want to > see it get in. I can make the changes Jan requested if Arie doesn't. How > long should I give him? If you can update the patch, I'd go ahead... we don't have a super-formal deadline, but we can certainly wait a few more days if someone's working on it. ...Robert
On 2010-Jan-28, Scott Bailey wrote: > Robert Haas wrote: > > On Sun, Jan 17, 2010 at 11:33 AM, Jan Urbański <wulczer@wulczer.org> wrote: > > > [ detailed review ] > > > > Arie, > > > > Are you planning to submit an updated patch? If so, please do so soon. > What is the time limit on this? I've been testing Arie's patch and I want to > see it get in. I can make the changes Jan requested if Arie doesn't. How > long should I give him? I'm pretty sure that "nine years" is not the right answer to that question. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/18/19 23:33, Alvaro Herrera wrote: > On 2010-Jan-28, Scott Bailey wrote: >> What is the time limit on this? I've been testing Arie's patch and I want to >> see it get in. I can make the changes Jan requested if Arie doesn't. How >> long should I give him? > > I'm pretty sure that "nine years" is not the right answer to that A good first question might be, of any issues Arie identified, which, if any, are still issues in PG 12 beta1? Regards, -Chap