Thread: xpath processing brain dead
Andrew Gierth was just pointing out to me how badly broken our XPath processing is. For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with '<x>" and '</x>' and prepend '/x' to the supposed xpath. This is just horrible. It will break for every xpath expression that doesn't begin with a '/' and probably for many that do. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. If the programmer passes an ill formed piece of xml to the processor that is their lookout, but I think we should ensure that we give back correct results on well formed input. The only good piece of news is that the xpath procedures in contrib/xml2 don't apparently suffer these faults. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > For fear of passing an ill formed fragment of xml to the processor, we > strip the xml declaration if any and surround what's left with '<x>" and > '</x>' and prepend '/x' to the supposed xpath. This is just horrible. I seem to recall having complained about that at the time, but I didn't (and don't) know enough about xpath to do any better. > This whole thing is a mess, and I suspect the only fix for now is to > undo all the mangling of both the xml and the xpath expression. I don't think we should change the behavior if it's just to arrive at another less-than-desirable behavior. Whacking semantics around afresh with each release does not endear us to users. If we know how to fix it right, great; but if we can't then we should keep compatibility with 8.3 until we can. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> For fear of passing an ill formed fragment of xml to the processor, we >> strip the xml declaration if any and surround what's left with '<x>" and >> '</x>' and prepend '/x' to the supposed xpath. This is just horrible. >> > > I seem to recall having complained about that at the time, but I didn't > (and don't) know enough about xpath to do any better. > Well, a few of us do. I guess I took my eye off the ball a bit back when we were putting this into 8.3. > >> This whole thing is a mess, and I suspect the only fix for now is to >> undo all the mangling of both the xml and the xpath expression. >> > > I don't think we should change the behavior if it's just to arrive at > another less-than-desirable behavior. Whacking semantics around afresh > with each release does not endear us to users. If we know how to fix it > right, great; but if we can't then we should keep compatibility with 8.3 > until we can. > > > Honestly, this is a bug, pure and simple. There really can't be an argument about that. For the stable branch, we could make the following changes that should result in a Pareto improvement (nothing gets worse while some things get better): * only do the xml transformation if the xml is known not to be be well formed * if we need to mangle the xpath expressiondue to doing the xml transformation, then unless the xpath expression begins with a '/', prepend it with'/x//'. (two slashes corresponds to the descendent axis in xpath - in effect it stands for any number of descendentelements). But that's just a holding operation. For 8.4 we should stop this nonsense and simply say that it is up to the programmer to ensure that the xml passed to the processor is well formed. The thing that is so very bad about this is that if the programmer *has* made sure that the inputs are correct, s/he can still end up with broken results. If we're going to try to fix bad inputs, we must make damn sure that we don't break on correct inputs as a result. However, I can't off hand think of a lightning way to fix bad inputs that doesn't carry some danger to good inputs. Until someone comes up with something tolerably bulletproof, I suggest that we simply say that it is the programmer's responsibility. cheers andrew
Andrew Dunstan wrote: > For fear of passing an ill formed fragment of xml to the processor, we > strip the xml declaration if any and surround what's left with '<x>" and > '</x>' and prepend '/x' to the supposed xpath. This is just horrible. It > will break for every xpath expression that doesn't begin with a '/' and > probably for many that do. > > This whole thing is a mess, and I suspect the only fix for now is to > undo all the mangling of both the xml and the xpath expression. If the > programmer passes an ill formed piece of xml to the processor that is > their lookout, but I think we should ensure that we give back correct > results on well formed input. It's not about ill-formed pieces, it is about (well-formed) content fragments that are not full documents (exactly one root element). libxml2 doesn't support xpath on content fragments. The tradeoff here is either supporting no xpath at all on content fragments or supporting some xpath on both kinds of xml data. Whoever made the initial implementation of this (Nikolay, Cc) decided for the latter, but I can't say I'm happy with it either. I'd be OK with changing it, but I haven't had time to get to it, unfortunately. See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>.
Peter Eisentraut wrote: > Andrew Dunstan wrote: >> For fear of passing an ill formed fragment of xml to the processor, >> we strip the xml declaration if any and surround what's left with >> '<x>" and '</x>' and prepend '/x' to the supposed xpath. This is just >> horrible. It will break for every xpath expression that doesn't begin >> with a '/' and probably for many that do. >> >> This whole thing is a mess, and I suspect the only fix for now is to >> undo all the mangling of both the xml and the xpath expression. If >> the programmer passes an ill formed piece of xml to the processor >> that is their lookout, but I think we should ensure that we give back >> correct results on well formed input. > > It's not about ill-formed pieces, it is about (well-formed) content > fragments that are not full documents (exactly one root element). > libxml2 doesn't support xpath on content fragments. > > The tradeoff here is either supporting no xpath at all on content > fragments or supporting some xpath on both kinds of xml data. Whoever > made the initial implementation of this (Nikolay, Cc) decided for the > latter, but I can't say I'm happy with it either. I'd be OK with > changing it, but I haven't had time to get to it, unfortunately. > > See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>. I don't think it is our responsibility to remedy the deficiencies of libxml2, especially if it involves breaking the processing of valid xpath expressions on well formed XML. If someone can come up with a correct scheme for handling fragments, I'm all ears, but otherwise I think a) we should rip this out of 8.4 and b) we should try to make 8.3 slightly less broken at least, along the lines of my earlier suggestion. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I don't think it is our responsibility to remedy the deficiencies of > libxml2, especially if it involves breaking the processing of valid > xpath expressions on well formed XML. > If someone can come up with a correct scheme for handling fragments, I'm > all ears, but otherwise I think a) we should rip this out of 8.4 and b) > we should try to make 8.3 slightly less broken at least, along the lines > of my earlier suggestion. I'm okay with removing this for 8.4, but I think it's too late to change the behavior of 8.3. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I don't think it is our responsibility to remedy the deficiencies of >> libxml2, especially if it involves breaking the processing of valid >> xpath expressions on well formed XML. >> > > >> If someone can come up with a correct scheme for handling fragments, I'm >> all ears, but otherwise I think a) we should rip this out of 8.4 and b) >> we should try to make 8.3 slightly less broken at least, along the lines >> of my earlier suggestion. >> > > I'm okay with removing this for 8.4, but I think it's too late to > change the behavior of 8.3. > > > It's never too late to fix a bug. The current behaviour is just plain nonsense ... if your xpath expression is 'foo/bar' it ends up searching for '/xfoo/bar' which can't possibly be right. Surely we can at least fix that. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> I'm okay with removing this for 8.4, but I think it's too late to >> change the behavior of 8.3. > It's never too late to fix a bug. When the proposed fix involves breaking cases that used to behave usefully, you need a much stronger argument than that. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Tom Lane wrote: >> >>> I'm okay with removing this for 8.4, but I think it's too late to >>> change the behavior of 8.3. >>> > > >> It's never too late to fix a bug. >> > > When the proposed fix involves breaking cases that used to behave > usefully, you need a much stronger argument than that. > What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current "useful" behaviour (FSVO "useful"), but should be done anyway on correctness grounds. cheers andrew
> What I have proposed for 8.3 should not break a single case that currently > behaves usefully. If anyone has a counter-example please show it. > > What I have proposed for 8.4 possibly would break current "useful" behaviour > (FSVO "useful"), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? ...Robert
Robert Haas wrote: >> What I have proposed for 8.3 should not break a single case that currently >> behaves usefully. If anyone has a counter-example please show it. >> >> What I have proposed for 8.4 possibly would break current "useful" behaviour >> (FSVO "useful"), but should be done anyway on correctness grounds. >> > > I dunno, aren't XML document fragments sort of a pretty common case? > > > They are. So what? How does that contradict either of the statements made above? Programmers using libxml2 are used to handling this problem. Why must postgres alone use a totally brain dead and utterly incorrect wrapping to solve a problem that everyone else leaves up to the programmer to handle? People in this thread are not concentrating on the fact that what we do now can break correct input. That makes it an unquestionable bug, IMNSHO. There seems to be agreement about what to do for 8.4, so we seem to be arguing about what to do for 8.3. Are you *really* arguing that prepending the xpath expression with '/x' in all cases should be allowed to continue? If you are I can only assume your use of xml/xpath is probably fairly light. cheers andrew
On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> I dunno, aren't XML document fragments sort of a pretty common case? > > They are. So what? How does that contradict either of the statements made > above? > > Programmers using libxml2 are used to handling this problem. Why must > postgres alone use a totally brain dead and utterly incorrect wrapping to > solve a problem that everyone else leaves up to the programmer to handle? I can't think of any reason, especially not when you put it that way. :-) > People in this thread are not concentrating on the fact that what we do now > can break correct input. That makes it an unquestionable bug, IMNSHO. There > seems to be agreement about what to do for 8.4, so we seem to be arguing > about what to do for 8.3. Are you *really* arguing that prepending the xpath > expression with '/x' in all cases should be allowed to continue? If you are > I can only assume your use of xml/xpath is probably fairly light. Mine is very light indeed. But the change you're proposing seems like it could CONCEIVABLY break a working application that counts on PostgreSQL's particular flavor of misbehavior, and therefore it seems questionable to put that into a stable branch. The fact that the application perhaps should not have been written that way to begin with is neither here nor there. We don't want people to be afraid of upgrading to the latest point release when we fix, say, a backend crash or a data corruption bug. ...Robert
On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote: > Mine is very light indeed. But the change you're proposing seems like > it could CONCEIVABLY break a working application that counts on > PostgreSQL's particular flavor of misbehavior, That is what release notes are for. Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Robert Haas wrote: > But the change you're proposing seems like > it could CONCEIVABLY break a working application that counts on > PostgreSQL's particular flavor of misbehavior, and therefore it seems > questionable to put that into a stable branch. The fact that the > application perhaps should not have been written that way to begin > with is neither here nor there. We don't want people to be afraid of > upgrading to the latest point release when we fix, say, a backend > crash or a data corruption bug. > > > Let me reiterate the changes that I propose, and again challenge you to provide a working counter-example if you believe it will break anything not currently broken, even cases involving fragments. First, I propose that we abandon this mangling, if, and only if, the xml is in fact a well formed XML document. Since the whole point of the mangling is to handle situations where the XML is not a well formed document, that seems fairly straight-forward. If this change were to upset any user, it must be because they are relying on undisputably incorrect results. Second, I propose that, in the remaining cases, where we do mangle the XML, if the xpath expression does not begin with a '/', instead of prepending it with '/x/, which can not possibly be correct under any circumstance, we prepend it with '/x//' which has some possibility of giving correct results. IOW, these proposals will expand the number of correct results from the code, without contributing any new incorrect cases. These changes are *very* conservative, as is usual when we fix things on stable branches. cheers andrew
I wrote: > > Second, I propose that, in the remaining cases, where we do mangle the > XML, if the xpath expression does not begin with a '/', instead of > prepending it with '/x/, which can not possibly be correct under any > circumstance, we prepend it with '/x//' which has some possibility of > giving correct results. > ^^^ '/x' cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > First, I propose that we abandon this mangling, if, and only if, the xml > is in fact a well formed XML document. Since the whole point of the > mangling is to handle situations where the XML is not a well formed > document, that seems fairly straight-forward. If this change were to > upset any user, it must be because they are relying on undisputably > incorrect results. > Second, I propose that, in the remaining cases, where we do mangle the > XML, if the xpath expression does not begin with a '/', instead of > prepending it with '/x/, which can not possibly be correct under any > circumstance, we prepend it with '/x//' which has some possibility of > giving correct results. Hmm, does this proposal require adding a test of well-formed-ness to a code path that doesn't currently have one? If so, is that likely to contribute any noticeable slowdown? I can't offhand see an objection to this other than possible performance impact. regards, tom lane
On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Robert Haas wrote: >> But the change you're proposing seems like >> it could CONCEIVABLY break a working application that counts on >> PostgreSQL's particular flavor of misbehavior, and therefore it seems >> questionable to put that into a stable branch. The fact that the >> application perhaps should not have been written that way to begin >> with is neither here nor there. We don't want people to be afraid of >> upgrading to the latest point release when we fix, say, a backend >> crash or a data corruption bug. > > Let me reiterate the changes that I propose, and again challenge you to > provide a working counter-example if you believe it will break anything not > currently broken, even cases involving fragments. > > First, I propose that we abandon this mangling, if, and only if, the xml is > in fact a well formed XML document. Since the whole point of the mangling is > to handle situations where the XML is not a well formed document, that seems > fairly straight-forward. If this change were to upset any user, it must be > because they are relying on undisputably incorrect results. > > Second, I propose that, in the remaining cases, where we do mangle the XML, > if the xpath expression does not begin with a '/', instead of prepending it > with '/x/, which can not possibly be correct under any circumstance, we > prepend it with '/x//' which has some possibility of giving correct results. > > IOW, these proposals will expand the number of correct results from the > code, without contributing any new incorrect cases. These changes are *very* > conservative, as is usual when we fix things on stable branches. > > cheers > > andrew You are right. Nuff said, ...Robert
On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote: > It's not about ill-formed pieces, it is about (well-formed) content > fragments that are not full documents (exactly one root element). > libxml2 doesn't support xpath on content fragments. exslt:node-set() to the rescue? Or is that/equivalent functionality not easily accessed at the C-level with libxml2? http://www.exslt.org/exsl/functions/node-set/index.html
James Pye wrote: > On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote: >> It's not about ill-formed pieces, it is about (well-formed) content >> fragments that are not full documents (exactly one root element). >> libxml2 doesn't support xpath on content fragments. > > exslt:node-set() to the rescue? Or is that/equivalent functionality > not easily accessed at the C-level with libxml2? > > http://www.exslt.org/exsl/functions/node-set/index.html A node-set isn't a document. In any case, this functionality doesn't appear to be in libxml2, it's in libxslt according to the reference you provided. cheers andrew
On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote: >> http://www.exslt.org/exsl/functions/node-set/index.html > > A node-set isn't a document. yes.. :) I guess what I'm saying is that if: tinman=# SELECT XML'<foo/><bar/>'; xml -------------- <foo/><bar/> (1 row) is considered to be valid per *SQL-XML*, then it should probably be treated as a node-set in the context of xpath, not mangled with <x>...</x>.. Certainly, I would expect an implicit "node-set() call" long before wrapping the fragment in <x>...</x> and prefixing my xpath query. However, I find the validity of the above, XML'<foo/><bar/>', a bit disturbing to begin with. :P > In any case, this functionality doesn't appear to be in libxml2, > it's in libxslt according to the reference you provided. I think that's *just* referencing the list of xslt implementations that the extension function is known to be available in.. I doubt that means to imply that the function or equivalent functionality is not available in libxml2 itself. I'd wager that equivalent functionality could be implemented if it weren't directly/already available.. =)
James Pye wrote: > On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote: >>> http://www.exslt.org/exsl/functions/node-set/index.html >> >> A node-set isn't a document. > > yes.. :) > > I guess what I'm saying is that if: > > tinman=# SELECT XML'<foo/><bar/>'; > xml > -------------- > <foo/><bar/> > (1 row) > > is considered to be valid per *SQL-XML*, then it should probably be > treated as a node-set in the context of xpath, not mangled with > <x>...</x>.. > > Certainly, I would expect an implicit "node-set() call" long before > wrapping the fragment in <x>...</x> and prefixing my xpath query. > > However, I find the validity of the above, XML'<foo/><bar/>', a bit > disturbing to begin with. :P > >> In any case, this functionality doesn't appear to be in libxml2, >> it's in libxslt according to the reference you provided. > > I think that's *just* referencing the list of xslt implementations > that the extension function is known to be available in.. I doubt that > means to imply that the function or equivalent functionality is not > available in libxml2 itself. I'd wager that equivalent functionality > could be implemented if it weren't directly/already available.. =) > James, can you point me at any call in libxml2 which will evaluate an xpath expression in the context of a nodeset instead of a document? Quite apart from anything else, xpath requires there to be a (single) context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set that node to the document node, but what would it be for a node-set or a fragment? If we can't get over that hurdle we're screwed in pursuing your line of thought. cheers andrew
Tom Lane wrote: > Hmm, does this proposal require adding a test of well-formed-ness to > a code path that doesn't currently have one? If so, is that likely > to contribute any noticeable slowdown? > > I can't offhand see an objection to this other than possible performance > impact. > > > Yeah, testing the well-formedness might cost a bit. We could short-circuit the test by applying some comparatively fast heuristic tests. Or we could decide that we'll just fix the xpath prefix part for 8.3 and keep the wrapping. I don't want to spend a huge effort on fixing something I regard as fundamentally broken. I'll do some tests to see what the cost of extra xml parsing might be. cheers andrew We
On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: > > What I have proposed for 8.3 should not break a single case that currently > > behaves usefully. If anyone has a counter-example please show it. > > > > What I have proposed for 8.4 possibly would break current "useful" behaviour > > (FSVO "useful"), but should be done anyway on correctness grounds. > > I dunno, aren't XML document fragments sort of a pretty common case? I'd rather argue that xml datatype should not even accept anything but complete xml documents. Same as int field does not accept int[]. Or maybe we rather need separate xmldocument and xmlforest/xmlfragments types in next releases and leave the "base" xml as it is but deprecated due to inability to fix it without breaking backwards compatibility. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: > >>> What I have proposed for 8.3 should not break a single case that currently >>> behaves usefully. If anyone has a counter-example please show it. >>> >>> What I have proposed for 8.4 possibly would break current "useful" behaviour >>> (FSVO "useful"), but should be done anyway on correctness grounds. >>> >> I dunno, aren't XML document fragments sort of a pretty common case? >> > > I'd rather argue that xml datatype should not even accept anything but > complete xml documents. Same as int field does not accept int[]. > > Or maybe we rather need separate xmldocument and xmlforest/xmlfragments > types in next releases and leave the "base" xml as it is but deprecated > due to inability to fix it without breaking backwards compatibility. > > Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. You can also force validation as a document by saying SET XML OPTION DOCUMENT; cheers andrew
Andrew Dunstan wrote: > > > Tom Lane wrote: >> Hmm, does this proposal require adding a test of well-formed-ness to >> a code path that doesn't currently have one? If so, is that likely >> to contribute any noticeable slowdown? >> >> I can't offhand see an objection to this other than possible performance >> impact. >> >> >> > > Yeah, testing the well-formedness might cost a bit. We could > short-circuit the test by applying some comparatively fast heuristic > tests. > > Or we could decide that we'll just fix the xpath prefix part for 8.3 > and keep the wrapping. I don't want to spend a huge effort on fixing > something I regard as fundamentally broken. > > I'll do some tests to see what the cost of extra xml parsing might be. The extra cost appears to be fairly negligible. regression=# create table xpathtest3 as select xmlconcat(xmlelement(name unique1, unique1), '\n\t',xmlelement(name unique2, unique2), '\n\t',xmlelement(name two, two), '\n\t',xmlelement(name four, four),'\n\t',xmlelement(name ten,ten),'\n\t',xmlelement(name twenty,twenty),'\n\t',xmlelement(name hundred,hundred),'\n\t',xmlelement(name thousand,thousand),'\n\t',xmlelement(name twothusand,twothousand),'\n\t',xmlelement(name fivethous,fivethous),'\n\t',xmlelement(name tenthous,tenthous),'\n\t',xmlelement(name odd,odd),'\n\t',xmlelement(name even,even),'\n\t',xmlelement(name stringu1,stringu1),'\n\t',xmlelement(name stringu2,stringu2),'\n\t',xmlelement(name string4,string4),'\n') from tenk1; regression=# select count(*) from (select xpath('//two[text()="0"]/text()',xmlconcat) as elems from xpathtest3, generate_series(1,10) ) x ; count -------- 100000 (1 row) Time: 27.722 ms Proposed patch for 8.3 attached. (Note: it only reparses in the non-document case) cheers andrew Index: src/backend/utils/adt/xml.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v retrieving revision 1.68.2.6 diff -c -r1.68.2.6 xml.c *** src/backend/utils/adt/xml.c 10 Nov 2008 18:02:27 -0000 1.68.2.6 --- src/backend/utils/adt/xml.c 27 Feb 2009 20:59:28 -0000 *************** *** 3320,3360 **** xml_init(); ! /* ! * To handle both documents and fragments, regardless of the fact whether ! * the XML datum has a single root (XML well-formedness), we wrap the XML ! * datum in a dummy element (<x>...</x>) and extend the XPath expression ! * accordingly. To do it, throw away the XML prolog, if any. ! */ ! if (len >= 5 && ! xmlStrncmp((xmlChar *) datastr, (xmlChar *) "<?xml", 5) == 0) ! { ! i = 5; ! while (i < len && ! !(datastr[i - 1] == '?' && datastr[i] == '>')) ! i++; ! ! if (i == len) ! xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, ! "could not parse XML data"); ! ++i; ! datastr += i; ! len -= i; ! } ! ! string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar)); ! memcpy(string, "<x>", 3); ! memcpy(string + 3, datastr, len); ! memcpy(string + 3 + len, "</x>", 5); ! len += 7; ! ! xpath_expr = (xmlChar *) palloc((xpath_len + 3) * sizeof(xmlChar)); ! memcpy(xpath_expr, "/x", 2); ! memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len); ! xpath_expr[xpath_len + 2] = '\0'; ! xpath_len += 2; xmlInitParser(); --- 3320,3332 ---- xml_init(); ! string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar)); ! xpath_expr = (xmlChar *) palloc((xpath_len + 5) * sizeof(xmlChar)); ! memcpy (string, datastr, len); ! string[len] = '\0'; ! xmlInitParser(); *************** *** 3367,3375 **** xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, "could not allocate parser context"); doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); ! if (doc == NULL) ! xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT, ! "could not parse XML data"); xpathctx = xmlXPathNewContext(doc); if (xpathctx == NULL) xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, --- 3339,3410 ---- xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, "could not allocate parser context"); doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); ! ! if (doc == NULL || xmlDocGetRootElement(doc) == NULL) ! { ! ! /* ! * In case we have a fragment rather than a well-formed XML document, ! * which has a single root (XML well-formedness), we try again after ! * transforming the xml by stripping away the XML prolog, if any, and ! * wrapping the remainder in a dummy element (<x>...</x>), ! * and later extending the XPath expression accordingly. ! */ ! if (len >= 5 && ! xmlStrncmp((xmlChar *) datastr, (xmlChar *) "<?xml", 5) == 0) ! { ! i = 5; ! while (i < len && ! !(datastr[i - 1] == '?' && datastr[i] == '>')) ! i++; ! ! if (i == len) ! xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, ! "could not parse XML data"); ! ! ++i; ! ! datastr += i; ! len -= i; ! } ! ! memcpy(string, "<x>", 3); ! memcpy(string + 3, datastr, len); ! memcpy(string + 3 + len, "</x>", 5); ! len += 7; ! ! doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); ! ! if (doc == NULL) ! xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT, ! "could not parse XML data"); ! ! if (*VARDATA(xpath_expr_text) == '/') ! { ! memcpy(xpath_expr, "/x", 2); ! memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len); ! xpath_expr[xpath_len + 2] = '\0'; ! xpath_len += 2; ! } ! else ! { ! memcpy(xpath_expr, "/x//", 4); ! memcpy(xpath_expr + 4, VARDATA(xpath_expr_text), xpath_len); ! xpath_expr[xpath_len + 4] = '\0'; ! xpath_len += 4; ! } ! ! } ! else ! { ! /* ! * if we didn't need to mangle the XML, we don't need to mangle the ! * xpath either. ! */ ! memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len); ! xpath_expr[xpath_len] = '\0'; ! } ! xpathctx = xmlXPathNewContext(doc); if (xpathctx == NULL) xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
Andrew Dunstan <andrew@dunslane.net> writes: >> I'll do some tests to see what the cost of extra xml parsing might be. > The extra cost appears to be fairly negligible. Uh, you didn't actually show a comparison of before and after? What it looks like to me is that this approach is free or nearly so for well-formed documents, but doubles the parsing cost for forests. Which is likely to annoy anyone who's really depending on the capability. Also, > ! if (*VARDATA(xpath_expr_text) == '/') This is risking a core dump if the xpath expr is of zero length. You need something like if (xpath_len > 0 && *VARDATA(xpath_expr_text) == '/') It would also be a good idea if the allocation of string and xpath_expr had a comment about why it's allocating extra space (something like "see hacks below for use of this extra space" would be sufficient). regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >>> I'll do some tests to see what the cost of extra xml parsing might be. >>> > > >> The extra cost appears to be fairly negligible. >> > > Uh, you didn't actually show a comparison of before and after? > What it looks like to me is that this approach is free or nearly so for > well-formed documents, but doubles the parsing cost for forests. > Which is likely to annoy anyone who's really depending on the > capability. > The difference is lost in the noise. Without fix: Time: 24.619 ms Time: 24.245 ms Time: 25.179 ms With fix: Time: 24.084 ms Time: 21.980 ms Time: 23.765 ms The test is done on 10,000 short fragments each parsed 10 times (or 20 times with the fix). I'll test again on some longer fragments since you don't seem convinced. > Also, > > >> ! if (*VARDATA(xpath_expr_text) == '/') >> > > This is risking a core dump if the xpath expr is of zero length. You > need something like > > if (xpath_len > 0 && *VARDATA(xpath_expr_text) == '/') > OK. > It would also be a good idea if the allocation of string and xpath_expr > had a comment about why it's allocating extra space (something like "see > hacks below for use of this extra space" would be sufficient). > OK. cheers andrew
On Fri, 2009-02-27 at 16:37 -0500, Andrew Dunstan wrote: > > Hannu Krosing wrote: > > On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: > > > >>> What I have proposed for 8.3 should not break a single case that currently > >>> behaves usefully. If anyone has a counter-example please show it. > >>> > >>> What I have proposed for 8.4 possibly would break current "useful" behaviour > >>> (FSVO "useful"), but should be done anyway on correctness grounds. > >>> > >> I dunno, aren't XML document fragments sort of a pretty common case? > >> > > > > I'd rather argue that xml datatype should not even accept anything but > > complete xml documents. Same as int field does not accept int[]. > > > > Or maybe we rather need separate xmldocument and xmlforest/xmlfragments > > types in next releases and leave the "base" xml as it is but deprecated > > due to inability to fix it without breaking backwards compatibility. > > > > > > Some of the functions, including some specified in the standard, produce > fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Does tha standard require that the same field type must store both documents and fragments ? > You can also force validation as a document by saying SET XML OPTION > DOCUMENT; > > cheers > > andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
On Feb 26, 2009, at 7:03 PM, Andrew Dunstan wrote: > can you point me at any call in libxml2 which will evaluate an xpath > expression in the context of a nodeset instead of a document? No, I can't. node-sets are XPath objects not xmlNode objects, so I don't think it would be as simple as modifying: xml.c:xpath() { ... xpathctx->node = xmlDocGetRootElement(doc); with the result of xmlXPathNewNodeSet.. [snip other questions] My *guess* would be that if we were to use a node-set instead, we'd still have to prefix the XPath query. In this case, with a function call to an xpath extension function that creates the NodeSet from the content fragment(s?) of the document created by xml_parse(ie, more or less, a re-implementation of exsl:node-set() tailored for our use- case). Well, that or force the user to call it explicitly. Possible or not--wrt using a content fragment/document as the context node, I find this less desirable than the current mangling, so I'm becoming quite indifferent. :)
Hannu Krosing wrote: >>> >>> >> Some of the functions, including some specified in the standard, produce >> fragments. That's why we have the 'IS DOCUMENT' test. >> > > But then you could use xmlfragments as the functions return type, no ? > > > Not in the case of xpath, no. There is a very complete standard for the Xpath data model, and we need to adhere to it. cheers andrew
sigh.. I got curious. :P On Feb 27, 2009, at 7:19 PM, James Pye wrote: > Well, that or force the user to call it explicitly. Attached is the patch that I used to get the results below.. This is just a proof of concept, so it's quite lacking. Notably, it doesn't even try to identify well-formed documents. Purpose/idea being, give the user access to the poorly-formed document as a node-set via the "fragment" function instead of mangling the xpath and xml: postgres=# SELECT xpath('fragment()//*', 'bleh<foo/><bar/>'::xml); xpath ------- {} (1 row) postgres=# SELECT xpath('fragment()//*', 'bleh<meh><sub/></meh><foo/ ><bar/>'::xml); xpath ---------- {<sub/>} (1 row) postgres=# SELECT xpath('fragment()/*', 'bleh<meh><sub/></meh><foo/ ><bar/>'::xml); xpath ---------- {<sub/>} (1 row) postgres=# SELECT xpath('fragment()', 'bleh<meh><sub/></meh><foo/><bar/ >'::xml); xpath ------------------------ {bleh,"<meh> <sub/> </meh>",<foo/>,<bar/>} (1 row) postgres=# SELECT xpath('/*', 'bleh<meh><sub/></meh><foo/><bar/>'::xml); xpath ------- {} (1 row) postgres=# SELECT xpath('fragment()[local-name()="foo"]/@att', 'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml); xpath ----------- {sometin} (1 row) postgres=# SELECT xpath('fragment()[local-name()="meh"]/*', 'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml); xpath ---------- {<sub/>} (1 row) postgres=# SELECT xpath('fragment()[local-name()="meh" or local- name()="bar"]', 'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml); xpath ----------------- {"<meh> <sub/> </meh>",<bar/>} (1 row) postgres=# SELECT xpath('fragment()[local-name()="bar"]', 'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml); xpath ---------- {<bar/>} (1 row) postgres=# SELECT xpath('fragment()[@*]', 'bleh<meh><sub/></meh>othertext<foo att="sometin"/><bar/>'::xml); xpath ---------------------------- {"<foo att=\"sometin\"/>"} (1 row) Can't say that I've ever been thrilled with using node-sets, but *shrug*. I'm sleepy now..
Attachment
James Pye wrote: > sigh.. I got curious. :P > > On Feb 27, 2009, at 7:19 PM, James Pye wrote: >> Well, that or force the user to call it explicitly. > > > Attached is the patch that I used to get the results below.. > This is just a proof of concept, so it's quite lacking. Notably, it > doesn't even try to identify well-formed documents. This is entirely out of the question for 8.3, as it's a significant change of behaviour. I'd also want to see this usage blessed by some xpath guru ... I'm not sure it meets the standard's requirements, but I could be wrong. And it seems to me much better to provide the facility as a separate function e.g. xpath_fragment() (if at all) rather than by adding on a non-standard xpath function, but that's just a first impression. Nice piece of work, though. cheers andrew
On Feb 28, 2009, at 7:53 AM, Andrew Dunstan wrote: > This is entirely out of the question for 8.3, as it's a significant > change of behaviour. Yep. Even with implicit prefixing, the semantics are very different. What got me thinking about it was this: http://archives.postgresql.org/pgsql-bugs/2008-07/msg00058.php If it's desirable to avoid prefixing, what options remain? (At least I find it desirable to avoid prefixing =) > I'd also want to see this usage blessed by some xpath guru ... I'm > not sure it meets the standard's requirements, but I could be wrong. Oh, the context node question you raised? I think it would be easy to expect that the standard is expecting a well-formed document to query against in the first place, so I *do* think it's a very valid concern. http://www.w3.org/TR/xpath http://www.w3.org/TR/xpath#data-model http://www.w3.org/TR/xpath#infoset Curious, if we constructed an actual document fragment node from the node list and set it as the document's root, would that be enough to satisfy any requirements? It does appear to talk about nodes quite generally. In the current case, we're shaving the corners of the square peg so it will fit in the round hole. In fragment()'s case, it seems we would be trying to circumvent the round hole altogether.. I don't really like either way. :P
I wrote: > > > > I'll test again on some longer fragments since you don't seem convinced. > > I set up a test with a much larger XML fragment - over 1Mb - basically it's the English source of the SVN Turtle book. The result is that the extra parsing cost is still pretty much unmeasurable: regression=# select avg(length(foo)) from (select repeat(xpath('//title',src)::text,i) as foo from xpathtest4, generate_series(1,100) as i ) x; avg ----------------------1309869.000000000000 (1 row) Without fix: Time: 5695.930 ms Time: 4855.837 ms Time: 5453.044 ms With fix: Time: 5232.810 ms Time: 5272.375 ms Time: 5528.434 ms So I'm going to go ahead and commit this change for 8.3, with Tom's suggested ammendments. cheers andrew
On Fri, 2009-02-27 at 22:55 -0500, Andrew Dunstan wrote: > > Hannu Krosing wrote: > >>> > >>> > >> Some of the functions, including some specified in the standard, produce > >> fragments. That's why we have the 'IS DOCUMENT' test. > >> > > > > But then you could use xmlfragments as the functions return type, no ? > > > > > > > > Not in the case of xpath, no. single xml document is a sub-case of xmlforest, so xmlforest should be allowed as return type, no ? > There is a very complete standard for the Xpath data model, > and we need to adhere to it. Is declaring a single all-covering "xml" data type the best or even the only way to adhere ? > cheers > > andrew > -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: >> Some of the functions, including some specified in the standard, produce >> fragments. That's why we have the 'IS DOCUMENT' test. >> > > But then you could use xmlfragments as the functions return type, no ? > > Does tha standard require that the same field type must store both > documents and fragments ? > > Yes, the standard very explicitly provides for one XML type which need not be an XML document. We have no choice about that. cheers andrew
On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote: > > Hannu Krosing wrote: > >> Some of the functions, including some specified in the standard, produce > >> fragments. That's why we have the 'IS DOCUMENT' test. > >> > > > > But then you could use xmlfragments as the functions return type, no ? > > > > Does tha standard require that the same field type must store both > > documents and fragments ? > > > > > > Yes, the standard very explicitly provides for one XML type which need > not be an XML document. We have no choice about that. What is it then ? A sequence of XML elements ? Which standard does postgreSQL XML type need to confirm to - general XML DB, Xpath or some other XML ? XML Path Language (XPath) Version 1.0 -------------------------------------- starts with this Abstract: "XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer." So I think that using Xpath on anything else than "XML document" is invalid and results are undefined. XML 1.0 and XML 1.1 ------------------- Also, both XML 1.0 and XML 1.1 standards are about a thing called an "XML document", so I don't see anything there, which would make us accept anything else as being "XML". And SQL 2008, Part 14: XML-Related Specifications (SQL/XML) ----------------------------------------------------------- Says: "SQL defines a predefined data type named by the following <key word>: XML. ... The data types XML(DOCUMENT(UNTYPED)), XML(DOCUMENT(ANY)), XML(DOCUMENT(XMLSCHEMA)), XML(CONTENT(UNTYPED)), XML(CONTENT(ANY)), XML(CONTENT(XMLSCHEMA)), and XML(SEQUENCE) are referred to as the XML types. Values of XML types are called XML values." So while the type itself could be called XML, there are several subtypes, like Document, Content and Sequence Could the XML(SEQUENCE) better be represented as an array of xml documents aka. xml[] , and maybe CONTENT could be done as xmlelement[] where xmlelement can be any single XML element, including CDATA and plain text ? > cheers > > andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote: > >> Hannu Krosing wrote: >> >>>> Some of the functions, including some specified in the standard, produce >>>> fragments. That's why we have the 'IS DOCUMENT' test. >>>> >>>> >>> But then you could use xmlfragments as the functions return type, no ? >>> >>> Does tha standard require that the same field type must store both >>> documents and fragments ? >>> >>> >>> >> Yes, the standard very explicitly provides for one XML type which need >> not be an XML document. We have no choice about that. >> > > What is it then ? A sequence of XML elements ? > In its typically oblique way, the 2003 draft says this: NOTE 1 — An XML root information item is similar to an XML document information item, with the following modifications:an XML root information item does not have a [document element] property, a [base URI] property, a [characterencoding scheme] property, or an [all declarations processed] property; and the [children] property permitsmore than one XML element information item. An SQL/XML information item is either an XML root information item or one of the following (defined in Subclause 3.1.3,“Definitions provided in Part 14”): an XML attribute information item, an XML character information item, an XMLcomment information item, an XML document type declaration information item, an XML element information item, an XMLnamespace information item, an XML notation information item, an XML processing instruction information item, an XMLunexpanded entity reference information item, or an XML unparsed entity information item. An XML value is either the null value, or a collection of SQL/XML information items, consisting of exactly one XMLroot information item, plus any other SQL/XML information items that can be reached recursively by traversing theproperties of the SQL/XML information items. > Which standard does postgreSQL XML type need to confirm to - general XML > DB, Xpath or some other XML ? > I think the XML type needs to conform to the SQL/XML spec. However, we are trying to apply XPath, which has a different data model, to that type - hence the impedance mismatch. I think that the best we can do (for 8.4, having fixed 8.3 as best we can without adversely changing behaviour) is to throw the responsibility for ensuring that the XML passed to the function is an XML document back on the programmer. Anything else, especially any mangling of the XPath expression, presents a very real danger of breaking on correct input. cheers andrew
Andrew Dunstan wrote: > can you point me at any call in libxml2 which will evaluate an xpath > expression in the context of a nodeset instead of a document? Quite > apart from anything else, xpath requires there to be a (single) context > node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set > that node to the document node, but what would it be for a node-set or a > fragment? If we can't get over that hurdle we're screwed in pursuing > your line of thought. Which may hint at the fact that running xpath on content fragments is ill-defined to begin with?!?
On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: > I think the XML type needs to conform to the SQL/XML spec. However, we > are trying to apply XPath, which has a different data model, to that > type - hence the impedance mismatch. > > I think that the best we can do (for 8.4, having fixed 8.3 as best we > can without adversely changing behaviour) is to throw the > responsibility > for ensuring that the XML passed to the function is an XML document > back on the programmer. Anything else, especially any mangling of the > XPath > expression, presents a very real danger of breaking on correct input. Can we provide a single function to bridge the gap between fragment and document? It will be clearer to do this than to see various forms of appending/munging, even if that function is a simple wrapper around an append. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Peter Eisentraut wrote: > Andrew Dunstan wrote: >> can you point me at any call in libxml2 which will evaluate an xpath >> expression in the context of a nodeset instead of a document? Quite >> apart from anything else, xpath requires there to be a (single) >> context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, >> we set that node to the document node, but what would it be for a >> node-set or a fragment? If we can't get over that hurdle we're >> screwed in pursuing your line of thought. > > Which may hint at the fact that running xpath on content fragments is > ill-defined to begin with?!? > Right. But that's no excuse for what we have been doing, which was demonstrably providing false results on good input. cheers andrew
Simon Riggs wrote: > On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: > > >> I think the XML type needs to conform to the SQL/XML spec. However, we >> are trying to apply XPath, which has a different data model, to that >> type - hence the impedance mismatch. >> >> I think that the best we can do (for 8.4, having fixed 8.3 as best we >> can without adversely changing behaviour) is to throw the >> responsibility >> for ensuring that the XML passed to the function is an XML document >> back on the programmer. Anything else, especially any mangling of the >> XPath >> expression, presents a very real danger of breaking on correct input. >> > > Can we provide a single function to bridge the gap between fragment and > document? It will be clearer to do this than to see various forms of > appending/munging, even if that function is a simple wrapper around an > append. > > I have no objection to providing an *extra* function that explicitly wraps non-documents and prefixes the xpath expression in that case, and is documented to have limitations. But I don't think we can provide a single function that always "does the right thing", especially when that is so ill-defined in the case of fragments. cheers andrew
On Mon, 2009-03-02 at 07:54 -0500, Andrew Dunstan wrote: > > Simon Riggs wrote: > > On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: > > > > > >> I think the XML type needs to conform to the SQL/XML spec. However, we > >> are trying to apply XPath, which has a different data model, to that > >> type - hence the impedance mismatch. > >> > >> I think that the best we can do (for 8.4, having fixed 8.3 as best we > >> can without adversely changing behaviour) is to throw the > >> responsibility > >> for ensuring that the XML passed to the function is an XML document > >> back on the programmer. Anything else, especially any mangling of the > >> XPath > >> expression, presents a very real danger of breaking on correct input. > >> > > > > Can we provide a single function to bridge the gap between fragment and > > document? It will be clearer to do this than to see various forms of > > appending/munging, even if that function is a simple wrapper around an > > append. > > > > > > I have no objection to providing an *extra* function that explicitly > wraps non-documents and prefixes the xpath expression in that case, and > is documented to have limitations. But I don't think we can provide a > single function that always "does the right thing", especially when that > is so ill-defined in the case of fragments. Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. > cheers > > andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Hannu Krosing wrote: > Is it just that in you _can't_ use Xpath on fragments, and you _need_ to > pass full documents to Xpath ? > > At least this is my reading of Xpath standard. It is easy to read the XPath standard that way, because the concept of fragments is not defined outside of SQL/XML, and is therefore unknown to the XPath standard. The question at hand is rather whether we can usefully adapt it.
Hannu Krosing wrote: > Is it just that in you _can't_ use Xpath on fragments, and you _need_ to > pass full documents to Xpath ? > > At least this is my reading of Xpath standard. > > I think that's possibly overstating it., unless I have missed something (W3 standards are sometimes not much more clear than the SQL standards ;-( ) For instance, there's this, that implies at least that the tree might not be a document: A "/" at the beginning of a path expression is an abbreviation for the initial step fn:root(self::node()) treat as document-node()/ (however, if the "/" is the entire path expression, the trailing "/" is omitted from the expansion.)The effect of this initial step is to begin the path at the root node of the tree that contains the contextnode. If the context item is not a node, a type error is raised [err:XPTY0020]. At evaluation time, if the rootnode above the context node is not a document node, a dynamic error is raised [err:XPDY0050]. The problem is that we certainly do have to provide a context node (the standard is clear about that), and unless we want to convert a non-document to a node-set as James suggested and then apply the xpath expression to each node in the node-set, we have no way of sanely specifying the context node. cheers andrew
On Mon, 2009-03-02 at 15:25 +0200, Peter Eisentraut wrote: > Hannu Krosing wrote: > > Is it just that in you _can't_ use Xpath on fragments, and you _need_ to > > pass full documents to Xpath ? > > > > At least this is my reading of Xpath standard. > > It is easy to read the XPath standard that way, because the concept of > fragments is not defined outside of SQL/XML, and is therefore unknown to > the XPath standard. How is the opposite - Does SQL/XML specify Xpath usage for XML(SEQUENCE) and XML(CONTENT) ? > The question at hand is rather whether we can > usefully adapt it. This sounds like trying to adapt integer arithmetic to lists-of-integers. Even for simple things like addition, there are several ways of doing it [1,2,3] + [1,1,1] = [1,2,3,1,1,1] [1,2,3] + [1,1,1] = [2,3,4] [1,2,3] + [1,1,1] = [[1,2,3],[1,1,1]] all seem possible and "logical" -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Andrew Dunstan wrote: > > > Hannu Krosing wrote: >> Is it just that in you _can't_ use Xpath on fragments, and you _need_ to >> pass full documents to Xpath ? >> At least this is my reading of Xpath standard. >> >> > > I think that's possibly overstating it., unless I have missed > something (W3 standards are sometimes not much more clear than the SQL > standards ;-( ) > > For instance, there's this, that implies at least that the tree might > not be a document: > > A "/" at the beginning of a path expression is an abbreviation for > the initial step fn:root(self::node()) treat as document-node()/ > (however, if the "/" is the entire path expression, the trailing "/" > is omitted from the expansion.) The effect of this initial step is > to begin the path at the root node of the tree that contains the > context node. If the context item is not a node, a type error is > raised [err:XPTY0020]. At evaluation time, if the root node above > the context node is not a document node, a dynamic error is raised > [err:XPDY0050]. > > The problem is that we certainly do have to provide a context node > (the standard is clear about that), and unless we want to convert a > non-document to a node-set as James suggested and then apply the xpath > expression to each node in the node-set, we have no way of sanely > specifying the context node. > No-one has come up with an answer to this, so I propose to remove the hackery. That leaves the question of what to do when the xml is not a well formed document ... raise an error? cheers andrew
On Mar 20, 2009, at 8:56 AM, Andrew Dunstan wrote: > Andrew Dunstan wrote: >> A "/" at the beginning of a path expression is an abbreviation for >> the initial step fn:root(self::node()) treat as document-node()/ >> (however, if the "/" is the entire path expression, the trailing >> "/" >> is omitted from the expansion.) The effect of this initial step is >> to begin the path at the root node of the tree that contains the >> context node. If the context item is not a node, a type error is >> raised [err:XPTY0020]. At evaluation time, if the root node above >> the context node is not a document node, a dynamic error is raised >> [err:XPDY0050]. >> >> The problem is that we certainly do have to provide a context node >> (the standard is clear about that), and unless we want to convert a >> non-document to a node-set as James suggested and then apply the >> xpath expression to each node in the node-set, we have no way of >> sanely specifying the context node. libxml2 only supports xpath1. Why are you referencing xpath20? And, if SQL-XML requires an xpath20 conforming xpath() function, we have bigger problems than "'/x' + xpath_str". ;) Although, this is not terribly important to me as: > No-one has come up with an answer to this, I don't feel fragment()/node-set() is a good idea from a usability standpoint alone. I only mentioned it because that's how I've always worked with fragments in the xslt1 world.. Mere curiosity drove most of the remaining interest I had in it. > so I propose to remove the hackery. I think this would probably be best for the core xpath() function. However, it may be wise to relocate the munging functionality into another function so users don't have invent their own when they feel so inclined to work with fragments. > raise an error? +1, "xpath requires a well-formed document"