Thread: xpath processing brain dead

xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 00:08:17

Andrew Gierth was just pointing out to me how badly broken our XPath 
processing is.

For fear of passing an ill formed fragment of xml to the processor, we 
strip the xml declaration if any and surround what's left with '<x>" and 
'</x>' and prepend '/x' to the supposed xpath. This is just horrible. It 
will break for every xpath expression that doesn't begin with a '/' and 
probably for many that do.

This whole thing is a mess, and I suspect the only fix for now is to 
undo all the mangling of both the xml and the xpath expression. If the 
programmer passes an ill formed piece of xml to the processor that is 
their lookout, but I think we should ensure that we give back correct 
results on well formed input.

The only good piece of news is that the xpath procedures in contrib/xml2 
don't apparently suffer these faults.

cheers

andrew

Re: xpath processing brain dead

From

Tom Lane

Date:

26 February 2009, 01:20:29

Andrew Dunstan <andrew@dunslane.net> writes:
> For fear of passing an ill formed fragment of xml to the processor, we 
> strip the xml declaration if any and surround what's left with '<x>" and 
> '</x>' and prepend '/x' to the supposed xpath. This is just horrible.

I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.

> This whole thing is a mess, and I suspect the only fix for now is to 
> undo all the mangling of both the xml and the xpath expression.

I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior.  Whacking semantics around afresh
with each release does not endear us to users.  If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.
        regards, tom lane

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 09:54:38

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> For fear of passing an ill formed fragment of xml to the processor, we 
>> strip the xml declaration if any and surround what's left with '<x>" and 
>> '</x>' and prepend '/x' to the supposed xpath. This is just horrible.
>>     
>
> I seem to recall having complained about that at the time, but I didn't
> (and don't) know enough about xpath to do any better.
>   

Well, a few of us do. I guess I took my eye off the ball a bit back when 
we were putting this into 8.3.

>   
>> This whole thing is a mess, and I suspect the only fix for now is to 
>> undo all the mangling of both the xml and the xpath expression.
>>     
>
> I don't think we should change the behavior if it's just to arrive at
> another less-than-desirable behavior.  Whacking semantics around afresh
> with each release does not endear us to users.  If we know how to fix it
> right, great; but if we can't then we should keep compatibility with 8.3
> until we can.
>
>             
>   

Honestly, this is a bug, pure and simple. There really can't be an 
argument about that. For the stable branch, we could make the following 
changes that should result in a Pareto improvement (nothing gets worse 
while some things get better):
   * only do the xml transformation if the xml is known not to be be     well formed   * if we need to mangle the xpath
expressiondue to doing the xml     transformation, then unless the xpath expression begins with a     '/', prepend it
with'/x//'. (two slashes corresponds to the     descendent axis in xpath - in effect it stands for any number of
descendentelements).

But that's just a holding operation. For 8.4 we should stop this 
nonsense and simply say that it is up to the programmer to ensure that 
the xml passed to the processor is well formed.

The thing that is so very bad about this is that if the programmer *has* 
made sure that the inputs are correct, s/he can still end up with broken 
results. If we're going to try to fix bad inputs, we must make damn sure 
that we don't break on correct inputs as a result. However, I can't off 
hand think of a lightning way to fix bad inputs that doesn't carry some 
danger to good inputs. Until someone comes up with something tolerably 
bulletproof, I suggest that we simply say that it is the programmer's 
responsibility.

cheers

andrew

Re: xpath processing brain dead

From

Peter Eisentraut

Date:

26 February 2009, 12:38:08

Andrew Dunstan wrote:
> For fear of passing an ill formed fragment of xml to the processor, we 
> strip the xml declaration if any and surround what's left with '<x>" and 
> '</x>' and prepend '/x' to the supposed xpath. This is just horrible. It 
> will break for every xpath expression that doesn't begin with a '/' and 
> probably for many that do.
> 
> This whole thing is a mess, and I suspect the only fix for now is to 
> undo all the mangling of both the xml and the xpath expression. If the 
> programmer passes an ill formed piece of xml to the processor that is 
> their lookout, but I think we should ensure that we give back correct 
> results on well formed input.

It's not about ill-formed pieces, it is about (well-formed) content 
fragments that are not full documents (exactly one root element). 
libxml2 doesn't support xpath on content fragments.

The tradeoff here is either supporting no xpath at all on content 
fragments or supporting some xpath on both kinds of xml data.  Whoever 
made the initial implementation of this (Nikolay, Cc) decided for the 
latter, but I can't say I'm happy with it either.  I'd be OK with 
changing it, but I haven't had time to get to it, unfortunately.

See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>.

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 13:11:57

Peter Eisentraut wrote:
> Andrew Dunstan wrote:
>> For fear of passing an ill formed fragment of xml to the processor, 
>> we strip the xml declaration if any and surround what's left with 
>> '<x>" and '</x>' and prepend '/x' to the supposed xpath. This is just 
>> horrible. It will break for every xpath expression that doesn't begin 
>> with a '/' and probably for many that do.
>>
>> This whole thing is a mess, and I suspect the only fix for now is to 
>> undo all the mangling of both the xml and the xpath expression. If 
>> the programmer passes an ill formed piece of xml to the processor 
>> that is their lookout, but I think we should ensure that we give back 
>> correct results on well formed input.
>
> It's not about ill-formed pieces, it is about (well-formed) content 
> fragments that are not full documents (exactly one root element). 
> libxml2 doesn't support xpath on content fragments.
>
> The tradeoff here is either supporting no xpath at all on content 
> fragments or supporting some xpath on both kinds of xml data.  Whoever 
> made the initial implementation of this (Nikolay, Cc) decided for the 
> latter, but I can't say I'm happy with it either.  I'd be OK with 
> changing it, but I haven't had time to get to it, unfortunately.
>
> See also <http://wiki.postgresql.org/wiki/XPath_Todo#XPath>.


I don't think it is our responsibility to remedy the deficiencies of 
libxml2, especially if it involves breaking the processing of valid 
xpath expressions on well formed XML.

If someone can come up with a correct scheme for handling fragments, I'm 
all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
we should try to make 8.3 slightly less broken at least, along the lines 
of my earlier suggestion.

cheers

andrew

Re: xpath processing brain dead

From

Tom Lane

Date:

26 February 2009, 14:01:09

Andrew Dunstan <andrew@dunslane.net> writes:
> I don't think it is our responsibility to remedy the deficiencies of 
> libxml2, especially if it involves breaking the processing of valid 
> xpath expressions on well formed XML.

> If someone can come up with a correct scheme for handling fragments, I'm 
> all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
> we should try to make 8.3 slightly less broken at least, along the lines 
> of my earlier suggestion.

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.
        regards, tom lane

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 14:10:24

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I don't think it is our responsibility to remedy the deficiencies of 
>> libxml2, especially if it involves breaking the processing of valid 
>> xpath expressions on well formed XML.
>>     
>
>   
>> If someone can come up with a correct scheme for handling fragments, I'm 
>> all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
>> we should try to make 8.3 slightly less broken at least, along the lines 
>> of my earlier suggestion.
>>     
>
> I'm okay with removing this for 8.4, but I think it's too late to
> change the behavior of 8.3.
>
>             
>   

It's never too late to fix a bug. The current behaviour is just plain 
nonsense ... if your xpath expression is 'foo/bar' it ends up searching 
for '/xfoo/bar' which can't possibly be right. Surely we can at least 
fix that.

cheers

andrew

Re: xpath processing brain dead

From

Tom Lane

Date:

26 February 2009, 14:17:31

Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I'm okay with removing this for 8.4, but I think it's too late to
>> change the behavior of 8.3.

> It's never too late to fix a bug.

When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.
        regards, tom lane

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 14:34:47

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> I'm okay with removing this for 8.4, but I think it's too late to
>>> change the behavior of 8.3.
>>>       
>
>   
>> It's never too late to fix a bug.
>>     
>
> When the proposed fix involves breaking cases that used to behave
> usefully, you need a much stronger argument than that.
>   

What I have proposed for 8.3 should not break a single case that 
currently behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current "useful" 
behaviour (FSVO "useful"), but should be done anyway on correctness grounds.

cheers

andrew

Re: xpath processing brain dead

From

Robert Haas

Date:

26 February 2009, 14:51:25

> What I have proposed for 8.3 should not break a single case that currently
> behaves usefully. If anyone has a counter-example please show it.
>
> What I have proposed for 8.4 possibly would break current "useful" behaviour
> (FSVO "useful"), but should be done anyway on correctness grounds.

I dunno, aren't XML document fragments sort of a pretty common case?

...Robert

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 16:18:55

Robert Haas wrote:
>> What I have proposed for 8.3 should not break a single case that currently
>> behaves usefully. If anyone has a counter-example please show it.
>>
>> What I have proposed for 8.4 possibly would break current "useful" behaviour
>> (FSVO "useful"), but should be done anyway on correctness grounds.
>>     
>
> I dunno, aren't XML document fragments sort of a pretty common case?
>
>
>   

They are. So what? How does that contradict either of the statements 
made above?

Programmers using libxml2 are used to handling this problem. Why must 
postgres alone use a totally brain dead and utterly incorrect wrapping 
to solve a problem that everyone else leaves up to the programmer to handle?

People in this thread are not concentrating on the fact that what we do 
now can break correct input. That makes it an unquestionable bug, 
IMNSHO. There seems to be agreement about what to do for 8.4, so we seem 
to be arguing about what to do for 8.3. Are you *really* arguing that 
prepending the xpath expression with '/x' in all cases should be allowed 
to continue? If you are I can only assume your use of xml/xpath is 
probably fairly light.

cheers

andrew

Re: xpath processing brain dead

From

Robert Haas

Date:

26 February 2009, 16:37:33

On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I dunno, aren't XML document fragments sort of a pretty common case?
>
> They are. So what? How does that contradict either of the statements made
> above?
>
> Programmers using libxml2 are used to handling this problem. Why must
> postgres alone use a totally brain dead and utterly incorrect wrapping to
> solve a problem that everyone else leaves up to the programmer to handle?

I can't think of any reason, especially not when you put it that way.  :-)

> People in this thread are not concentrating on the fact that what we do now
> can break correct input. That makes it an unquestionable bug, IMNSHO. There
> seems to be agreement about what to do for 8.4, so we seem to be arguing
> about what to do for 8.3. Are you *really* arguing that prepending the xpath
> expression with '/x' in all cases should be allowed to continue? If you are
> I can only assume your use of xml/xpath is probably fairly light.

Mine is very light indeed.  But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch.  The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there.  We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.

...Robert

Re: xpath processing brain dead

From

"Joshua D. Drake"

Date:

26 February 2009, 16:47:34

On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote:

> Mine is very light indeed.  But the change you're proposing seems like
> it could CONCEIVABLY break a working application that counts on
> PostgreSQL's particular flavor of misbehavior,

That is what release notes are for.

Joshua D. Drake


-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 16:54:18

Robert Haas wrote:
> But the change you're proposing seems like
> it could CONCEIVABLY break a working application that counts on
> PostgreSQL's particular flavor of misbehavior, and therefore it seems
> questionable to put that into a stable branch.  The fact that the
> application perhaps should not have been written that way to begin
> with is neither here nor there.  We don't want people to be afraid of
> upgrading to the latest point release when we fix, say, a backend
> crash or a data corruption bug.
>
>
>   

Let me reiterate the changes that I propose, and again challenge you to 
provide a working counter-example if you believe it will break anything 
not currently broken, even cases involving fragments.

First, I propose that we abandon this mangling, if, and only if, the xml 
is in fact a well formed XML document. Since the whole point of the 
mangling is to handle situations where the XML is not a well formed 
document, that seems fairly straight-forward. If this change were to 
upset any user, it must be because they are relying on undisputably 
incorrect results.

Second, I propose that, in the remaining cases, where we do mangle the 
XML, if the xpath expression does not begin with a '/', instead of 
prepending it with '/x/, which can not possibly be correct under any 
circumstance, we prepend it with '/x//' which has some possibility of 
giving correct results.

IOW, these proposals will expand the number of correct results from the 
code, without contributing any new incorrect cases. These changes are 
*very* conservative, as is usual when we fix things on stable branches.

cheers

andrew

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 17:00:33

I wrote:
>
> Second, I propose that, in the remaining cases, where we do mangle the 
> XML, if the xpath expression does not begin with a '/', instead of 
> prepending it with '/x/, which can not possibly be correct under any 
> circumstance, we prepend it with '/x//' which has some possibility of 
> giving correct results.
>           ^^^          '/x'


cheers      

andrew

Re: xpath processing brain dead

From

Tom Lane

Date:

26 February 2009, 17:24:21

Andrew Dunstan <andrew@dunslane.net> writes:
> First, I propose that we abandon this mangling, if, and only if, the xml 
> is in fact a well formed XML document. Since the whole point of the 
> mangling is to handle situations where the XML is not a well formed 
> document, that seems fairly straight-forward. If this change were to 
> upset any user, it must be because they are relying on undisputably 
> incorrect results.

> Second, I propose that, in the remaining cases, where we do mangle the 
> XML, if the xpath expression does not begin with a '/', instead of 
> prepending it with '/x/, which can not possibly be correct under any 
> circumstance, we prepend it with '/x//' which has some possibility of 
> giving correct results.

Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one?  If so, is that likely
to contribute any noticeable slowdown?

I can't offhand see an objection to this other than possible performance
impact.
        regards, tom lane

Re: xpath processing brain dead

From

Robert Haas

Date:

26 February 2009, 18:38:52

On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert Haas wrote:
>> But the change you're proposing seems like
>> it could CONCEIVABLY break a working application that counts on
>> PostgreSQL's particular flavor of misbehavior, and therefore it seems
>> questionable to put that into a stable branch.  The fact that the
>> application perhaps should not have been written that way to begin
>> with is neither here nor there.  We don't want people to be afraid of
>> upgrading to the latest point release when we fix, say, a backend
>> crash or a data corruption bug.
>
> Let me reiterate the changes that I propose, and again challenge you to
> provide a working counter-example if you believe it will break anything not
> currently broken, even cases involving fragments.
>
> First, I propose that we abandon this mangling, if, and only if, the xml is
> in fact a well formed XML document. Since the whole point of the mangling is
> to handle situations where the XML is not a well formed document, that seems
> fairly straight-forward. If this change were to upset any user, it must be
> because they are relying on undisputably incorrect results.
>
> Second, I propose that, in the remaining cases, where we do mangle the XML,
> if the xpath expression does not begin with a '/', instead of prepending it
> with '/x/, which can not possibly be correct under any circumstance, we
> prepend it with '/x//' which has some possibility of giving correct results.
>
> IOW, these proposals will expand the number of correct results from the
> code, without contributing any new incorrect cases. These changes are *very*
> conservative, as is usual when we fix things on stable branches.
>
> cheers
>
> andrew

You are right.

Nuff said,

...Robert

Re: xpath processing brain dead

From

James Pye

Date:

26 February 2009, 20:26:19

On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
> It's not about ill-formed pieces, it is about (well-formed) content  
> fragments that are not full documents (exactly one root element).  
> libxml2 doesn't support xpath on content fragments.

exslt:node-set() to the rescue? Or is that/equivalent functionality  
not easily accessed at the C-level with libxml2?

http://www.exslt.org/exsl/functions/node-set/index.html

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 20:41:10

James Pye wrote:
> On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
>> It's not about ill-formed pieces, it is about (well-formed) content 
>> fragments that are not full documents (exactly one root element). 
>> libxml2 doesn't support xpath on content fragments.
>
> exslt:node-set() to the rescue? Or is that/equivalent functionality 
> not easily accessed at the C-level with libxml2?
>
> http://www.exslt.org/exsl/functions/node-set/index.html

A node-set isn't a document. In any case, this functionality doesn't 
appear to be in libxml2, it's in libxslt according to the reference you 
provided.

cheers

andrew

Re: xpath processing brain dead

From

James Pye

Date:

26 February 2009, 21:43:52

On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:
>> http://www.exslt.org/exsl/functions/node-set/index.html
>
> A node-set isn't a document.

yes.. :)

I guess what I'm saying is that if:

tinman=# SELECT XML'<foo/><bar/>';     xml
-------------- <foo/><bar/>
(1 row)

is considered to be valid per *SQL-XML*, then it should probably be  
treated as a node-set in the context of xpath, not mangled with  
<x>...</x>..

Certainly, I would expect an implicit "node-set() call" long before  
wrapping the fragment in <x>...</x> and prefixing my xpath query.

However, I find the validity of the above, XML'<foo/><bar/>', a bit  
disturbing to begin with. :P

> In any case, this functionality doesn't appear to be in libxml2,
> it's in libxslt according to the reference you provided.

I think that's *just* referencing the list of xslt implementations  
that the extension function is known to be available in.. I doubt that  
means to imply that the function or equivalent functionality is not  
available in libxml2 itself. I'd wager that equivalent functionality  
could be implemented if it weren't directly/already available.. =)

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 22:03:59

James Pye wrote:
> On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:
>>> http://www.exslt.org/exsl/functions/node-set/index.html
>>
>> A node-set isn't a document.
>
> yes.. :)
>
> I guess what I'm saying is that if:
>
> tinman=# SELECT XML'<foo/><bar/>';
>      xml
> --------------
>  <foo/><bar/>
> (1 row)
>
> is considered to be valid per *SQL-XML*, then it should probably be 
> treated as a node-set in the context of xpath, not mangled with 
> <x>...</x>..
>
> Certainly, I would expect an implicit "node-set() call" long before 
> wrapping the fragment in <x>...</x> and prefixing my xpath query.
>
> However, I find the validity of the above, XML'<foo/><bar/>', a bit 
> disturbing to begin with. :P
>
>> In any case, this functionality doesn't appear to be in libxml2,
>> it's in libxslt according to the reference you provided.
>
> I think that's *just* referencing the list of xslt implementations 
> that the extension function is known to be available in.. I doubt that 
> means to imply that the function or equivalent functionality is not 
> available in libxml2 itself. I'd wager that equivalent functionality 
> could be implemented if it weren't directly/already available.. =)
>

James,

can you point me at any call in libxml2 which will evaluate an xpath 
expression in the context of a nodeset instead of a document? Quite 
apart from anything else, xpath requires there to be a (single) context 
node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set 
that node to the document node, but what would it be for a node-set or a 
fragment? If we can't get over that hurdle we're screwed in pursuing 
your line of thought.

cheers

andrew

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

26 February 2009, 22:13:35

Tom Lane wrote:
> Hmm, does this proposal require adding a test of well-formed-ness to
> a code path that doesn't currently have one?  If so, is that likely
> to contribute any noticeable slowdown?
>
> I can't offhand see an objection to this other than possible performance
> impact.
>
>             
>   

Yeah, testing the well-formedness might cost a bit. We could 
short-circuit the test by applying some comparatively fast heuristic tests.

Or we could decide that we'll just fix the xpath prefix part for 8.3 and 
keep the wrapping. I don't want to spend a huge effort on fixing 
something I regard as fundamentally broken.

I'll do some tests to see what the cost of extra xml parsing might be.

cheers

andrew

We

Re: xpath processing brain dead

From

Hannu Krosing

Date:

27 February 2009, 16:58:43

On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:
> > What I have proposed for 8.3 should not break a single case that currently
> > behaves usefully. If anyone has a counter-example please show it.
> >
> > What I have proposed for 8.4 possibly would break current "useful" behaviour
> > (FSVO "useful"), but should be done anyway on correctness grounds.
> 
> I dunno, aren't XML document fragments sort of a pretty common case?

I'd rather argue that xml datatype should not even accept anything but
complete xml documents. Same as int field does not accept int[].

Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
types in next releases and leave the "base" xml as it is but deprecated
due to inability to fix it without breaking backwards compatibility.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

27 February 2009, 17:37:23

Hannu Krosing wrote:
> On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:
>   
>>> What I have proposed for 8.3 should not break a single case that currently
>>> behaves usefully. If anyone has a counter-example please show it.
>>>
>>> What I have proposed for 8.4 possibly would break current "useful" behaviour
>>> (FSVO "useful"), but should be done anyway on correctness grounds.
>>>       
>> I dunno, aren't XML document fragments sort of a pretty common case?
>>     
>
> I'd rather argue that xml datatype should not even accept anything but
> complete xml documents. Same as int field does not accept int[].
>
> Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
> types in next releases and leave the "base" xml as it is but deprecated
> due to inability to fix it without breaking backwards compatibility.
>
>   

Some of the functions, including some specified in the standard, produce 
fragments. That's why we have the 'IS DOCUMENT' test.

You can also force validation as a document by saying  SET XML OPTION 
DOCUMENT;

cheers

andrew

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

27 February 2009, 17:45:40

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>> Hmm, does this proposal require adding a test of well-formed-ness to
>> a code path that doesn't currently have one?  If so, is that likely
>> to contribute any noticeable slowdown?
>>
>> I can't offhand see an objection to this other than possible performance
>> impact.
>>
>>
>>
>
> Yeah, testing the well-formedness might cost a bit. We could
> short-circuit the test by applying some comparatively fast heuristic
> tests.
>
> Or we could decide that we'll just fix the xpath prefix part for 8.3
> and keep the wrapping. I don't want to spend a huge effort on fixing
> something I regard as fundamentally broken.
>
> I'll do some tests to see what the cost of extra xml parsing might be.


The extra cost appears to be fairly negligible.

regression=# create table xpathtest3 as select xmlconcat(xmlelement(name
unique1, unique1), '\n\t',xmlelement(name unique2, unique2),
'\n\t',xmlelement(name two, two), '\n\t',xmlelement(name four,
four),'\n\t',xmlelement(name ten,ten),'\n\t',xmlelement(name
twenty,twenty),'\n\t',xmlelement(name
hundred,hundred),'\n\t',xmlelement(name
thousand,thousand),'\n\t',xmlelement(name
twothusand,twothousand),'\n\t',xmlelement(name
fivethous,fivethous),'\n\t',xmlelement(name
tenthous,tenthous),'\n\t',xmlelement(name
odd,odd),'\n\t',xmlelement(name even,even),'\n\t',xmlelement(name
stringu1,stringu1),'\n\t',xmlelement(name
stringu2,stringu2),'\n\t',xmlelement(name string4,string4),'\n') from tenk1;

regression=# select count(*) from (select
xpath('//two[text()="0"]/text()',xmlconcat) as elems from xpathtest3,
generate_series(1,10) ) x ;
 count
--------
 100000
(1 row)

Time: 27.722 ms


Proposed patch for 8.3 attached. (Note: it only reparses in the
non-document case)

cheers

andrew


Index: src/backend/utils/adt/xml.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.68.2.6
diff -c -r1.68.2.6 xml.c
*** src/backend/utils/adt/xml.c    10 Nov 2008 18:02:27 -0000    1.68.2.6
--- src/backend/utils/adt/xml.c    27 Feb 2009 20:59:28 -0000
***************
*** 3320,3360 ****

      xml_init();

!     /*
!      * To handle both documents and fragments, regardless of the fact whether
!      * the XML datum has a single root (XML well-formedness), we wrap the XML
!      * datum in a dummy element (<x>...</x>) and extend the XPath expression
!      * accordingly.  To do it, throw away the XML prolog, if any.
!      */
!     if (len >= 5 &&
!         xmlStrncmp((xmlChar *) datastr, (xmlChar *) "<?xml", 5) == 0)
!     {
!         i = 5;
!         while (i < len &&
!                !(datastr[i - 1] == '?' && datastr[i] == '>'))
!             i++;
!
!         if (i == len)
!             xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
!                         "could not parse XML data");

!         ++i;

!         datastr += i;
!         len -= i;
!     }
!
!     string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar));
!     memcpy(string, "<x>", 3);
!     memcpy(string + 3, datastr, len);
!     memcpy(string + 3 + len, "</x>", 5);
!     len += 7;
!
!     xpath_expr = (xmlChar *) palloc((xpath_len + 3) * sizeof(xmlChar));
!     memcpy(xpath_expr, "/x", 2);
!     memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len);
!     xpath_expr[xpath_len + 2] = '\0';
!     xpath_len += 2;

      xmlInitParser();

--- 3320,3332 ----

      xml_init();

!     string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar));

!     xpath_expr = (xmlChar *) palloc((xpath_len + 5) * sizeof(xmlChar));

!     memcpy (string, datastr, len);
!     string[len] = '\0';
!

      xmlInitParser();

***************
*** 3367,3375 ****
          xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
                      "could not allocate parser context");
      doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
!     if (doc == NULL)
!         xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
!                     "could not parse XML data");
      xpathctx = xmlXPathNewContext(doc);
      if (xpathctx == NULL)
          xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
--- 3339,3410 ----
          xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
                      "could not allocate parser context");
      doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
!
!     if (doc == NULL || xmlDocGetRootElement(doc) == NULL)
!     {
!
!         /*
!          * In case we have a fragment rather than a well-formed XML document,
!          * which has a single root (XML well-formedness), we try again after
!          * transforming the xml by stripping away the XML prolog, if any, and
!          * wrapping the remainder in a dummy element (<x>...</x>),
!          * and later extending the XPath expression accordingly.
!          */
!         if (len >= 5 &&
!             xmlStrncmp((xmlChar *) datastr, (xmlChar *) "<?xml", 5) == 0)
!         {
!             i = 5;
!             while (i < len &&
!                    !(datastr[i - 1] == '?' && datastr[i] == '>'))
!                 i++;
!
!             if (i == len)
!                 xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
!                             "could not parse XML data");
!
!             ++i;
!
!             datastr += i;
!             len -= i;
!         }
!
!         memcpy(string, "<x>", 3);
!         memcpy(string + 3, datastr, len);
!         memcpy(string + 3 + len, "</x>", 5);
!         len += 7;
!
!         doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
!
!          if (doc == NULL)
!             xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
!                         "could not parse XML data");
!
!         if (*VARDATA(xpath_expr_text) == '/')
!         {
!             memcpy(xpath_expr, "/x", 2);
!             memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len);
!             xpath_expr[xpath_len + 2] = '\0';
!             xpath_len += 2;
!         }
!         else
!         {
!             memcpy(xpath_expr, "/x//", 4);
!             memcpy(xpath_expr + 4, VARDATA(xpath_expr_text), xpath_len);
!             xpath_expr[xpath_len + 4] = '\0';
!             xpath_len += 4;
!         }
!
!     }
!     else
!     {
!         /*
!          * if we didn't need to mangle the XML, we don't need to mangle the
!          * xpath either.
!          */
!         memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
!         xpath_expr[xpath_len] = '\0';
!     }
!
      xpathctx = xmlXPathNewContext(doc);
      if (xpathctx == NULL)
          xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,

Re: xpath processing brain dead

From

Tom Lane

Date:

27 February 2009, 18:03:10

Andrew Dunstan <andrew@dunslane.net> writes:
>> I'll do some tests to see what the cost of extra xml parsing might be.

> The extra cost appears to be fairly negligible.

Uh, you didn't actually show a comparison of before and after?
What it looks like to me is that this approach is free or nearly so for
well-formed documents, but doubles the parsing cost for forests.
Which is likely to annoy anyone who's really depending on the
capability.

Also,

> !         if (*VARDATA(xpath_expr_text) == '/')

This is risking a core dump if the xpath expr is of zero length.  You
need something like
    if (xpath_len > 0 && *VARDATA(xpath_expr_text) == '/')

It would also be a good idea if the allocation of string and xpath_expr
had a comment about why it's allocating extra space (something like "see
hacks below for use of this extra space" would be sufficient).
        regards, tom lane

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

27 February 2009, 18:29:37

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>>> I'll do some tests to see what the cost of extra xml parsing might be.
>>>       
>
>   
>> The extra cost appears to be fairly negligible.
>>     
>
> Uh, you didn't actually show a comparison of before and after?
> What it looks like to me is that this approach is free or nearly so for
> well-formed documents, but doubles the parsing cost for forests.
> Which is likely to annoy anyone who's really depending on the
> capability.
>   


The difference is lost in the noise.

Without fix:

Time: 24.619 ms
Time: 24.245 ms
Time: 25.179 ms

With fix:

Time: 24.084 ms
Time: 21.980 ms
Time: 23.765 ms

The test is done on 10,000 short fragments each parsed 10 times (or 20 
times with the fix).

I'll test again on some longer fragments since you don't seem convinced.

> Also,
>
>   
>> !         if (*VARDATA(xpath_expr_text) == '/')
>>     
>
> This is risking a core dump if the xpath expr is of zero length.  You
> need something like
>
>         if (xpath_len > 0 && *VARDATA(xpath_expr_text) == '/')
>   

OK.

> It would also be a good idea if the allocation of string and xpath_expr
> had a comment about why it's allocating extra space (something like "see
> hacks below for use of this extra space" would be sufficient).
>   


OK.

cheers

andrew

Re: xpath processing brain dead

From

Hannu Krosing

Date:

27 February 2009, 19:38:08

On Fri, 2009-02-27 at 16:37 -0500, Andrew Dunstan wrote:
> 
> Hannu Krosing wrote:
> > On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:
> >   
> >>> What I have proposed for 8.3 should not break a single case that currently
> >>> behaves usefully. If anyone has a counter-example please show it.
> >>>
> >>> What I have proposed for 8.4 possibly would break current "useful" behaviour
> >>> (FSVO "useful"), but should be done anyway on correctness grounds.
> >>>       
> >> I dunno, aren't XML document fragments sort of a pretty common case?
> >>     
> >
> > I'd rather argue that xml datatype should not even accept anything but
> > complete xml documents. Same as int field does not accept int[].
> >
> > Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
> > types in next releases and leave the "base" xml as it is but deprecated
> > due to inability to fix it without breaking backwards compatibility.
> >
> >   
> 
> Some of the functions, including some specified in the standard, produce 
> fragments. That's why we have the 'IS DOCUMENT' test.

But then you could use xmlfragments as the functions return type, no ?

Does tha standard require that the same field type must store both
documents and fragments ?

> You can also force validation as a document by saying  SET XML OPTION 
> DOCUMENT;
> 
> cheers
> 
> andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

James Pye

Date:

27 February 2009, 22:26:50

On Feb 26, 2009, at 7:03 PM, Andrew Dunstan wrote:
> can you point me at any call in libxml2 which will evaluate an xpath  
> expression in the context of a nodeset instead of a document?

No, I can't. node-sets are XPath objects not xmlNode objects, so I  
don't think it would be as simple as modifying:

xml.c:xpath() {   ...   xpathctx->node = xmlDocGetRootElement(doc);

with the result of xmlXPathNewNodeSet..

[snip other questions]

My *guess* would be that if we were to use a node-set instead, we'd  
still have to prefix the XPath query. In this case, with a function  
call to an xpath extension function that creates the NodeSet from the  
content fragment(s?) of the document created by xml_parse(ie, more or  
less, a re-implementation of exsl:node-set() tailored for our use- 
case). Well, that or force the user to call it explicitly. Possible or  
not--wrt using a content fragment/document as the context node, I find  
this less desirable than the current mangling, so I'm becoming quite  
indifferent. :)

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

27 February 2009, 23:55:38

Hannu Krosing wrote:
>>>   
>>>       
>> Some of the functions, including some specified in the standard, produce 
>> fragments. That's why we have the 'IS DOCUMENT' test.
>>     
>
> But then you could use xmlfragments as the functions return type, no ?
>
>   
>

Not in the case of xpath, no.

There is a very complete standard for the Xpath data model, and we need 
to adhere to it.

cheers

andrew

Re: xpath processing brain dead

From

James Pye

Date:

28 February 2009, 06:58:37

sigh.. I got curious. :P

On Feb 27, 2009, at 7:19 PM, James Pye wrote:
> Well, that or force the user to call it explicitly.


Attached is the patch that I used to get the results below..
This is just a proof of concept, so it's quite lacking. Notably, it
doesn't even try to identify well-formed documents.


Purpose/idea being, give the user access to the poorly-formed document
as a node-set via the "fragment" function instead of mangling the
xpath and xml:


postgres=# SELECT xpath('fragment()//*', 'bleh<foo/><bar/>'::xml);
  xpath
-------
  {}
(1 row)

postgres=# SELECT xpath('fragment()//*', 'bleh<meh><sub/></meh><foo/
 ><bar/>'::xml);
   xpath
----------
  {<sub/>}
(1 row)

postgres=# SELECT xpath('fragment()/*', 'bleh<meh><sub/></meh><foo/
 ><bar/>'::xml);
   xpath
----------
  {<sub/>}
(1 row)

postgres=# SELECT xpath('fragment()', 'bleh<meh><sub/></meh><foo/><bar/
 >'::xml);
          xpath
------------------------
  {bleh,"<meh>
    <sub/>
  </meh>",<foo/>,<bar/>}
(1 row)

postgres=# SELECT xpath('/*', 'bleh<meh><sub/></meh><foo/><bar/>'::xml);
  xpath
-------
  {}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()="foo"]/@att',
  'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml);
    xpath
-----------
  {sometin}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()="meh"]/*',
'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml);
   xpath
----------
  {<sub/>}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()="meh" or local-
name()="bar"]', 'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml);
       xpath
-----------------
  {"<meh>
    <sub/>
  </meh>",<bar/>}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()="bar"]',
'bleh<meh><sub/></meh><foo att="sometin"/><bar/>'::xml);
   xpath
----------
  {<bar/>}
(1 row)

postgres=# SELECT xpath('fragment()[@*]',
'bleh<meh><sub/></meh>othertext<foo att="sometin"/><bar/>'::xml);
            xpath
----------------------------
  {"<foo att=\"sometin\"/>"}
(1 row)


Can't say that I've ever been thrilled with using node-sets, but
*shrug*.

I'm sleepy now..

Attachment

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

28 February 2009, 10:53:57

James Pye wrote:
> sigh.. I got curious. :P
>
> On Feb 27, 2009, at 7:19 PM, James Pye wrote:
>> Well, that or force the user to call it explicitly.
>
>
> Attached is the patch that I used to get the results below..
> This is just a proof of concept, so it's quite lacking. Notably, it 
> doesn't even try to identify well-formed documents.

This is entirely out of the question for 8.3, as it's a significant 
change of behaviour.

I'd also want to see this usage blessed by some xpath guru ... I'm not 
sure it meets the standard's requirements, but I could be wrong.

And it seems to me much better to provide the facility as a separate 
function e.g. xpath_fragment() (if at all) rather than by adding on a 
non-standard xpath function, but that's just a first impression.

Nice piece of work, though.

cheers

andrew

Re: xpath processing brain dead

From

James Pye

Date:

28 February 2009, 14:19:30

On Feb 28, 2009, at 7:53 AM, Andrew Dunstan wrote:
> This is entirely out of the question for 8.3, as it's a significant  
> change of behaviour.

Yep. Even with implicit prefixing, the semantics are very different.

What got me thinking about it was this:
 http://archives.postgresql.org/pgsql-bugs/2008-07/msg00058.php

If it's desirable to avoid prefixing, what options remain?

(At least I find it desirable to avoid prefixing =)

> I'd also want to see this usage blessed by some xpath guru ... I'm  
> not sure it meets the standard's requirements, but I could be wrong.

Oh, the context node question you raised? I think it would be easy to  
expect that the standard is expecting a well-formed document to query  
against in the first place, so I *do* think it's a very valid concern.

http://www.w3.org/TR/xpath
http://www.w3.org/TR/xpath#data-model
http://www.w3.org/TR/xpath#infoset

Curious, if we constructed an actual document fragment node from the  
node list and set it as the document's root, would that be enough to  
satisfy any requirements? It does appear to talk about nodes quite  
generally.

In the current case, we're shaving the corners of the square peg so it  
will fit in the round hole. In fragment()'s case, it seems we would be  
trying to circumvent the round hole altogether..

I don't really like either way. :P

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

28 February 2009, 14:33:15

I wrote:
>
>
>
> I'll test again on some longer fragments since you don't seem convinced.
>
>

I set up a test with a much larger XML fragment - over 1Mb - basically 
it's the English source of the SVN Turtle book.

The result is that the extra parsing cost is still pretty much unmeasurable:

regression=# select avg(length(foo)) from (select 
repeat(xpath('//title',src)::text,i) as foo from xpathtest4, 
generate_series(1,100) as i ) x;        avg         
----------------------1309869.000000000000
(1 row)

Without fix:
Time: 5695.930 ms
Time: 4855.837 ms
Time: 5453.044 ms

With fix:
Time: 5232.810 ms
Time: 5272.375 ms
Time: 5528.434 ms

So I'm going to go ahead and commit this change for 8.3, with Tom's 
suggested ammendments.

cheers

andrew

Re: xpath processing brain dead

From

Hannu Krosing

Date:

28 February 2009, 16:38:23

On Fri, 2009-02-27 at 22:55 -0500, Andrew Dunstan wrote:
> 
> Hannu Krosing wrote:
> >>>   
> >>>       
> >> Some of the functions, including some specified in the standard, produce 
> >> fragments. That's why we have the 'IS DOCUMENT' test.
> >>     
> >
> > But then you could use xmlfragments as the functions return type, no ?
> >
> >   
> >
> 
> Not in the case of xpath, no.

single xml document is a sub-case of xmlforest, so xmlforest should be
allowed as return type, no ?

> There is a very complete standard for the Xpath data model, 
> and we need to adhere to it.

Is declaring a single all-covering "xml" data type the best or even the
only way to adhere ?

> cheers
> 
> andrew
> 
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

01 March 2009, 11:13:20

Hannu Krosing wrote:
>> Some of the functions, including some specified in the standard, produce 
>> fragments. That's why we have the 'IS DOCUMENT' test.
>>     
>
> But then you could use xmlfragments as the functions return type, no ?
>
> Does tha standard require that the same field type must store both
> documents and fragments ?
>
>   

Yes, the standard very explicitly provides for one XML type which need 
not be an XML document. We have no choice about that.

cheers

andrew

Re: xpath processing brain dead

From

Hannu Krosing

Date:

01 March 2009, 18:44:56

On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote:
> 
> Hannu Krosing wrote:
> >> Some of the functions, including some specified in the standard, produce 
> >> fragments. That's why we have the 'IS DOCUMENT' test.
> >>     
> >
> > But then you could use xmlfragments as the functions return type, no ?
> >
> > Does tha standard require that the same field type must store both
> > documents and fragments ?
> >
> >   
> 
> Yes, the standard very explicitly provides for one XML type which need 
> not be an XML document. We have no choice about that.

What is it then ? A sequence of XML elements ?

Which standard does postgreSQL XML type need to confirm to - general XML
DB, Xpath or some other XML ?

XML Path Language (XPath) Version 1.0
--------------------------------------

starts with this Abstract:

"XPath is a language for addressing parts of an XML document, designed
to be used by both XSLT and XPointer."

So I think that using Xpath on anything else than "XML document" is
invalid and results are undefined.

XML 1.0 and XML 1.1
-------------------

Also, both XML 1.0 and XML 1.1 standards are about a thing called an
"XML document", so I don't see anything there, which would make us
accept anything else as being "XML".

And SQL 2008, Part 14: XML-Related Specifications (SQL/XML)
-----------------------------------------------------------
Says:

"SQL defines a predefined data type named by the following <key word>:
XML.
...
The data types XML(DOCUMENT(UNTYPED)), XML(DOCUMENT(ANY)),
XML(DOCUMENT(XMLSCHEMA)), XML(CONTENT(UNTYPED)), XML(CONTENT(ANY)),
XML(CONTENT(XMLSCHEMA)), and XML(SEQUENCE) are referred to as the XML
types. Values of XML types are called XML values."

So while the type itself could be called XML, there are several
subtypes, like Document, Content and Sequence

Could the XML(SEQUENCE) better be represented as an array 
of xml documents aka. xml[] , and maybe CONTENT could be done as 
xmlelement[] where xmlelement can be any single XML element, including
CDATA and plain text ?

> cheers
> 
> andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

01 March 2009, 19:22:54

Hannu Krosing wrote:
> On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote:
>   
>> Hannu Krosing wrote:
>>     
>>>> Some of the functions, including some specified in the standard, produce 
>>>> fragments. That's why we have the 'IS DOCUMENT' test.
>>>>     
>>>>         
>>> But then you could use xmlfragments as the functions return type, no ?
>>>
>>> Does tha standard require that the same field type must store both
>>> documents and fragments ?
>>>
>>>   
>>>       
>> Yes, the standard very explicitly provides for one XML type which need 
>> not be an XML document. We have no choice about that.
>>     
>
> What is it then ? A sequence of XML elements ?
>   

In its typically oblique way, the 2003 draft says this:
   NOTE 1 — An XML root information item is similar to an XML document   information item, with the following
modifications:an   XML root information item does not have a [document element]   property, a [base URI] property, a
[characterencoding scheme] property,   or an [all declarations processed] property; and the [children]   property
permitsmore than one XML element information item.

   An SQL/XML information item is either an XML root information item   or one of the following (defined in   Subclause
3.1.3,“Definitions provided in Part 14”): an XML   attribute information item, an XML character   information item, an
XMLcomment information item, an XML document   type declaration information item,   an XML element information item, an
XMLnamespace information item,   an XML notation information item,   an XML processing instruction information item, an
XMLunexpanded   entity reference information item, or   an XML unparsed entity information item.

   An XML value is either the null value, or a collection of SQL/XML   information items, consisting of exactly   one
XMLroot information item, plus any other SQL/XML information   items that can be reached recursively   by traversing
theproperties of the SQL/XML information items.

> Which standard does postgreSQL XML type need to confirm to - general XML
> DB, Xpath or some other XML ?
>   

I think the XML type needs to conform to the SQL/XML spec. However, we 
are trying to apply XPath, which has a different data model, to that 
type - hence the impedance mismatch.

I think that the best we can do (for 8.4, having fixed 8.3 as best we 
can without adversely changing behaviour) is to throw the responsibility 
for ensuring that the XML passed to the function is an XML document back 
on the programmer. Anything else, especially any mangling of the XPath 
expression, presents a very real danger of breaking on correct input.

cheers

andrew

Re: xpath processing brain dead

From

Peter Eisentraut

Date:

02 March 2009, 04:26:44

Andrew Dunstan wrote:
> can you point me at any call in libxml2 which will evaluate an xpath 
> expression in the context of a nodeset instead of a document? Quite 
> apart from anything else, xpath requires there to be a (single) context 
> node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set 
> that node to the document node, but what would it be for a node-set or a 
> fragment? If we can't get over that hurdle we're screwed in pursuing 
> your line of thought.

Which may hint at the fact that running xpath on content fragments is 
ill-defined to begin with?!?

Re: xpath processing brain dead

From

Simon Riggs

Date:

02 March 2009, 05:02:02

On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:

> I think the XML type needs to conform to the SQL/XML spec. However, we
> are trying to apply XPath, which has a different data model, to that 
> type - hence the impedance mismatch.
> 
> I think that the best we can do (for 8.4, having fixed 8.3 as best we 
> can without adversely changing behaviour) is to throw the
> responsibility 
> for ensuring that the XML passed to the function is an XML document
> back on the programmer. Anything else, especially any mangling of the
> XPath 
> expression, presents a very real danger of breaking on correct input.

Can we provide a single function to bridge the gap between fragment and
document? It will be clearer to do this than to see various forms of
appending/munging, even if that function is a simple wrapper around an
append.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

02 March 2009, 08:48:35

Peter Eisentraut wrote:
> Andrew Dunstan wrote:
>> can you point me at any call in libxml2 which will evaluate an xpath 
>> expression in the context of a nodeset instead of a document? Quite 
>> apart from anything else, xpath requires there to be a (single) 
>> context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, 
>> we set that node to the document node, but what would it be for a 
>> node-set or a fragment? If we can't get over that hurdle we're 
>> screwed in pursuing your line of thought.
>
> Which may hint at the fact that running xpath on content fragments is 
> ill-defined to begin with?!?
>

Right. But that's no excuse for what we have been doing, which was 
demonstrably providing false results on good input.

cheers

andrew

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

02 March 2009, 08:54:23

Simon Riggs wrote:
> On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:
>
>   
>> I think the XML type needs to conform to the SQL/XML spec. However, we
>> are trying to apply XPath, which has a different data model, to that 
>> type - hence the impedance mismatch.
>>
>> I think that the best we can do (for 8.4, having fixed 8.3 as best we 
>> can without adversely changing behaviour) is to throw the
>> responsibility 
>> for ensuring that the XML passed to the function is an XML document
>> back on the programmer. Anything else, especially any mangling of the
>> XPath 
>> expression, presents a very real danger of breaking on correct input.
>>     
>
> Can we provide a single function to bridge the gap between fragment and
> document? It will be clearer to do this than to see various forms of
> appending/munging, even if that function is a simple wrapper around an
> append.
>
>   

I have no objection to providing an *extra* function that explicitly 
wraps non-documents and prefixes the xpath expression in that case, and 
is documented to have limitations. But I don't think we can provide a 
single function that always "does the right thing", especially when that 
is so ill-defined in the case of fragments.

cheers

andrew

Re: xpath processing brain dead

From

Hannu Krosing

Date:

02 March 2009, 09:15:19

On Mon, 2009-03-02 at 07:54 -0500, Andrew Dunstan wrote:
> 
> Simon Riggs wrote:
> > On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:
> >
> >   
> >> I think the XML type needs to conform to the SQL/XML spec. However, we
> >> are trying to apply XPath, which has a different data model, to that 
> >> type - hence the impedance mismatch.
> >>
> >> I think that the best we can do (for 8.4, having fixed 8.3 as best we 
> >> can without adversely changing behaviour) is to throw the
> >> responsibility 
> >> for ensuring that the XML passed to the function is an XML document
> >> back on the programmer. Anything else, especially any mangling of the
> >> XPath 
> >> expression, presents a very real danger of breaking on correct input.
> >>     
> >
> > Can we provide a single function to bridge the gap between fragment and
> > document? It will be clearer to do this than to see various forms of
> > appending/munging, even if that function is a simple wrapper around an
> > append.
> >
> >   
> 
> I have no objection to providing an *extra* function that explicitly 
> wraps non-documents and prefixes the xpath expression in that case, and 
> is documented to have limitations. But I don't think we can provide a 
> single function that always "does the right thing", especially when that 
> is so ill-defined in the case of fragments.

Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
pass full documents to Xpath ? 

At least this is my reading of Xpath standard.

> cheers
> 
> andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

Peter Eisentraut

Date:

02 March 2009, 09:25:46

Hannu Krosing wrote:
> Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
> pass full documents to Xpath ? 
> 
> At least this is my reading of Xpath standard.

It is easy to read the XPath standard that way, because the concept of 
fragments is not defined outside of SQL/XML, and is therefore unknown to 
the XPath standard.  The question at hand is rather whether we can 
usefully adapt it.

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

02 March 2009, 09:38:22

Hannu Krosing wrote:
> Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
> pass full documents to Xpath ? 
>
> At least this is my reading of Xpath standard.
>
>   

I think that's possibly overstating it., unless I have missed something 
(W3 standards are sometimes not much more clear than the SQL standards ;-( )

For instance, there's this, that implies at least that the tree might 
not be a document:
   A "/" at the beginning of a path expression is an abbreviation for   the initial step fn:root(self::node()) treat as
document-node()/  (however, if the "/" is the entire path expression, the trailing "/"   is omitted from the
expansion.)The effect of this initial step is   to begin the path at the root node of the tree that contains the
contextnode. If the context item is not a node, a type error is   raised [err:XPTY0020]. At evaluation time, if the
rootnode above   the context node is not a document node, a dynamic error is raised   [err:XPDY0050].

The problem is that we certainly do have to provide a context node (the 
standard is clear about that), and unless we want to convert a 
non-document to a node-set as James suggested and then apply the xpath 
expression to each node in the node-set, we have no way of sanely 
specifying the context node.

cheers

andrew

Re: xpath processing brain dead

From

Hannu Krosing

Date:

02 March 2009, 10:13:13

On Mon, 2009-03-02 at 15:25 +0200, Peter Eisentraut wrote:
> Hannu Krosing wrote:
> > Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
> > pass full documents to Xpath ? 
> > 
> > At least this is my reading of Xpath standard.
> 
> It is easy to read the XPath standard that way, because the concept of 
> fragments is not defined outside of SQL/XML, and is therefore unknown to 
> the XPath standard. 

How is the opposite - Does SQL/XML specify Xpath usage for XML(SEQUENCE)
and XML(CONTENT) ?

>  The question at hand is rather whether we can 
> usefully adapt it.

This sounds like trying to adapt integer arithmetic to
lists-of-integers.

Even for simple things like addition, there are several ways of doing it

[1,2,3] + [1,1,1] = [1,2,3,1,1,1]
[1,2,3] + [1,1,1] = [2,3,4]
[1,2,3] + [1,1,1] = [[1,2,3],[1,1,1]]

all seem possible and "logical"


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training

Re: xpath processing brain dead

From

Andrew Dunstan

Date:

20 March 2009, 12:57:10

Andrew Dunstan wrote:
>
>
> Hannu Krosing wrote:
>> Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
>> pass full documents to Xpath ?
>> At least this is my reading of Xpath standard.
>>
>>   
>
> I think that's possibly overstating it., unless I have missed 
> something (W3 standards are sometimes not much more clear than the SQL 
> standards ;-( )
>
> For instance, there's this, that implies at least that the tree might 
> not be a document:
>
>    A "/" at the beginning of a path expression is an abbreviation for
>    the initial step fn:root(self::node()) treat as document-node()/
>    (however, if the "/" is the entire path expression, the trailing "/"
>    is omitted from the expansion.) The effect of this initial step is
>    to begin the path at the root node of the tree that contains the
>    context node. If the context item is not a node, a type error is
>    raised [err:XPTY0020]. At evaluation time, if the root node above
>    the context node is not a document node, a dynamic error is raised
>    [err:XPDY0050].
>
> The problem is that we certainly do have to provide a context node 
> (the standard is clear about that), and unless we want to convert a 
> non-document to a node-set as James suggested and then apply the xpath 
> expression to each node in the node-set, we have no way of sanely 
> specifying the context node.
>

No-one has come up with an answer to this, so I propose to remove the 
hackery. That leaves the question of what to do when the xml is not a 
well formed document ... raise an error?

cheers

andrew

Re: xpath processing brain dead

From

James Pye

Date:

20 March 2009, 20:03:00

On Mar 20, 2009, at 8:56 AM, Andrew Dunstan wrote:
> Andrew Dunstan wrote:
>>   A "/" at the beginning of a path expression is an abbreviation for
>>   the initial step fn:root(self::node()) treat as document-node()/
>>   (however, if the "/" is the entire path expression, the trailing  
>> "/"
>>   is omitted from the expansion.) The effect of this initial step is
>>   to begin the path at the root node of the tree that contains the
>>   context node. If the context item is not a node, a type error is
>>   raised [err:XPTY0020]. At evaluation time, if the root node above
>>   the context node is not a document node, a dynamic error is raised
>>   [err:XPDY0050].
>>
>> The problem is that we certainly do have to provide a context node  
>> (the standard is clear about that), and unless we want to convert a  
>> non-document to a node-set as James suggested and then apply the  
>> xpath expression to each node in the node-set, we have no way of  
>> sanely specifying the context node.

libxml2 only supports xpath1. Why are you referencing xpath20? And, if  
SQL-XML requires an xpath20 conforming xpath() function, we have  
bigger problems than "'/x' + xpath_str". ;)

Although, this is not terribly important to me as:

> No-one has come up with an answer to this,

I don't feel fragment()/node-set() is a good idea from a usability  
standpoint alone. I only mentioned it because that's how I've always  
worked with fragments in the xslt1 world.. Mere curiosity drove most  
of the remaining interest I had in it.

> so I propose to remove the hackery.

I think this would probably be best for the core xpath() function.

However, it may be wise to relocate the munging functionality into  
another function so users don't have invent their own when they feel  
so inclined to work with fragments.

> raise an error?

+1, "xpath requires a well-formed document"