Thread: Fix xpath() to return namespace definitions

Fix xpath() to return namespace definitions

From
Ali Akbar
Date:
Hi all,

While developing some XML processing queries, i stumbled on an old bug mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or repeated xpath() that apparently mess up namespaces.

Source of the bug is that libxml2's xmlNodeDump won't output XML namespace definitions that declared in the node's parents. As per https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858, the behavior is intentional.

This patch uses function xmlCopyNode that copies a node, including its namespace definitions as required (without unused namespace in the node or its children). When the copy dumped, the resulting XML is complete with its namespaces. Calling xmlCopyNode will need additional memory to execute, but reimplementing its routine to handle namespace definition will introduce much complexity to the code.

Note: This is my very first postgresql patch.

--
Ali Akbar
Attachment

Re: Fix xpath() to return namespace definitions

From
Robert Haas
Date:
On Fri, May 30, 2014 at 5:04 AM, Ali Akbar <the.apaan@gmail.com> wrote:
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.
>
> Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
> definitions that declared in the node's parents. As per
> https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
> the behavior is intentional.
>
> This patch uses function xmlCopyNode that copies a node, including its
> namespace definitions as required (without unused namespace in the node or
> its children). When the copy dumped, the resulting XML is complete with its
> namespaces. Calling xmlCopyNode will need additional memory to execute, but
> reimplementing its routine to handle namespace definition will introduce
> much complexity to the code.
>
> Note: This is my very first postgresql patch.

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

Please see also:

https://wiki.postgresql.org/wiki/Submitting_a_Patch

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fix xpath() to return namespace definitions

From
Ali Akbar
Date:
Thanks, i've read it.
As for the suggestion there: "Start out by reviewing a patch or responding to email on the lists. Even if it is unrelated to what you're doing", i'll look around other posts in this list.

For back versions, i think because this patch changes xpath() behavior, we will only apply this to future versions. The old behavior is wrong (according to XPath standard) for not including namespaces, but maybe there are some application that depends on the old behavior.

Thanks!
--
Ali Akbar

[REVIEW] Re: Fix xpath() to return namespace definitions

From
Abhijit Menon-Sen
Date:
At 2014-05-30 16:04:33 +0700, the.apaan@gmail.com wrote:
>
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.

Thanks for the patch, and welcome to Postgres development.

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

-- Abhijit

Attachment

Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Ali Akbar
Date:
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen <ams@2ndquadrant.com>
Thanks for the patch, and welcome to Postgres development.
 
I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

Thanks for the review. Hope i will be able to contribute a little here and there in the future.

--
Ali Akbar

Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Tom Lane
Date:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).

> I'm marking this ready for committer.

I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"?  Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case.  Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.
        regards, tom lane



Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Ali Akbar
Date:

I don't know enough about XML/XPATH to know if this is a good idea or not,

Actually currently because of the namespace problem, xpath() returns wrong result (even worse: invalid xml!). So this patch corrects the behavior of xpath() to the correct one.
 
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"?  Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might be still exists. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly.

Just to be safe, in the patch 1 passed as the second argument.

Ideally, we can traverse the Node object and generate XML accordingly, but it is a lot of work, and a lot of duplicating libxml's code. Maybe we should push this upstream to libxml?

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case.  Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.

Ah, overlooked that.

Skimming through libxml2's source, it looks like there are some other cases beside out-of-memory. Will update the patch according to these cases.

Thanks for the review.

--
Ali Akbar

Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Ali Akbar
Date:
Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar
Attachment

Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Ali Akbar
Date:
Greetings,

Because of the memory bug in xmlCopyNode, this is a new patch with different method. In this patch, instead of using xmlCopyNode to bring the namespace back, we added the required namespaces to the  node before dumping the node to string, and cleaning it up afterwards (because the same node could be dumped again in next xpath result).

Thanks,
Ali Akbar


2014-07-11 15:36 GMT+07:00 Ali Akbar <the.apaan@gmail.com>:
Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar



--
Ali Akbar
Attachment

Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Peter Eisentraut
Date:
On 7/11/14 4:36 AM, Ali Akbar wrote:
> But i found some bug in libxml2's code, because we call xmlCopyNode with
> 1 as the second parameter, internally xmlCopyNode will call
> xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
> xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
> whether it's NULL. So if someday the recursion returns NULL (maybe
> because of memory exhaustion), it will SEGFAULT.
> 
> Knowing this but in libxml2 code, is this patch is still acceptable?

This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).



Re: [REVIEW] Re: Fix xpath() to return namespace definitions

From
Peter Eisentraut
Date:
On 7/8/14 6:03 AM, Ali Akbar wrote:
> If we put 1 as the parameter, the resulting Node will link to the
> existing children. In this case, the namespace problem for the children
> might be still exists. If any of the children uses different
> namespace(s) than the parent, the namespace definition will not be
> copied correctly.

This analysis might very well be right, but I think we should prove it
with a test case.