Re: [REVIEW] Re: Fix xpath() to return namespace definitions - Mailing list pgsql-hackers

From Ali Akbar
Subject Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date
Msg-id CACQjQLri_Dcs7P8+wm_o1uxsxKvGQxJQMZ8OeVhhkoxthQFjhA@mail.gmail.com
Whole thread Raw
In response to Re: [REVIEW] Re: Fix xpath() to return namespace definitions  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: [REVIEW] Re: Fix xpath() to return namespace definitions  (Ali Akbar <the.apaan@gmail.com>)
Re: [REVIEW] Re: Fix xpath() to return namespace definitions  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
2014-11-04 22:16 GMT+07:00 Peter Eisentraut <peter_e@gmx.net>:

On 10/6/14 10:24 PM, Ali Akbar wrote:
> While reviewing the patch myself, i spotted some formatting problems in
> the code. Fixed in this v5 patch.
>
> Also, this patch uses context patch format (in first versions, because
> of the small modification, context patch format obfucates the changes.
> After reimplementation this isn't the case anymore)

I think the problem this patch is addressing is real, and your approach
is sound, but I'd ask you to go back to the xmlCopyNode() version, and
try to add a test case for why the second argument = 1 is necessary.  I
don't see any other problems.
 
OK. Because upstream code is fixed in current version, i'll revert to the previous version. Test case added to regression test. With =1 argument, the result is correct:
<local:piece xmlns:local=\"http://127.0.0.1\" xmlns=\"http://127.0.0.2\" id=\"1\">
   <internal>number one</internal>
   <internal2/>
 </local:piece>

without the argument, the result is not correct, all children will be lost. Because of that, the other regression test will fail too because the children is not copied:
*** 584,593 ****
 
  -- Text XPath expressions evaluation
  SELECT xpath('/value', data) FROM xmltest;
!         xpath        
! ----------------------
!  {<value>one</value>}
!  {<value>two</value>}
  (2 rows)
 
  SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
--- 584,593 ----
 
  -- Text XPath expressions evaluation
  SELECT xpath('/value', data) FROM xmltest;
!    xpath   
! ------------
!  {<value/>}
!  {<value/>}
  (2 rows)
 
  SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
***************
... <cut>

updated patch attached.

I noticed somewhat big performance regression if the xml is large (i use PRODML Product Volume sample from energistics.org):
* Without patch (tested 3 times):
select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;
                                            unnest                                            
-----------------------------------------------------------------------------------------------
 <flow>                                                                                       +
                                 <kind>gas lift</kind>                                        +
...
Time: 84,012 ms
Time: 85,683 ms
Time: 88,766 ms


* With latest v6 patch (notice the correct result with namespace definition):

select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;
                                            unnest                                            
-----------------------------------------------------------------------------------------------
 <flow xmlns="http://www.prodml.org/schemas/1series">                                         +
...
Time: 108,912 ms
Time: 108,267 ms
Time: 114,848 ms


It's 23% performance regression.

* Just curious, i'm also testing v5 patch performance (notice the namespace in the result):
select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;
                                            unnest                                            
-----------------------------------------------------------------------------------------------
 <flow xmlns="http://www.prodml.org/schemas/1series">                                         +
                                 <kind>gas lift</kind>                                        +
Time: 92,552 ms
Time: 97,440 ms
Time: 99,309 ms

The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is much more cleaner than v5patch, should we consider the performance benefit?

Anyway, thanks for the review! :)

Regards,
--
Ali Akbar

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: get_cast_func syscache utility function
Next
From: Tom Lane
Date:
Subject: Time to remove dummy autocommit GUC?