Thread: remove contrib/xml2

remove contrib/xml2

From
Robert Haas
Date:
There has been some more discussion lately of problems caused by contrib/xml2.

http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php

I think we need to either (1) fix the bugs and update the
documentation to remove the statement that this will be removed or (2)
actually remove it.  Nobody seems interested in #1, so PFA a patch to
do #2.  It also rips out all the libxslt stuff, which seems to exist
only for the purpose of supporting contrib/xml2.  Per discussion on IM
with Magnus, it appears that libxslt is actually not needed to build
the backend on Windows, despite a comment in the Windows build stuff
to the contrary.

Thoughts?

...Robert

Attachment

Re: remove contrib/xml2

From
Andrew Dunstan
Date:


Robert Haas wrote:
> There has been some more discussion lately of problems caused by contrib/xml2.
>
> http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
> http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php
>
> I think we need to either (1) fix the bugs and update the
> documentation to remove the statement that this will be removed or (2)
> actually remove it.  Nobody seems interested in #1, so PFA a patch to
> do #2.  It also rips out all the libxslt stuff, which seems to exist
> only for the purpose of supporting contrib/xml2.  
>   

The problem is that there are people who use the XSLT and xpath_table 
stuff on text data and so don't run into these bugs.

I agree it's a mess but I don't think just abandoning the functionality 
is a good idea.

cheers

andrew






Re: remove contrib/xml2

From
Mike Rylander
Date:
On Thu, Jan 28, 2010 at 4:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> Robert Haas wrote:
>>
>> There has been some more discussion lately of problems caused by
>> contrib/xml2.
>>
>> http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
>> http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php
>>
>> I think we need to either (1) fix the bugs and update the
>> documentation to remove the statement that this will be removed or (2)
>> actually remove it.  Nobody seems interested in #1, so PFA a patch to
>> do #2.  It also rips out all the libxslt stuff, which seems to exist
>> only for the purpose of supporting contrib/xml2.
>
> The problem is that there are people who use the XSLT and xpath_table stuff
> on text data and so don't run into these bugs.
>
> I agree it's a mess but I don't think just abandoning the functionality is a
> good idea.

I'm one of those people.  :)

Expecting to see contrib/xml2 go away at some point, possibly without
replacements for xslt_process and xpath_table, I've been working on
some plpgsql and plperlu work-alikes targeted at TEXT columns, as the
xml2 versions do. I hope these (attached) will be of some help to
others.  Note, these are not the exact functions I use, they are
lightly edited to remove the use of wrappers I've created to paper
over the transition from xpath_nodeset() to core XPATH().

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  miker@esilibrary.com
 | web:  http://www.esilibrary.com

Attachment

Re: remove contrib/xml2

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Robert Haas wrote:
>> I think we need to either (1) fix the bugs and update the
>> documentation to remove the statement that this will be removed or (2)
>> actually remove it.

> I agree it's a mess but I don't think just abandoning the functionality 
> is a good idea.

Yeah, we can't really remove it until we have a plausible substitute for
the xpath_table functionality.  This is in the TODO list ...
        regards, tom lane


Re: remove contrib/xml2

From
Josh Berkus
Date:
> Yeah, we can't really remove it until we have a plausible substitute for
> the xpath_table functionality.  This is in the TODO list ...

What about moving it to pgfoundry?

I'm really not keen on shipping known-broken stuff in /contrib.

--Josh Berkus


Re: remove contrib/xml2

From
Robert Haas
Date:
On Thu, Jan 28, 2010 at 5:54 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Yeah, we can't really remove it until we have a plausible substitute for
>> the xpath_table functionality.  This is in the TODO list ...
>
> What about moving it to pgfoundry?
>
> I'm really not keen on shipping known-broken stuff in /contrib.

Yeah, exactly.  Another option - if we know that it works OK with
inputs of certain types - might be to throw an error if we get an
input of a type we know it doesn't work with.  But I guess I haven't
followed this closely enough to know exactly what kinds of arguments
do/don't work.

...Robert


Re: remove contrib/xml2

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Thu, Jan 28, 2010 at 5:54 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >> Yeah, we can't really remove it until we have a plausible substitute for
> >> the xpath_table functionality.  This is in the TODO list ...
> >
> > What about moving it to pgfoundry?
> >
> > I'm really not keen on shipping known-broken stuff in /contrib.
> 
> Yeah, exactly.  Another option - if we know that it works OK with
> inputs of certain types - might be to throw an error if we get an
> input of a type we know it doesn't work with.  But I guess I haven't
> followed this closely enough to know exactly what kinds of arguments
> do/don't work.

If it were easy to throw an error, it'd be better to avoid the crash.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: remove contrib/xml2

From
Robert Haas
Date:
On Thu, Jan 28, 2010 at 5:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Robert Haas wrote:
>>> I think we need to either (1) fix the bugs and update the
>>> documentation to remove the statement that this will be removed or (2)
>>> actually remove it.
>
>> I agree it's a mess but I don't think just abandoning the functionality
>> is a good idea.
>
> Yeah, we can't really remove it until we have a plausible substitute for
> the xpath_table functionality.  This is in the TODO list ...

My feeling is that if it's as flakey and unreliable as it currently
is, we shouldn't ship it.  Removing it from CVS doesn't mean "you
can't use this any more"; this is open source.  It just means people
will have to go and get an old copy out of CVS and presumably in so
doing they will be aware that we've removed it for a reason.  We have
a well-deserved reputation for quality and I would like to see us
preserve that.

Is anyone going to try to fix this for 9.0?

...Robert


Re: remove contrib/xml2

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> My feeling is that if it's as flakey and unreliable as it currently
> is, we shouldn't ship it.  Removing it from CVS doesn't mean "you
> can't use this any more"; this is open source.  It just means people
> will have to go and get an old copy out of CVS and presumably in so
> doing they will be aware that we've removed it for a reason.  We have
> a well-deserved reputation for quality and I would like to see us
> preserve that.

[ shrug... ]  It is not any more flaky than it's been since it was put in.
The people who have been depending on it presumably have use-patterns
for which it doesn't fail, and we're not going to be doing them a
service by ripping out functionality for which we can't offer a
replacement.

As for the "quality" argument, contrib modules are not guaranteed
to be of the same standard as the core code; anyone who thinks they are
should disabuse themselves of the notion by reading some of that code.
        regards, tom lane


Re: remove contrib/xml2

From
Robert Haas
Date:
On Mon, Feb 1, 2010 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> My feeling is that if it's as flakey and unreliable as it currently
>> is, we shouldn't ship it.  Removing it from CVS doesn't mean "you
>> can't use this any more"; this is open source.  It just means people
>> will have to go and get an old copy out of CVS and presumably in so
>> doing they will be aware that we've removed it for a reason.  We have
>> a well-deserved reputation for quality and I would like to see us
>> preserve that.
>
> [ shrug... ]  It is not any more flaky than it's been since it was put in.
> The people who have been depending on it presumably have use-patterns
> for which it doesn't fail, and we're not going to be doing them a
> service by ripping out functionality for which we can't offer a
> replacement.

Well, then we'd at least better update the documentation to (1) remove
the statement that this will be removed in 8.4 (since we didn't), and
(2) add a very, very large warning that this will crash if you do
almost anything with it.

...Robert


Re: remove contrib/xml2

From
Andrew Dunstan
Date:

Robert Haas wrote:
> (2) add a very, very large warning that this will crash if you do
> almost anything with it.
>
>
>   

I think that's an exaggeration. Certain people are known to be using it 
quite successfully.

cheers

andrew


Re: remove contrib/xml2

From
Robert Haas
Date:
On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert Haas wrote:
>> (2) add a very, very large warning that this will crash if you do
>> almost anything with it.
>
> I think that's an exaggeration. Certain people are known to be using it
> quite successfully.

Hmm.  Well, all I know is that the first thing I tried crashed the server.

CREATE TABLE xpath_test (id integer NOT NULL, t xml);
INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

It doesn't crash if you change the type of t from xml to text; instead
you get a warning about some sort of memory allocation problem.

DROP TABLE xpath_test;
CREATE TABLE xpath_test (id integer NOT NULL, t text);
INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

yields:

WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14645e0, chunk 0x14648b8

And then there's this (see also bug #5285):

DELETE FROM xpath_test;
INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2"
b="oops"/></rowlist>');
SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b
text);

which yields an answer that is, at least, extremely surprising, if not
flat-out wrong:
id | a |  b
----+---+------ 1 | 1 | oops 1 | 2 |
(2 rows)

Bugs #4953 and #5079 can also be reproduced in CVS HEAD.  Both crash the server.

...Robert


Re: remove contrib/xml2

From
Alvaro Herrera
Date:
Robert Haas escribió:
> On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> > Robert Haas wrote:
> >> (2) add a very, very large warning that this will crash if you do
> >> almost anything with it.
> >
> > I think that's an exaggeration. Certain people are known to be using it
> > quite successfully.
> 
> Hmm.  Well, all I know is that the first thing I tried crashed the server.
> 
> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
> INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
> SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
> as t(id int4);

This trivial patch lingering on my system fixes this crasher (this is
for the 8.3 branch).  It makes the "problem in alloc set ExprContext"
warning show up instead.

There are still lotsa other holes, but hey, this is a start ...

Index: contrib/xml2/xpath.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/contrib/xml2/xpath.c,v
retrieving revision 1.16.2.1
diff -c -p -r1.16.2.1 xpath.c
*** contrib/xml2/xpath.c    26 Mar 2008 01:19:11 -0000    1.16.2.1
--- contrib/xml2/xpath.c    27 Jan 2010 15:30:56 -0000
*************** xpath_table(PG_FUNCTION_ARGS)
*** 793,798 ****
--- 793,801 ----  */     pgxml_parser_init(); 
+     PG_TRY();
+     {
+      /* For each row i.e. document returned from SPI */     for (i = 0; i < proc; i++)     {
*************** xpath_table(PG_FUNCTION_ARGS)
*** 929,934 ****
--- 932,944 ----         if (xmldoc)             pfree(xmldoc);     }
+     }
+     PG_CATCH();
+     {
+         xmlCleanupParser();
+         PG_RE_THROW();
+     }
+     PG_END_TRY();      xmlCleanupParser(); /* Needed to flag completeness in 7.3.1. 7.4 defines it as a no-op. */

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: remove contrib/xml2

From
M Z
Date:
I did some tests followed Robert's test cases on both postgresql 8.4.2-0ubu and 8.3.8-1, OS: Ubuntu Karmic.

1) 1st test case, it doesn't crash on 8.3.8 but crash on 8.4.2;
2) 2nd test case, both 8.3.8 and 8.4.2 are fine, and no warning (different from Robert's test?);
3) 3rd test case (and modified test case for 8.3.8), both 8.3.8 and 8.4.2 are not correct, same with Robert's test (8.5 beta?);


*************************************
1st test case:
==================
8.3.8
==================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true') as t(id int4);
 id
----
  1
(1 row)

==================
8.4.2
==================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true') as t(id int4);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
*************************************

*************************************
2nd test case
==================
8.3.8 and 8.4.2
==================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t text);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true') as t(id int4);
 id
----
  1
(1 row)
*************************************

*************************************
3rd test case
==================
8.3.8 and 8.4.2
==================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t text);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
 id | a |  b  
----+---+------
  1 | 1 | oops
  1 | 2 |
(2 rows)


==================
8.3.8 (modified 3rd test case, because 8.3.8 won't crash using xml)
==================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
 id | a |  b  
----+---+------
  1 | 1 | oops
  1 | 2 |
(2 rows)
*************************************


For 1st test case, not sure if some paths applied to 8.3 haven't been applied to 8.4, or other reasons cause the difference between 8.3.8 and 8.4.2.

Any ideas or comments?

Thank you,
M Z

On Mon, Feb 1, 2010 at 8:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert Haas wrote:
>> (2) add a very, very large warning that this will crash if you do
>> almost anything with it.
>
> I think that's an exaggeration. Certain people are known to be using it
> quite successfully.

Hmm.  Well, all I know is that the first thing I tried crashed the server.

CREATE TABLE xpath_test (id integer NOT NULL, t xml);
INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

It doesn't crash if you change the type of t from xml to text; instead
you get a warning about some sort of memory allocation problem.

DROP TABLE xpath_test;
CREATE TABLE xpath_test (id integer NOT NULL, t text);
INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

yields:

WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14645e0, chunk 0x14648b8

And then there's this (see also bug #5285):

DELETE FROM xpath_test;
INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2"
b="oops"/></rowlist>');
SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b
text);

which yields an answer that is, at least, extremely surprising, if not
flat-out wrong:

 id | a |  b
----+---+------
 1 | 1 | oops
 1 | 2 |
(2 rows)

Bugs #4953 and #5079 can also be reproduced in CVS HEAD.  Both crash the server.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: remove contrib/xml2

From
Robert Haas
Date:
On Thu, Feb 4, 2010 at 10:51 PM, M Z <jm80008@gmail.com> wrote:
> I did some tests followed Robert's test cases on both postgresql 8.4.2-0ubu
> and 8.3.8-1, OS: Ubuntu Karmic.
>
> 1) 1st test case, it doesn't crash on 8.3.8 but crash on 8.4.2;

Interesting.  So, that's a regression of some kind.

> 2) 2nd test case, both 8.3.8 and 8.4.2 are fine, and no warning (different
> from Robert's test?);

I built with --enable-debug and --enable-cassert, which might be relevant.

> 3) 3rd test case (and modified test case for 8.3.8), both 8.3.8 and 8.4.2
> are not correct, same with Robert's test (8.5 beta?);

As I think about that further, it might not be a bug - how is the
processor supposed to know what we expect to happen?  But then, I
don't really know how this is supposed to work.

...Robert


Re: remove contrib/xml2

From
Robert Haas
Date:
On Wed, Feb 3, 2010 at 8:49 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Robert Haas escribió:
>> On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> > Robert Haas wrote:
>> >> (2) add a very, very large warning that this will crash if you do
>> >> almost anything with it.
>> >
>> > I think that's an exaggeration. Certain people are known to be using it
>> > quite successfully.
>>
>> Hmm.  Well, all I know is that the first thing I tried crashed the server.
>>
>> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
>> INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
>> SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
>> as t(id int4);
>
> This trivial patch lingering on my system fixes this crasher (this is
> for the 8.3 branch).  It makes the "problem in alloc set ExprContext"
> warning show up instead.
>
> There are still lotsa other holes, but hey, this is a start ...

Interestingly M Z found he couldn't reproduce this crash on 8.3.  Can
you?  If so, +1 for applying this and backpatching it as far as make
sense.

...Robert


Re: remove contrib/xml2

From
M Z
Date:
The thing is, why it doesn't crash on 8.3.8 but crash on 8.4.2? Any idea? A patch was applied to 8.3 but not to 8.4.2?

Thanks,
M Z

On Fri, Feb 5, 2010 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 3, 2010 at 8:49 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Robert Haas escribió:
>> On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> > Robert Haas wrote:
>> >> (2) add a very, very large warning that this will crash if you do
>> >> almost anything with it.
>> >
>> > I think that's an exaggeration. Certain people are known to be using it
>> > quite successfully.
>>
>> Hmm.  Well, all I know is that the first thing I tried crashed the server.
>>
>> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
>> INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
>> SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
>> as t(id int4);
>
> This trivial patch lingering on my system fixes this crasher (this is
> for the 8.3 branch).  It makes the "problem in alloc set ExprContext"
> warning show up instead.
>
> There are still lotsa other holes, but hey, this is a start ...

Interestingly M Z found he couldn't reproduce this crash on 8.3.  Can
you?  If so, +1 for applying this and backpatching it as far as make
sense.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: remove contrib/xml2

From
Tom Lane
Date:
M Z <jm80008@gmail.com> writes:
> The thing is, why it doesn't crash on 8.3.8 but crash on 8.4.2? Any idea?

Pure luck.  Memory-clobber bugs like these are notoriously
nondeterministic.  Any minor, logically unrelated change could make them
visible or not visible, because the clobber happens to clobber data that
is or is not in active use at the moment.
        regards, tom lane


Re: remove contrib/xml2

From
M Z
Date:
Hi Alvaro,

I followed your instruction but put the patch on 8.4.2 as I found it crashes. It looks like the server still crash in the same way. Can you and anyone give me some ideas how to fix this bug?

==============================
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true') as t(id int4);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
==============================

Best,
M Z


>
> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
> INSERT INTO xpath_test VALUES (1, '<doc><int>1</int></doc>');
> SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
> as t(id int4);

> Hmm.  Well, all I know is that the first thing I tried crashed the server.
 
This trivial patch lingering on my system fixes this crasher (this is
for the 8.3 branch).  It makes the "problem in alloc set ExprContext"
warning show up instead.

There are still lotsa other holes, but hey, this is a start ...

Index: contrib/xml2/xpath.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/contrib/xml2/xpath.c,v
retrieving revision 1.16.2.1
diff -c -p -r1.16.2.1 xpath.c
*** contrib/xml2/xpath.c        26 Mar 2008 01:19:11 -0000      1.16.2.1
--- contrib/xml2/xpath.c        27 Jan 2010 15:30:56 -0000
*************** xpath_table(PG_FUNCTION_ARGS)
*** 793,798 ****
--- 793,801 ----
  */
       pgxml_parser_init();

+       PG_TRY();
+       {
+
       /* For each row i.e. document returned from SPI */
       for (i = 0; i < proc; i++)
       {
*************** xpath_table(PG_FUNCTION_ARGS)
*** 929,934 ****
--- 932,944 ----
               if (xmldoc)
                       pfree(xmldoc);
       }
+       }
+       PG_CATCH();
+       {
+               xmlCleanupParser();
+               PG_RE_THROW();
+       }
+       PG_END_TRY();

       xmlCleanupParser();
 /* Needed to flag completeness in 7.3.1. 7.4 defines it as a no-op. */

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: remove contrib/xml2

From
Alvaro Herrera
Date:
M Z escribió:
> Hi Alvaro,
> 
> I followed your instruction but put the patch on 8.4.2 as I found it
> crashes. It looks like the server still crash in the same way. Can you and
> anyone give me some ideas how to fix this bug?

See src/backend/utils/adt/xml.c.  Note the comment at the top:

/** Notes on memory management:** Sometimes libxml allocates global structures in the hope that it can reuse* them
lateron.  This makes it impractical to change the xmlMemSetup* functions on-the-fly; that is likely to lead to trying
topfree() chunks* allocated with malloc() or vice versa.  Since libxml might be used by* loadable modules, eg libperl,
ouronly safe choices are to change the* functions at postmaster/backend launch or not at all.  Since we'd rather* not
activatelibxml in sessions that might never use it, the latter choice* is the preferred one.  However, for debugging
purposesit can be awfully* handy to constrain libxml's allocations to be done in a specific palloc* context, where
they'reeasy to track.  Therefore there is code here that* can be enabled in debug builds to redirect libxml's
allocationsinto a* special context LibxmlContext.  It's not recommended to turn this on in* a production build because
ofthe possibility of bad interactions with* external modules.*/
 
/* #define USE_LIBXMLCONTEXT */



Then if you look at xpath.c in contrib/xml2 you notice that it's doing
exactly the thing that the core module says it's unreliable: using
palloc and friends in xmlMemSetup.  So to fix the bug what's needed is
that the xmlMemSetup call in contrib is removed altogether, and all
memory is tracked and released by hand.  It's rather tedious, and it's
also difficult to plug all resulting memory leaks.  But AFAIUI doing
that would fix (some of?) the crashes.  Not sure if your crash is in
this category.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: remove contrib/xml2

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Then if you look at xpath.c in contrib/xml2 you notice that it's doing
> exactly the thing that the core module says it's unreliable: using
> palloc and friends in xmlMemSetup.  So to fix the bug what's needed is
> that the xmlMemSetup call in contrib is removed altogether, and all
> memory is tracked and released by hand.  It's rather tedious, and it's
> also difficult to plug all resulting memory leaks.  But AFAIUI doing
> that would fix (some of?) the crashes.  Not sure if your crash is in
> this category.

FWIW, the core xml code seems to have been pretty stable since we gave
up on trying to redirect libxml's memory allocations to palloc.
So what you basically need to do to xpath.c is something like this:
http://archives.postgresql.org/pgsql-committers/2009-05/msg00229.php
        regards, tom lane


Re: remove contrib/xml2

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> FWIW, the core xml code seems to have been pretty stable since we gave
> up on trying to redirect libxml's memory allocations to palloc.
> So what you basically need to do to xpath.c is something like this:
> http://archives.postgresql.org/pgsql-committers/2009-05/msg00229.php

Which translates to this git URL, for lazy browsers such as me:
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=f8059d7f8ae8bfac840fbda3c8efbc0f7c09b123
-- 
dim


Re: remove contrib/xml2

From
Tom Lane
Date:
I believe I have fixed all the reported crashes in contrib/xml2.
However there is still this issue pointed out by Robert:

> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
> INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>');
> SELECT * FROM xpath_table('id', 't', 'xpath_test',
> '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);

> which yields an answer that is, at least, extremely surprising, if not
> flat-out wrong:

>  id | a |  b
> ----+---+------
>   1 | 1 | oops
>   1 | 2 |
> (2 rows)

the point being that it seems like "oops" should be associated with "2"
not "1".  The reason for that behavior is that xpath_table runs through
the XPATH_NODESET results generated by the various XPaths and dumps the
k'th one of each into the k'th output row generated for the current
input row.  If there is any way to synchronize which node in each array
goes with each node in each other array, it's not apparent to me, but
I don't know libxml's API at all.  Perhaps there is some other call we
should be using to evaluate all the XPaths in parallel?

(The code is also unbelievably inefficient, recompiling each XPath
expression once per output row (!); but it doesn't seem worth fixing
that right away given that we might have to throw away the logic
entirely in order to fix this bug.)
        regards, tom lane


Re: remove contrib/xml2

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I believe I have fixed all the reported crashes in contrib/xml2.
>   

Yay! Well done! That at least removes any possibly urgency about 
removing the module.

> However there is still this issue pointed out by Robert:
>
>   
>> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
>> INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>');
>> SELECT * FROM xpath_table('id', 't', 'xpath_test',
>> '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
>>     
>
>   
>> which yields an answer that is, at least, extremely surprising, if not
>> flat-out wrong:
>>     
>
>   
>>  id | a |  b
>> ----+---+------
>>   1 | 1 | oops
>>   1 | 2 |
>> (2 rows)
>>     
>
> the point being that it seems like "oops" should be associated with "2"
> not "1".  The reason for that behavior is that xpath_table runs through
> the XPATH_NODESET results generated by the various XPaths and dumps the
> k'th one of each into the k'th output row generated for the current
> input row.  If there is any way to synchronize which node in each array
> goes with each node in each other array, it's not apparent to me, but
> I don't know libxml's API at all.  Perhaps there is some other call we
> should be using to evaluate all the XPaths in parallel?
>
> (The code is also unbelievably inefficient, recompiling each XPath
> expression once per output row (!); but it doesn't seem worth fixing
> that right away given that we might have to throw away the logic
> entirely in order to fix this bug.)
>
>             
>   

Damn that's ugly.


ISTM the missing piece is really in our API. We need to be able to 
specify a nodeset to iterate over, and then for each node take the first 
value produced by each xpath expression. So the example above would look 
something like:
   SELECT * FROM xpath_table('id', 't', 'xpath_test',   '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b
text);


Maybe we could approximate that with the current API by factoring out 
the common root of the xpath expressions, but that's likely to be 
extremely fragile and error prone, and we've already got bad experience 
of trying to be too cute with xpath expressions.

cheers

andrew


Re: remove contrib/xml2

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> ... The reason for that behavior is that xpath_table runs through
>> the XPATH_NODESET results generated by the various XPaths and dumps the
>> k'th one of each into the k'th output row generated for the current
>> input row.

> Damn that's ugly.

Yup :-(

> ISTM the missing piece is really in our API. We need to be able to 
> specify a nodeset to iterate over, and then for each node take the first 
> value produced by each xpath expression. So the example above would look 
> something like:

>     SELECT * FROM xpath_table('id', 't', 'xpath_test',
>     '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);

Hm.  It seems like that still leaves you open to the possibility of
out-of-sync results.  If you consider the current behavior as what
you'd get with an empty root nodeset spec, then restricting it to
produce only the first output row doesn't help at all -- it would still
associate "1" with "oops".  In general if the nodeset spec doesn't
select a unique subnode then you're at risk of bogus answers.
Maybe that could be defined as user error but it sure seems like it
would be error-prone to use.

> Maybe we could approximate that with the current API by factoring out 
> the common root of the xpath expressions, but that's likely to be 
> extremely fragile and error prone, and we've already got bad experience 
> of trying to be too cute with xpath expressions.

Agreed, we do not want to be doing textual manipulations of XPaths,
which is what burnt us before.  But does libxml2 offer any more
abstract path representation we could work on?
        regards, tom lane


Re: remove contrib/xml2

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> ... The reason for that behavior is that xpath_table runs through
>>> the XPATH_NODESET results generated by the various XPaths and dumps the
>>> k'th one of each into the k'th output row generated for the current
>>> input row.
>>>       
>
>   
>> ISTM the missing piece is really in our API. We need to be able to 
>> specify a nodeset to iterate over, and then for each node take the first 
>> value produced by each xpath expression. So the example above would look 
>> something like:
>>     
>
>   
>>     SELECT * FROM xpath_table('id', 't', 'xpath_test',
>>     '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);
>>     
>
> Hm.  It seems like that still leaves you open to the possibility of
> out-of-sync results.  If you consider the current behavior as what
> you'd get with an empty root nodeset spec, then restricting it to
> produce only the first output row doesn't help at all -- it would still
> associate "1" with "oops".  In general if the nodeset spec doesn't
> select a unique subnode then you're at risk of bogus answers.
> Maybe that could be defined as user error but it sure seems like it
> would be error-prone to use.
>
>   

Well, I think that's going to be hard or impossible to avoid in the 
general case. My suggestion was intended to give the user a much better 
chance of avoiding it, however.

Arbitrary XML (or JSON or YAML or any artbitrarilly tree structured data 
markup) doesn't map well to a rectangular structure, and this is always 
likely to cause problems like this IMO.

I guess in the end the user could preprocess the XML with XSLT to remove 
the irregularities before passing to to xpath_table.

We certainly need to put  some warnings in the docs about it.

cheers

andrew