Thread: libxml incompatibility

libxml incompatibility

From
Alvaro Herrera
Date:
Hi,

It seems that if you load libxml into a backend for whatever reason (say
you create a table with a column of type xml) and then create a plperlu
function that "use XML::LibXML", we get a segmentation fault.

This sequence reproduces the problem for me in 8.3:

create table xmlcrash (a xml);
insert into xmlcrash values ('<a />');
create function xmlcrash() returns void language plperlu as $$ use XML::LibXML; $$;

The problem is reported as

TRAP: BadArgument("!(((context) != ((void *)0) && (((((Node*)((context)))->type) == T_AllocSetContext))))", File:
"/pgsql/source/83_rel/src/backend/utils/mmgr/mcxt.c", Line: 507)
 


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: libxml incompatibility

From
Kenneth Marshall
Date:
This looks like a problem caused by two different libxml versions:
the one used for the perl XML::LibXML wrappers and the one used to
build PostgreSQL. They really need to be the same. Does it still
segfault if they are identical?

Regards,
Ken

On Fri, Mar 06, 2009 at 04:14:04PM -0300, Alvaro Herrera wrote:
> Hi,
> 
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that "use XML::LibXML", we get a segmentation fault.
> 
> This sequence reproduces the problem for me in 8.3:
> 
> create table xmlcrash (a xml);
> insert into xmlcrash values ('<a />');
> create function xmlcrash() returns void language plperlu as $$ use XML::LibXML; $$;
> 
> The problem is reported as
> 
> TRAP: BadArgument("!(((context) != ((void *)0) && (((((Node*)((context)))->type) == T_AllocSetContext))))", File:
> "/pgsql/source/83_rel/src/backend/utils/mmgr/mcxt.c", Line: 507)
 
> 
> 
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
> 
> 


Re: libxml incompatibility

From
Andrew Dunstan
Date:

Alvaro Herrera wrote:
> Hi,
>
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that "use XML::LibXML", we get a segmentation fault.
>
>
>   

Yes, I discovered this a few weeks ago. It looks like libxml is not 
reentrant, so for perl you need to use some other XML library. Very 
annoying.

cheers

andrew


Re: libxml incompatibility

From
Kenneth Marshall
Date:
On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>
>
> Alvaro Herrera wrote:
>> Hi,
>>
>> It seems that if you load libxml into a backend for whatever reason (say
>> you create a table with a column of type xml) and then create a plperlu
>> function that "use XML::LibXML", we get a segmentation fault.
>>
>>
>>   
>
> Yes, I discovered this a few weeks ago. It looks like libxml is not 
> reentrant, so for perl you need to use some other XML library. Very 
> annoying.
>
> cheers
>
> andrew
>
Ugh! That is worse than a simple library link incompatibility.

Ken


Re: libxml incompatibility

From
Alvaro Herrera
Date:
Kenneth Marshall wrote:
> This looks like a problem caused by two different libxml versions:
> the one used for the perl XML::LibXML wrappers and the one used to
> build PostgreSQL. They really need to be the same. Does it still
> segfault if they are identical?

Unlikely, because AFAICT there's a single libxml installed on my system.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: libxml incompatibility

From
Kenneth Marshall
Date:
On Fri, Mar 06, 2009 at 05:23:45PM -0300, Alvaro Herrera wrote:
> Kenneth Marshall wrote:
> > This looks like a problem caused by two different libxml versions:
> > the one used for the perl XML::LibXML wrappers and the one used to
> > build PostgreSQL. They really need to be the same. Does it still
> > segfault if they are identical?
> 
> Unlikely, because AFAICT there's a single libxml installed on my system.
> 
Yes, I saw Andrew's comment and I have had that problem myself with
Apache/PHP and perl with libxml. A simple library mismatch would at
least be easy to resolve.  :)

Regards,
Ken


Re: libxml incompatibility

From
Alvaro Herrera
Date:
Kenneth Marshall wrote:
> On Fri, Mar 06, 2009 at 05:23:45PM -0300, Alvaro Herrera wrote:
> > Kenneth Marshall wrote:
> > > This looks like a problem caused by two different libxml versions:
> > > the one used for the perl XML::LibXML wrappers and the one used to
> > > build PostgreSQL. They really need to be the same. Does it still
> > > segfault if they are identical?
> > 
> > Unlikely, because AFAICT there's a single libxml installed on my system.
> > 
> Yes, I saw Andrew's comment and I have had that problem myself with
> Apache/PHP and perl with libxml. A simple library mismatch would at
> least be easy to resolve.  :)

Agreed :-(

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: libxml incompatibility

From
"Holger Hoffstaette"
Date:
On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:

> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>> Yes, I discovered this a few weeks ago. It looks like libxml is not
>> reentrant, so for perl you need to use some other XML library. Very
>> annoying.
>>
> Ugh! That is worse than a simple library link incompatibility.

http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html

Seems to me that Perl (?) is calling functions it is not supposed to call
- I'm guessing due to assumptions about mismatching lifecycles. The
parsing functions themselves are supposedly reentrant.

-h




Re: libxml incompatibility

From
Andrew Dunstan
Date:

Holger Hoffstaette wrote:
> On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
>
>   
>> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>>     
>>> Yes, I discovered this a few weeks ago. It looks like libxml is not
>>> reentrant, so for perl you need to use some other XML library. Very
>>> annoying.
>>>
>>>       
>> Ugh! That is worse than a simple library link incompatibility.
>>     
>
> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>
> Seems to me that Perl (?) is calling functions it is not supposed to call
> - I'm guessing due to assumptions about mismatching lifecycles. The
> parsing functions themselves are supposedly reentrant.
>
>
>   

Maybe someone can trace the libxml calls ... not sure how exactly ... 
given Alvaro's example, it doesn't seem likely to me that this is due to 
a call to xmlCleanupParser(), but maybe the perl code invoked by simply
doing "use XML::LibXML;" calls that for some perverse reason.

My interest wasn't so high that I wanted to spend a lot of time on it. 
If it didn't work I was just going to move on.

cheers

andrew


Re: libxml incompatibility

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:
>
> Holger Hoffstaette wrote:
>
>> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>>
>> Seems to me that Perl (?) is calling functions it is not supposed to call
>> - I'm guessing due to assumptions about mismatching lifecycles. The
>> parsing functions themselves are supposedly reentrant.
>
> Maybe someone can trace the libxml calls ... not sure how exactly ...  
> given Alvaro's example, it doesn't seem likely to me that this is due to  
> a call to xmlCleanupParser(), but maybe the perl code invoked by simply
> doing "use XML::LibXML;" calls that for some perverse reason.

Something that came to my mind was that maybe the change of memory
management (to make it use palloc) could be confusing libxml somehow.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: libxml incompatibility

From
Andrew Dunstan
Date:

Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>   
>> Holger Hoffstaette wrote:
>>
>>     
>>> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>>>
>>> Seems to me that Perl (?) is calling functions it is not supposed to call
>>> - I'm guessing due to assumptions about mismatching lifecycles. The
>>> parsing functions themselves are supposedly reentrant.
>>>       
>> Maybe someone can trace the libxml calls ... not sure how exactly ...  
>> given Alvaro's example, it doesn't seem likely to me that this is due to  
>> a call to xmlCleanupParser(), but maybe the perl code invoked by simply
>> doing "use XML::LibXML;" calls that for some perverse reason.
>>     
>
> Something that came to my mind was that maybe the change of memory
> management (to make it use palloc) could be confusing libxml somehow.
>
>   


Seems very possible. But what would perl be doing just as a result of 
loading the module, not even doing anything, that would cause a segfault 
because of that?

cheers

andrew


Re: libxml incompatibility

From
David Lee Lambert
Date:
On 6 Mar, 22:44, and...@dunslane.net (Andrew Dunstan) wrote:
> Holger Hoffstaette wrote:
> > On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
> >> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
> >>> Yes, I discovered this a few weeks ago. [...]
>
> Maybe someone can trace the libxml calls ... not sure how exactly ...
> given Alvaro's example, it doesn't seem likely to me that this is due to
> a call to xmlCleanupParser(), but maybe the perl code invoked by simply
> doing "use XML::LibXML;" calls that for some perverse reason.

I'm able to duplicate this on Postgres 8.4 (Debian Etch, XML::LibXML
from CPAN).  Here's the backtrace from the crash:

#0  0x082f3cf1 in MemoryContextAlloc ()
#1  0x082c3f8a in xml_palloc ()
#2  0xb7dfa548 in xmlInitCharEncodingHandlers () from /usr/lib/libxml2.so.2
#3  0xb7e0195e in xmlInitParser () from /usr/lib/libxml2.so.2
#4  0xb7dff2ef in xmlCheckVersion () from /usr/lib/libxml2.so.2
#5  0xb573af2e in boot_XML__LibXML ()  from /usr/local/lib/perl/5.8.8/auto/XML/LibXML/LibXML.so
#6  0xb587981b in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#7  0xb5877f19 in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#8  0xb5819b6e in Perl_magicname () from /usr/lib/libperl.so.5.8
#9  0xb581a844 in Perl_call_sv () from /usr/lib/libperl.so.5.8
...

Is it supposed to be OK to call xmlCheckVersion() more than once?
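
One possible reading of this backtrace, offered as a guess rather than a
diagnosis: the allocation hook installed by xml.c stays registered with
libxml2 after the memory context behind it has gone away, so the next
caller (here the Perl module's boot code) allocates through a hook that
has nothing to allocate from. The stand-alone C model below calls no
libxml2 or PostgreSQL code; Arena, current_arena, lib_malloc, host_begin,
host_end and lib_init are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical stand-ins, not libxml2 or PostgreSQL APIs. */
typedef void *(*alloc_hook)(size_t);

typedef struct Arena { size_t bytes_handed_out; } Arena;

static Arena *current_arena = NULL;     /* plays the role of LibxmlContext */
static alloc_hook lib_malloc = malloc;  /* the library's process-wide allocation hook */

/* Host-side allocator: only meaningful while an arena exists. */
static void *arena_alloc(size_t n)
{
    if (current_arena == NULL)
        return NULL;        /* the real backend trips an assertion or crashes here */
    current_arena->bytes_handed_out += n;
    return malloc(n);
}

/* Host starts using the library: create the arena and install the hook. */
static void host_begin(void)
{
    current_arena = malloc(sizeof(Arena));
    assert(current_arena != NULL);
    current_arena->bytes_handed_out = 0;
    lib_malloc = arena_alloc;
}

/* Host is done: the arena goes away, but the hook stays installed. */
static void host_end(void)
{
    free(current_arena);
    current_arena = NULL;
}

/* Library initialization as any client might trigger it.
 * Returns 0 on success, -1 if the allocation hook could not deliver memory. */
static int lib_init(void)
{
    void *table = lib_malloc(64);
    if (table == NULL)
        return -1;
    free(table);
    return 0;
}
```

In this model, lib_init() works between host_begin() and host_end() and
fails afterwards, because the hook outlives the arena it depends on.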

--
DLL


Re: libxml incompatibility

From
Andrew Dunstan
Date:

David Lee Lambert wrote:
> On 6 Mar, 22:44, and...@dunslane.net (Andrew Dunstan) wrote:
>   
>> Holger Hoffstaette wrote:
>>     
>>> On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
>>>       
>>>> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>>>>         
>>>>> Yes, I discovered this a few weeks ago. [...]
>>>>>           
>> Maybe someone can trace the libxml calls ... not sure how exactly ...
>> given Alvaro's example, it doesn't seem likely to me that this is due to
>> a call to xmlCleanupParser(), but maybe the perl code invoked by simply
>> doing "use XML::LibXML;" calls that for some perverse reason.
>>     
>
> I'm able to duplicate this on Postgres 8.4 (Debian Etch, XML::LibXML
> from CPAN).  Here's the backtrace from the crash:
>
> #0  0x082f3cf1 in MemoryContextAlloc ()
> #1  0x082c3f8a in xml_palloc ()
> #2  0xb7dfa548 in xmlInitCharEncodingHandlers () from /usr/lib/libxml2.so.2
> #3  0xb7e0195e in xmlInitParser () from /usr/lib/libxml2.so.2
> #4  0xb7dff2ef in xmlCheckVersion () from /usr/lib/libxml2.so.2
> #5  0xb573af2e in boot_XML__LibXML ()
>    from /usr/local/lib/perl/5.8.8/auto/XML/LibXML/LibXML.so
> #6  0xb587981b in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
> #7  0xb5877f19 in Perl_runops_standard () from /usr/lib/libperl.so.5.8
> #8  0xb5819b6e in Perl_magicname () from /usr/lib/libperl.so.5.8
> #9  0xb581a844 in Perl_call_sv () from /usr/lib/libperl.so.5.8
> ...
>
> Is it supposed to be OK to call xmlCheckVersion() more than once?
>
>
>   

You are certainly not supposed to call xmlInitParser more than once - 
see <http://xmlsoft.org/html/libxml-parser.html#xmlInitParser>

Since this is being called by xmlCheckVersion(), that looks like a bug 
in libxml2.

Even if this were fixed, however, I'm still not convinced that we'll be 
able to call libxml2 from perl after we've installed our memory handler 
(xml_palloc).

cheers

andrew


Re: libxml incompatibility

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> David Lee Lambert wrote:
>> Is it supposed to be OK to call xmlCheckVersion() more than once?

> You are certainly not supposed to call xmlInitParser more than once - 
> see <http://xmlsoft.org/html/libxml-parser.html#xmlInitParser>

No, what that says is that it can't be called concurrently by more
than one thread.  If there were such a restriction then our own code
wouldn't work at all, because we call it every time through xml_parse()
or xpath().
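
The distinction here is between calling an initializer repeatedly, which
is harmless when it is guarded by a flag, and calling it from two threads
at once, which an unlocked flag does not protect against. The sketch
below is a stand-alone model with hypothetical names (parser_init,
parser_cleanup, init_allocations), not libxml2 source. It also shows why
an init call that follows a cleanup does real work again, which may be
how the allocation seen in the backtrace gets reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a flag-guarded initializer; not libxml2 code. */
static int   initialized = 0;        /* plain flag: fine for one thread, racy for several */
static void *handler_table = NULL;
static int   init_allocations = 0;   /* counts how often init really allocated */

static void parser_init(void)
{
    if (initialized)
        return;                      /* repeated calls are no-ops */
    handler_table = malloc(64);
    assert(handler_table != NULL);
    init_allocations++;
    initialized = 1;
}

static void parser_cleanup(void)
{
    free(handler_table);
    handler_table = NULL;
    initialized = 0;                 /* the next parser_init() allocates again */
}
```

Calling parser_init() twice in a row allocates once; calling it again
after parser_cleanup() allocates a second time, through whatever
allocator happens to be installed at that moment.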

> Even if this were fixed, however, I'm still not convinced that we'll be 
> able to call libxml2 from perl after we've installed our memory handler 
> (xml_palloc).

Yeah, I'm wondering about that too.  It certainly wouldn't have the
behavior that perl is expecting.

We could possibly use xmlMemGet() to fetch the prior settings and then
restore them after we are done, but making sure that happens after an
error would be a bit tricky.
        regards, tom lane
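
The save-and-restore idea, including the error path that makes it
tricky, can be sketched without libxml2 at all. In the model below,
lib_malloc stands in for the setting that xmlMemSetup() changes and
xmlMemGet() reads, and setjmp/longjmp stands in for the backend's error
unwinding. Every name in it (lib_malloc, host_alloc, risky_work,
with_host_allocator, hook_is_native) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins throughout; no libxml2 or PostgreSQL code is called. */
typedef void *(*alloc_hook)(size_t);

static alloc_hook lib_malloc = malloc;  /* the library's process-wide allocation hook */
static jmp_buf error_exit;              /* stands in for the backend's error longjmp target */

static void *host_alloc(size_t n) { return malloc(n); }

/* Work done through the library; on failure it jumps out instead of returning. */
static void risky_work(int fail)
{
    void *p = lib_malloc(16);
    assert(p != NULL);
    free(p);
    if (fail)
        longjmp(error_exit, 1);
}

/* Save the hook, install ours, run the work, and restore on both exit paths.
 * Returns 0 if the work completed, 1 if it bailed out via longjmp. */
static int with_host_allocator(int fail)
{
    /* Set before setjmp and never modified afterwards, so it is still
     * valid when control comes back through longjmp. */
    alloc_hook saved = lib_malloc;
    int status;

    lib_malloc = host_alloc;
    if (setjmp(error_exit) == 0) {
        risky_work(fail);
        status = 0;
    } else {
        status = 1;
    }
    lib_malloc = saved;     /* reached after success and after an error */
    return status;
}

static int hook_is_native(void) { return lib_malloc == malloc; }
```

In a real backend the restore cannot sit next to the call like this; it
would have to live in error-recovery or end-of-transaction cleanup code,
which is where the difficulty comes from.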


Re: libxml incompatibility

From
Tom Lane
Date:
I wrote:
> We could possibly use xmlMemGet() to fetch the prior settings and then
> restore them after we are done, but making sure that happens after an
> error would be a bit tricky.

I experimented with this a bit, and came up with the attached patch.
Basically what it does is revert libxml to its native memory management
methods anytime LibxmlContext doesn't exist.  It fixes Alvaro's original
test case and some variants that I stumbled across, but I can't say that
I have a lot of faith in it.  I see at least a couple of risk factors:

* it doesn't scale to the case where some other code is doing the same
kind of thing --- the pointers we saved during xml_init might or might
not still be appropriate to restore at end of transaction.

* suppose that a plperl function does some Perlish XML stuff, then calls
a SQL function that calls something in xml.c.  When we start up use of
LibxmlContext we'll wipe the internal state of libxml (which we *have*
to do; this still crashes trivially without the added xmlCleanupParser
call).  Can this break anything that the perl XML code is expecting to
still be valid when control gets back to it?

If this doesn't work then I'm afraid we'll need some radical rethinking
of the way we handle libxml memory management...
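
The first risk factor can be made concrete with a toy model in which two
clients each save the current hook when they start and restore it when
they finish. If their lifetimes overlap instead of nesting, the last
restore reinstalls a hook whose owner has already finished. All names
here (Client, client_begin, client_end, run_clients) are hypothetical
and nothing below is libxml2 or PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of two independent users of one global allocator hook. */
typedef void *(*alloc_hook)(size_t);

static alloc_hook lib_malloc = malloc;   /* the library's single, global hook */

static void *alloc_a(size_t n) { return malloc(n); }
static void *alloc_b(size_t n) { return malloc(n); }

typedef struct Client {
    alloc_hook mine;    /* hook this client installs */
    alloc_hook saved;   /* whatever was installed when it started */
} Client;

static void client_begin(Client *c)
{
    assert(c->mine != NULL);
    c->saved = lib_malloc;
    lib_malloc = c->mine;
}

static void client_end(Client *c)
{
    lib_malloc = c->saved;
}

/* Start A then B, and end them either in nested order (B, A) or in
 * overlapping order (A, B).  Returns 'N' if the native hook is installed
 * afterwards, or 'A' / 'B' if a client's hook was left behind. */
static char run_clients(int nested)
{
    Client a = { alloc_a, NULL };
    Client b = { alloc_b, NULL };

    lib_malloc = malloc;
    client_begin(&a);
    client_begin(&b);
    if (nested) {
        client_end(&b);     /* back to A's hook */
        client_end(&a);     /* back to native */
    } else {
        client_end(&a);     /* back to native while B is still active */
        client_end(&b);     /* reinstalls A's hook, although A is finished */
    }

    if (lib_malloc == alloc_a) return 'A';
    if (lib_malloc == alloc_b) return 'B';
    return 'N';
}
```

Nested use ends with the native hook in place; overlapping use leaves
A's hook installed after both clients are done.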

Please test.  I'm not much with either Perl or XML and have little
idea of how to stress this.

            regards, tom lane

Index: src/backend/utils/adt/xml.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.83
diff -c -r1.83 xml.c
*** src/backend/utils/adt/xml.c    7 Jan 2009 13:44:37 -0000    1.83
--- src/backend/utils/adt/xml.c    22 Mar 2009 03:00:34 -0000
***************
*** 40,45 ****
--- 40,49 ----
   * not very good about specifying this, but for now we assume that
   * xmlCleanupParser() will get rid of anything we need to worry about.
   *
+  * libxml's original memory management callbacks are saved when we create
+  * LibxmlContext, and restored when we delete it.  This is so that there
+  * is some hope for other code (eg, plperl) to use libxml without crashing.
+  *
   * We use palloc --- which will throw a longjmp on error --- for allocation
   * callbacks that officially should act like malloc, ie, return NULL on
   * out-of-memory.  This is a bit risky since there is a chance of leaving
***************
*** 93,98 ****
--- 97,106 ----

  static StringInfo xml_err_buf = NULL;
  static MemoryContext LibxmlContext = NULL;
+ static xmlFreeFunc libxml_freeFunc = NULL;
+ static xmlMallocFunc libxml_mallocFunc = NULL;
+ static xmlReallocFunc libxml_reallocFunc = NULL;
+ static xmlStrdupFunc libxml_strdupFunc = NULL;

  static void xml_init(void);
  static void xml_memory_init(void);
***************
*** 1224,1237 ****
       * sure it doesn't go away before we've called xmlCleanupParser().
       */
      if (LibxmlContext == NULL)
          LibxmlContext = AllocSetContextCreate(TopMemoryContext,
                                                "LibxmlContext",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);

!     /* Re-establish the callbacks even if already set */
!     xmlMemSetup(xml_pfree, xml_palloc, xml_repalloc, xml_pstrdup);
  }

  static void
--- 1232,1260 ----
       * sure it doesn't go away before we've called xmlCleanupParser().
       */
      if (LibxmlContext == NULL)
+     {
+         /*
+          * First, run xmlCleanupParser() to get rid of any libxml data
+          * structures that exist now.  If there are any, they were created
+          * with the native memory-management functions and will cause big
+          * trouble if they're touched using our functions.
+          */
+         xmlCleanupParser();
+
+         /* Next, save away libxml's native memory-management functions */
+         xmlMemGet(&libxml_freeFunc, &libxml_mallocFunc,
+                   &libxml_reallocFunc, &libxml_strdupFunc);
+
+         /* Create the context (note this could fail) */
          LibxmlContext = AllocSetContextCreate(TopMemoryContext,
                                                "LibxmlContext",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);

!         /* Establish our memory management callbacks */
!         xmlMemSetup(xml_pfree, xml_palloc, xml_repalloc, xml_pstrdup);
!     }
  }

  static void
***************
*** 1242,1247 ****
--- 1265,1274 ----
          /* Give libxml a chance to clean up dangling pointers */
          xmlCleanupParser();

+         /* Restore native memory-management functions */
+         xmlMemSetup(libxml_freeFunc, libxml_mallocFunc,
+                     libxml_reallocFunc, libxml_strdupFunc);
+
          /* And flush the context */
          MemoryContextDelete(LibxmlContext);
          LibxmlContext = NULL;

Re: libxml incompatibility

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that "use XML::LibXML", we get a segmentation fault.

I've applied a patch for this in HEAD.  It fixes the reported case,
but since I'm not a big user of either Perl or XML, it would be good
to get some more testing done ...
        regards, tom lane