Thread: Fixing the libxml memory allocation situation

Fixing the libxml memory allocation situation

From
Tom Lane
Date:
I've been poking at the problems described here:
http://archives.postgresql.org/message-id/20090306191404.GK3901@alvh.no-ip.org
http://archives.postgresql.org/message-id/5265.1240417987@sss.pgh.pa.us

I've about come to the conclusion that the only real fix is to abandon
our attempt to manage libxml's memory usage.  Although clearing out
whatever it's allocated at transaction end is safe from Postgres' own
point of view, we cannot really assume that third-party code that
also depends on libxml will think the same.  What we have to do instead
is let it use malloc() directly and add PG_TRY() hacks as necessary to
ensure that transient data structures get cleared on error exits.

This has a number of disadvantages of course:

1. We are exposed to any internal memory leak or failure-to-check-for-null-
return-from-malloc bugs that may lurk inside libxml.  However, it's hard
to argue that introducing known bugs of our own is a good tradeoff for
hiding someone else's possible bugs.  We should take the attitude that
a libxml bug is a libxml bug, not our problem.

2. Any cleanup we miss making will result in a long-term memory leak
that accumulates across transactions.  However, any such cleanup that
we are missing already represents an intratransaction leak, which will
build up with just as much vigor if the XML functions are called
repeatedly in one transaction.  So it's not clear that the current
solution is buying much on this score either.

3. The code in xml.c will be a lot uglier :-(.  We have a prototype,
which is the coding that was used in xml.c during most of 8.3 beta.
It's not real fun but at least the changes are localized.

This is a sufficiently large change that I'd not risk back-porting it
into 8.3, but I think it is the way to proceed for 8.4 and beyond.

Objections, better ideas?
        regards, tom lane


Re: Fixing the libxml memory allocation situation

From
Martijn van Oosterhout
Date:
On Tue, May 12, 2009 at 05:55:13PM -0400, Tom Lane wrote:
> I've been poking at the problems described here:
> http://archives.postgresql.org/message-id/20090306191404.GK3901@alvh.no-ip.org
> http://archives.postgresql.org/message-id/5265.1240417987@sss.pgh.pa.us
>
> I've about come to the conclusion that the only real fix is to abandon
> our attempt to manage libxml's memory usage.  Although clearing out
> whatever it's allocated at transaction end is safe from Postgres' own
> point of view, we cannot really assume that third-party code that
> also depends on libxml will think the same.  What we have to do instead
> is let it use malloc() directly and add PG_TRY() hacks as necessary to
> ensure that transient data structures get cleared on error exits.

Perhaps we could suggest to the libxml authors that it would be nice of
the allocation functions could be controlled in a non-global way, so
that different users of the same library can control their own memory
usage without affecting others.

Not a short term solution, but it might help longer term. We certainly
have enough examples to convince them of the problem.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Fixing the libxml memory allocation situation

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, May 12, 2009 at 05:55:13PM -0400, Tom Lane wrote:
>> I've about come to the conclusion that the only real fix is to abandon
>> our attempt to manage libxml's memory usage.

> Perhaps we could suggest to the libxml authors that it would be nice of
> the allocation functions could be controlled in a non-global way, so
> that different users of the same library can control their own memory
> usage without affecting others.

They probably see the xmlMemSetup hooks as strictly for debugging, not
something for production use.  Something that makes them significantly
more complex might well be a net step backwards for that purpose.

If anyone does try to pursue this with libxml upstream, don't forget to
mention that the error handling callbacks (xmlSetGenericErrorFunc) are
equally poorly designed for multiple callers.
        regards, tom lane