Thread: Fixing the libxml memory allocation situation
I've been poking at the problems described here: http://archives.postgresql.org/message-id/20090306191404.GK3901@alvh.no-ip.org http://archives.postgresql.org/message-id/5265.1240417987@sss.pgh.pa.us I've about come to the conclusion that the only real fix is to abandon our attempt to manage libxml's memory usage. Although clearing out whatever it's allocated at transaction end is safe from Postgres' own point of view, we cannot really assume that third-party code that also depends on libxml will think the same. What we have to do instead is let it use malloc() directly and add PG_TRY() hacks as necessary to ensure that transient data structures get cleared on error exits. This has a number of disadvantages of course: 1. We are exposed to any internal memory leak or failure-to-check-for-null- return-from-malloc bugs that may lurk inside libxml. However, it's hard to argue that introducing known bugs of our own is a good tradeoff for hiding someone else's possible bugs. We should take the attitude that a libxml bug is a libxml bug, not our problem. 2. Any cleanup we miss making will result in a long-term memory leak that accumulates across transactions. However, any such cleanup that we are missing already represents an intratransaction leak, which will build up with just as much vigor if the XML functions are called repeatedly in one transaction. So it's not clear that the current solution is buying much on this score either. 3. The code in xml.c will be a lot uglier :-(. We have a prototype, which is the coding that was used in xml.c during most of 8.3 beta. It's not real fun but at least the changes are localized. This is a sufficiently large change that I'd not risk back-porting it into 8.3, but I think it is the way to proceed for 8.4 and beyond. Objections, better ideas? regards, tom lane
On Tue, May 12, 2009 at 05:55:13PM -0400, Tom Lane wrote: > I've been poking at the problems described here: > http://archives.postgresql.org/message-id/20090306191404.GK3901@alvh.no-ip.org > http://archives.postgresql.org/message-id/5265.1240417987@sss.pgh.pa.us > > I've about come to the conclusion that the only real fix is to abandon > our attempt to manage libxml's memory usage. Although clearing out > whatever it's allocated at transaction end is safe from Postgres' own > point of view, we cannot really assume that third-party code that > also depends on libxml will think the same. What we have to do instead > is let it use malloc() directly and add PG_TRY() hacks as necessary to > ensure that transient data structures get cleared on error exits. Perhaps we could suggest to the libxml authors that it would be nice of the allocation functions could be controlled in a non-global way, so that different users of the same library can control their own memory usage without affecting others. Not a short term solution, but it might help longer term. We certainly have enough examples to convince them of the problem. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Tue, May 12, 2009 at 05:55:13PM -0400, Tom Lane wrote: >> I've about come to the conclusion that the only real fix is to abandon >> our attempt to manage libxml's memory usage. > Perhaps we could suggest to the libxml authors that it would be nice of > the allocation functions could be controlled in a non-global way, so > that different users of the same library can control their own memory > usage without affecting others. They probably see the xmlMemSetup hooks as strictly for debugging, not something for production use. Something that makes them significantly more complex might well be a net step backwards for that purpose. If anyone does try to pursue this with libxml upstream, don't forget to mention that the error handling callbacks (xmlSetGenericErrorFunc) are equally poorly designed for multiple callers. regards, tom lane