Re: Switching to XML - Mailing list pgsql-docs
From | Tom Lane |
---|---|
Subject | Re: Switching to XML |
Date | |
Msg-id | 24388.1166800682@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Switching to XML (David Blewett <david@dawninglight.net>) |
List | pgsql-docs |
David Blewett <david@dawninglight.net> writes: > There's been a question posted to that bug. Would you mind replying > to it? You don't have to create an account there. I can post your reply. I haven't gotten any reply from my posting to the upstream mailing list at sourceforge :-( so it's hard to tell if anything will happen there. I don't really approve of fixing this on a distro-by-distro basis, but maybe we have no other choice. Anyway, what I said to them was ------- Forwarded Message Date: Sat, 09 Dec 2006 23:35:54 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> To: openjade-devel@lists.sourceforge.net Subject: Performance problems with NodeListObj lists It's been folklore for some time at the PostgreSQL project (www.postgresql.org) that running our documentation through jade to produce TeX output requires about three days on current hardware :-( Today I got motivated to look into why, and what I find is a pretty localized problem, as illustrated by this oprofile trace: samples % symbol name 2082917 98.5829 OpenJade_DSSSL::PairNodeListObj::nodeListFirst(OpenJade_DSSSL::EvalContext&, OpenJade_DSSSL::Interpreter&) 9713 0.4597 OpenJade_DSSSL::PairNodeListObj::nodeListRest(OpenJade_DSSSL::EvalContext&, OpenJade_DSSSL::Interpreter&) 5019 0.2375 OpenJade_DSSSL::AppendSosofoObj::traceSubObjects(Collector&) const 3571 0.1690 Collector::collect() 1938 0.0917 OpenJade_DSSSL::FlowObj::traceSubObjects(Collector&) const While I've not traced through the stylesheets to determine exactly what causes this, the behavior seen at the C++ level is that long NodeListObj lists (on the order of 10K elements) are built by successive "append" operations using the NodeList primitive. This produces a PairNodeListObj tree that's nested in the wrong direction; if you'll pardon some crude ASCII art: PairNodeListObj / \ PairNodeListObj e / \ PairNodeListObj d / \ PairNodeListObj c / \ a b This means that for an N-element list, nodeListFirst requires O(N) time to recurse down to the first element, and so does nodeListRest. So iterating through the list in the usual way requires O(N^2) time. To add to the problem, nodeListRest generates another PairNodeListObj on each call, imposing a ton of mostly-useless load on the memory allocation/garbage collection machinery. Attached is a first-cut attempt at fixing this. With this patch, jade gets through the Postgres doc sources in 5 minutes rather than the reported three days. (I've never actually built the sources on the machine I'm testing on, so these numbers are not strictly comparable.) As patched, PairNodeListObj is used only in ReverseNodeListObj; it's tempting to get rid of it entirely by merging it somehow with the new CellNodeListObj class. But since I've never looked at the jade sources before today, I figured I'd better send it in without any more ado, so someone who knows better can tell me why this is not gonna work ... regards, tom lane [ plus patch, stripped ] ------- End of Forwarded Message The point of the patch is to introduce a version of NodeListObj that tracks both the head and the tail of the list it represents, so that successive appends can produce a tree that's right-deep instead of left-deep. The individual list cells (CellNodeListObj) just have Lisp-style car and cdr pointers, while HeadNodeListObj is the top of the list and remembers the current first and last cells. As for the changes in primitive.cxx, assuming that I have wrapped my head around the usage of ELObjDynamicRoot correctly, the original coding is actually wrong because it invokes PairNodeListObj() without protecting the "tem" pointer. If that were to point to a freshly-created, not-otherwise-referenced object, then if any garbage collection occurred during PairNodeListObj()'s memory allocation request you'd have a problem. I suppose this bug has not been seen because no version of asNodeList() that is actually used here returns a freshly created object, but it sure seems like an accident waiting to happen. So I modified the code to be sure that both nl and tem are ELObjDynamicRoot-protected during the nodeListAppend call. regards, tom lane
pgsql-docs by date: