On Tue, Nov 10, 2015 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Curiously though, that gets us down to this:
> 30615 strings out of 245828
> 397721 string characters out of 1810780
> which implies that indeed FlowObjectSetup *is* the cause of most of
> the strings being entered. I'm not sure how that squares with the
> observation that there are less than 5000 \pagelabel entries in the
> postgres-US.aux file. Time for more digging.
Well, after much digging, I've found what seems a workable answer. It turns out that the original form of FlowObjectSetup is just unbelievably awful when it comes to handling of hyperlink anchors: it will put a hyperlink anchor into the PDF for every "flow object", that is, everything in the document that could possibly have a link to it, whether or not it actually is linked to. And aside from bloating the PDF file, it turns out that the hyperlink stuff also consumes some control sequence names, which is why we're running out of strings.
There already is logic (probably way older than the hyperlink code) in jadetex to avoid generating page-number labels for objects that have no cross-references. So what I did to fix this was to piggyback on that code: with the attached jadetex.cfg, both a page-number label and a hyperlink anchor will be generated for all and only those flow objects that have either a page-number reference or a hyperlink reference. (We could try to separate those things, but then we'd need two control sequence names not one per object for tracking purposes, and anyway many objects will have both kinds of reference if they have either.)
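To make the "all and only" rule above concrete, here is a rough model of the selection logic in Python. It is only a sketch and not the jadetex.cfg macros themselves: the function name, the object ids, and the two reference lists are invented for illustration. It assumes each flow object has an id, and that the page-number and hyperlink references have already been collected (as they would be from a previous TeX run).

```python
# Hypothetical model of the selection rule; none of these names come from jadetex.

def select_targets(flow_objects, page_refs, link_refs):
    """Return the flow-object ids that get a page-number label AND a hyperlink anchor.

    An object qualifies if it has either kind of reference; one membership
    test per object stands in for the single control sequence name used
    for tracking.
    """
    referenced = set(page_refs) | set(link_refs)
    # Keep document order and skip objects that nothing points at.
    return [obj for obj in flow_objects if obj in referenced]


# Invented example data: five flow objects, only two of which are referenced.
flow_objects = ["sect1", "para1", "table1", "para2", "fig1"]
page_refs = ["table1"]           # objects with a page-number reference
link_refs = ["sect1", "table1"]  # objects with a hyperlink reference

targets = select_targets(flow_objects, page_refs, link_refs)
print(targets)
```

In this toy input only "sect1" and "table1" would receive a label and an anchor. The original behaviour corresponds to emitting an anchor for all five objects.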
This gets us down to ~135000 strings to build HEAD, and not incidentally, the resulting PDF is about half the size it was before. I think I've also fixed a number of formerly unexplainable broken hyperlinks in the PDF; some are still broken, but they were that way before. (It looks like <xref> with endterm doesn't work very well in jadetex; all the remaining bad links seem to be associated with uses of that.)
Barring objection I'll commit this tomorrow. I'm inclined to back-patch it at least into 9.5, maybe further, because I'm afraid we may be closer than we realized to exceeding the strings limit in the back branches too.
Impressive, indeed.
When you say it's half the size - is that half the size of the preprocessed PDF or is it also after the stuff we do on the website PDFs using jpdftweak? IIRC that tweak is only there to deal with the size, and specifically it deals with "bookmarks" which sounds a lot like this...