Re: Uh-oh: documentation PDF output no longer builds in HEAD - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Uh-oh: documentation PDF output no longer builds in HEAD
Date
Msg-id CABUevEzehq_Aoo2UKiDPSx6sGabK8p7hO+AfA3syCW35AioYUw@mail.gmail.com
In response to Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Nov 10, 2015 at 1:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Curiously though, that gets us down to this:
> >  30615 strings out of 245828
> >  397721 string characters out of 1810780
> > which implies that indeed FlowObjectSetup *is* the cause of most of
> > the strings being entered.  I'm not sure how that squares with the
> > observation that there are less than 5000 \pagelabel entries in the
> > postgres-US.aux file.  Time for more digging.
>
> Well, after much digging, I've found what seems a workable answer.
> It turns out that the original form of FlowObjectSetup is just
> unbelievably awful when it comes to handling of hyperlink anchors:
> it will put a hyperlink anchor into the PDF for every "flow object",
> that is, everything in the document that could possibly have a link
> to it, whether or not it actually is linked to.  And aside from bloating
> the PDF file, it turns out that the hyperlink stuff also consumes some
> control sequence names, which is why we're running out of strings.
>
> There already is logic (probably way older than the hyperlink code)
> in jadetex to avoid generating page-number labels for objects that have
> no cross-references.  So what I did to fix this was to piggyback on
> that code: with the attached jadetex.cfg, both a page-number label
> and a hyperlink anchor will be generated for all and only those flow
> objects that have either a page-number reference or a hyperlink reference.
> (We could try to separate those things, but then we'd need two control
> sequence names not one per object for tracking purposes, and anyway many
> objects will have both kinds of reference if they have either.)
>
> This gets us down to ~135000 strings to build HEAD, and not incidentally,
> the resulting PDF is about half the size it was before.  I think I've
> also fixed a number of formerly unexplainable broken hyperlinks in the
> PDF; some are still broken, but they were that way before.  (It looks
> like <xref> with endterm doesn't work very well in jadetex; all the
> remaining bad links seem to be associated with uses of that.)
>
> Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
> it at least into 9.5, maybe further, because I'm afraid we may be closer
> than we realized to exceeding the strings limit in the back branches too.

Impressive, indeed.

When you say it's half the size: is that half the size of the preprocessed PDF, or does that also hold after the processing we do on the website PDFs using jpdftweak? IIRC that tweak is only there to deal with the size, and specifically it deals with "bookmarks", which sounds a lot like this...
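
For readers following along, the idea in the quoted message (emit a page-number label and a hyperlink anchor only for flow objects that are actually referenced, tracked with a single marker per object) can be sketched roughly as below. This is not the jadetex.cfg code, which is TeX and not shown here; it is a Python illustration only, and the names FlowObject, referenced_ids, and render are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowObject:
    """Hypothetical stand-in for a document element that could be a link target."""
    obj_id: str
    text: str


def referenced_ids(page_refs, link_refs):
    # One combined set: an object counts as referenced if it has either a
    # page-number reference or a hyperlink reference. This mirrors the
    # "one marker per object, not two" choice described in the message.
    return set(page_refs) | set(link_refs)


def render(objects, referenced):
    # Emit a label and an anchor together, and only for referenced objects.
    # Unreferenced objects are emitted as plain text with no tracking cost.
    out = []
    for obj in objects:
        if obj.obj_id in referenced:
            out.append(f"[label+anchor:{obj.obj_id}] {obj.text}")
        else:
            out.append(obj.text)
    return out


objects = [
    FlowObject("sec-a", "Section A"),
    FlowObject("sec-b", "Section B"),
    FlowObject("sec-c", "Section C"),
]
# Assumed inputs: ids collected from an earlier pass over the document.
targets = referenced_ids(page_refs=["sec-a"], link_refs=["sec-c"])
rendered = render(objects, targets)
```

In this sketch only sec-a and sec-c get a label and anchor; sec-b, which nothing points to, costs nothing extra. That is the same reason the string count and the PDF size drop in the real build.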
