Re: Uh-oh: documentation PDF output no longer builds in HEAD - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Uh-oh: documentation PDF output no longer builds in HEAD
Date
Msg-id 10112.1447116397@sss.pgh.pa.us
In response to Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Andres Freund <andres@anarazel.de>)
Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Robert Haas <robertmhaas@gmail.com>)
Re: Uh-oh: documentation PDF output no longer builds in HEAD  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
I wrote:
> Curiously though, that gets us down to this:
>  30615 strings out of 245828
>  397721 string characters out of 1810780
> which implies that indeed FlowObjectSetup *is* the cause of most of
> the strings being entered.  I'm not sure how that squares with the
> observation that there are less than 5000 \pagelabel entries in the
> postgres-US.aux file.  Time for more digging.

Well, after much digging, I've found what seems a workable answer.
It turns out that the original form of FlowObjectSetup is just
unbelievably awful when it comes to handling of hyperlink anchors:
it will put a hyperlink anchor into the PDF for every "flow object",
that is, everything in the document that could possibly have a link
to it, whether or not it actually is linked to.  And aside from bloating
the PDF file, it turns out that the hyperlink stuff also consumes some
control sequence names, which is why we're running out of strings.

There already is logic (probably way older than the hyperlink code)
in jadetex to avoid generating page-number labels for objects that have
no cross-references.  So what I did to fix this was to piggyback on
that code: with the attached jadetex.cfg, both a page-number label
and a hyperlink anchor will be generated for all and only those flow
objects that have either a page-number reference or a hyperlink reference.
(We could try to separate those things, but then we'd need two control
sequence names, not one, per object for tracking purposes; and anyway, many
objects will have both kinds of reference if they have either.)
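
For readers who don't want to wade through the attachment, the gating
idea can be sketched as a standalone LaTeX file. This is only an
illustration, not the jadetex code: \MaybeAnchor and the labels sec-a and
sec-b are made up, and the "previous run" is faked by defining p@sec-a
by hand.

```latex
\documentclass{article}
\makeatletter
% Hypothetical stand-in for the real macros: report whether an anchor
% would be emitted, based only on whether \p@<label> is defined.
\newcommand\MaybeAnchor[1]{%
  \expandafter\ifx\csname p@#1\endcsname\relax
    [no anchor for #1]%   undefined name: \csname yields \relax
  \else
    [anchor for #1]%      defined name: something referenced it
  \fi}
% Pretend an earlier run's .aux file recorded a reference to "sec-a".
\expandafter\gdef\csname p@sec-a\endcsname{1}
\makeatother
\begin{document}
\MaybeAnchor{sec-a}\par  % referenced, so the \else branch is taken
\MaybeAnchor{sec-b}\par  % never referenced, so nothing would be emitted
\end{document}
```

The point of keying everything off the single p@NAME name is that the
test itself needs no second control sequence per object.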

This gets us down to ~135000 strings to build HEAD, and not incidentally,
the resulting PDF is about half the size it was before.  I think I've
also fixed a number of formerly unexplainable broken hyperlinks in the
PDF; some are still broken, but they were that way before.  (It looks
like <xref> with endterm doesn't work very well in jadetex; all the
remaining bad links seem to be associated with uses of that.)

Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
it at least into 9.5, maybe further, because I'm afraid we may be closer
than we realized to exceeding the strings limit in the back branches too.

            regards, tom lane

% doc/src/sgml/jadetex.cfg
%
% This file redefines FlowObjectSetup and some related macros to greatly
% reduce the number of control sequence names created, and also to avoid
% creation of many useless hyperlink anchors in PDF files.
%
% The original coding of FlowObjectSetup defined a control sequence x@LABEL
% for pretty nearly every flow object in the file, whether that object was
% cross-referenced or not.  Worse yet, it created a hyperlink anchor for
% every such object, which not only bloated the output PDF with useless
% anchors but consumed additional control sequence names internally.
%
% To fix, extend PageLabel's already-existing mechanism whereby a p@LABEL
% control sequence is filled in only for labels that are referenced by at
% least one \Pageref call.  We now also fill in p@LABEL for labels that are
% referenced by a \Link.  Then, we can drop x@LABEL entirely, and use
% p@LABEL to control emission of both a hyperlink anchor and a \PageLabel.
% Now, both of those things are emitted for all and only the flow objects
% that have either a hyperlink reference or a page-number reference.
%
% (With a more invasive patch, we could track the need for an anchor and a
% page-number label separately, but that would probably require more control
% sequences than this way does.)
%
%
% In addition to checking p@LABEL not x@LABEL, this version of FlowObjectSetup
% is fixed to clear \Label and \Element whether or not it emits an anchor
% and page label.  Failure to do that seems to explain some pre-existing bugs
% in which certain SGML constructs weren't correctly cross-referenced.
%
\def\FlowObjectSetup#1{%
\ifDoFOBSet
  \ifLabelElements
    \ifx\Label\@empty\let\Label\Element\fi
  \fi
  \ifx\Label\@empty\else
    \expandafter\ifx\csname p@\Label\endcsname\relax
    \else
      \bgroup
        \ifNestedLink
        \else
          \hyper@anchorstart{\Label}\hyper@anchorend
          \PageLabel{\Label}%
        \fi
      \egroup
    \fi
    \let\Label\@empty
    \let\Element\@empty
  \fi
\fi
}
%
% Adjust PageLabel so that the p@NAME control sequence acquires a correct
% value immediately; this seems to be needed to avoid scenarios wherein
% additional TeX runs are needed to reach a stable state of the .aux file.
%
\def\PageLabel#1{%
  \@bsphack
  \expandafter\ifx\csname p@#1\endcsname\relax
  \else
  \protected@write\@auxout{}%
         {\string\pagelabel{#1}{\thepage}}%
  % Ensure the p@NAME control sequence acquires correct value immediately
  \expandafter\xdef\csname p@#1\endcsname{\thepage}%
  \fi
  \@esphack}
%
% In \Link, add code to emit an aux-file entry if the p@NAME sequence isn't
% defined.  Much as in @Setref, this ensures we'll process the referenced
% item correctly on the next TeX run.
%
\def\Link#1{%
  \begingroup
  \SetupICs{#1}%
  \ifx\Label\@empty\let\Label\Element\fi
%  \typeout{Made a Link at \the\inputlineno, to \Label}%
  \hyper@linkstart{\LinkType}{\Label}%
  \NestedLinktrue
  % If p@NAME control sequence isn't defined, emit dummy def to aux file
  % so it will get defined properly on next run, much as in @Setref
  \expandafter\ifx\csname p@\Label\endcsname\relax
    \immediate\write\@mainaux{\string\pagelabel{\Label}{qqq}}%
  \fi
}
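
The interplay between \Link and \PageLabel across TeX runs can be sketched
in a standalone LaTeX file. This is a simplified illustration under stated
assumptions, not the jadetex code: \RequestLabel and \RecordLabel are
hypothetical names, the label demo-target is made up, and \pagelabel is
assumed to do nothing more than define p@NAME when the .aux file is read.

```latex
\documentclass{article}
\makeatletter
% Assumed stand-in for the aux-file command: define \p@<name> globally.
\newcommand\pagelabel[2]{\expandafter\gdef\csname p@#1\endcsname{#2}}
% Link side (sketch): if nothing is recorded yet, write a dummy entry so
% that the name is defined when the .aux file is read on the next run.
\newcommand\RequestLabel[1]{%
  \expandafter\ifx\csname p@#1\endcsname\relax
    \immediate\write\@mainaux{\string\pagelabel{#1}{qqq}}%
  \fi}
% Target side (sketch): record the real page only if the name is defined,
% that is, only if some link asked for it on an earlier run.
\newcommand\RecordLabel[1]{%
  \expandafter\ifx\csname p@#1\endcsname\relax
  \else
    \protected@write\@auxout{}{\string\pagelabel{#1}{\thepage}}%
  \fi}
\makeatother
\begin{document}
Target.\RecordLabel{demo-target}
\newpage
Link.\RequestLabel{demo-target}
\end{document}
```

On the first run only the dummy qqq entry reaches the .aux file; on the
second run the target sees the name defined and writes the real page
number, and further runs leave the .aux file unchanged.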
