nested tags in glossary entries in html docs

From
Anton Voloshin
Date:
Hello,

In REL_13_STABLE and above, the generated HTML is broken: every link to
the glossary is rendered as nested <a href="..."> tags. Somehow, this
results in duplicated <a> tags on https://www.postgresql.org/docs/.

Found by tab-navigating https://www.postgresql.org/docs/16/rowtypes.html
where we see (spacing added to avoid line wraps):
... create a <a class="glossterm"
href="glossary.html#GLOSSARY-DOMAIN"></a><a class="glossterm"
href="glossary.html#GLOSSARY-DOMAIN"
title="Domain">domain</a> over the composite type ...

So there is an empty <a>, followed by the real <a>. As a result, tab
navigation stops twice on the "domain" link: once right before the word
and once on the "domain" word itself.

If I build the HTML docs from source with "make html" (in
REL_13_STABLE..master), I see nested <a>'s instead:

create a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN"><em
class="glossterm"><a class="glossterm" 
href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a> 
over the composite type
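
A minimal sketch of how such nesting could be detected mechanically rather than by tab-navigating. It uses Python's standard html.parser, which does not repair markup, so a second <a> opened inside an open <a> is visible as-is. The class and function names are illustrative only and are not part of the docs build.

```python
from html.parser import HTMLParser


class NestedAnchorFinder(HTMLParser):
    """Record every <a> start tag that opens while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <a> elements
        self.nested = []  # (line, column) positions of nested <a> start tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.depth > 0:
                self.nested.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1


def find_nested_anchors(html_text):
    """Return the positions of all <a> tags nested inside another <a>."""
    finder = NestedAnchorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.nested


# The snippet produced by "make html", as quoted above.
sample = (
    'create a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN">'
    '<em class="glossterm"><a class="glossterm" '
    'href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a> '
    'over the composite type'
)
print(find_nested_anchors(sample))
```

An empty result means no <a> was opened inside another one.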

I guess the docs are post-processed before they are published on
https://www.postgresql.org/docs/ (an HTML sanitizer/beautifier?), which
fixes the nested <a>'s in passing but produces duplicated <a>'s instead.

It seems to affect all glossary links:
after "make html",
grep '</a></em></a>' doc/src/sgml/html/*.html
results in 254 to 329 matches (in versions 13 to master).
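
For a per-file breakdown, the same check can be sketched in Python. Note one difference: grep counts matching lines, while this counts occurrences, so the totals may not be identical. The function name is illustrative, and the directory is whatever "make html" wrote to (doc/src/sgml/html in the command above).

```python
from pathlib import Path

# Closing sequence that only appears when an <a> wraps <em><a>...</a></em>.
NESTED_CLOSE = "</a></em></a>"


def count_nested_glossterms(html_dir):
    """Return {file name: occurrences} for HTML files containing NESTED_CLOSE."""
    counts = {}
    for path in sorted(Path(html_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        occurrences = text.count(NESTED_CLOSE)
        if occurrences:
            counts[path.name] = occurrences
    return counts
```

Calling count_nested_glossterms("doc/src/sgml/html") after a build would list the affected pages; summing the values gives the overall number of nested links.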

REL_12_STABLE is not affected: glossary was only introduced in 13.

It seems to have been introduced in

commit 347d2b07fcc250680f75b5f89ba49d4805782c6b
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Fri Apr 3 19:23:20 2020

     Add a glossary to the documentation

Not sure about how to fix this (don't really know docbook).

-- 
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru



Re: nested tags in glossary entries in html docs

From
Erik Wienhold
Date:
On 2024-04-12 18:29 +0200, Anton Voloshin wrote:
> In REL_13_STABLE and above, generated HTML have a broken HTML: nested <a
> href="..."> tags for all links to glossary. Somehow, this results in
> duplicated <a> tags on the https://www.postgresql.org/docs/
> 
> Found by tab-navigating https://www.postgresql.org/docs/16/rowtypes.html
> where we see (spacing added to avoid line wraps):
> ... create a <a class="glossterm"
> href="glossary.html#GLOSSARY-DOMAIN"></a><a class="glossterm"
> href="glossary.html#GLOSSARY-DOMAIN"
> title="Domain">domain</a> over the composite type ...
> 
> So, empty <a>, and then the real <a>. This resulted in stopping twice on the
> "domain" link (right before, and then on the
> "domain" word itself) while tab-navigating.

There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
fix[2] landed in 1.79.2 (latest version on Arch, matching the latest
snapshot on GitHub from 2020-06-03) because I can see the change in
/usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/html/inline.xsl and
/usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/xhtml/inline.xsl.
But I still get those nested <a> with a simple test:

    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
              "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <book>
     <title>Test</title>
     <glosslist>
      <glossentry id="glossary-a">
       <glossterm>A</glossterm>
       <glossdef>
        <para>
         <glossterm linkend="glossary-b">B</glossterm>
        </para>
       </glossdef>
      </glossentry>
      <glossentry id="glossary-b">
       <glossterm>B</glossterm>
       <glossdef>
        <para>
         Lorem ipsum…
        </para>
       </glossdef>
      </glossentry>
     </glosslist>
    </book>

Generating the XHTML with

    xsltproc --nonet /usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/xhtml/docbook.xsl test.sgml | grep '</a></em></a>'

gives me

    <a class="glossterm" href="#glossary-b"><em class="glossterm"><a class="glossterm" href="#glossary-b" title="B">B</a></em></a>

> Not sure about how to fix this (don't really know docbook).

My XSLT skills are quite rusty, but maybe it's possible to omit the
outer <a class="glossterm"> and just emit <em class="glossterm"> and its
child <a> in our stylesheets.
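
To make the intended output shape concrete, here is a small Python sketch that drops the outer <a class="glossterm"> from an already generated, well-formed XHTML fragment and keeps the <em> with its child <a>. This only illustrates the target markup. It is not the stylesheet change itself, and post-processing the output is not what is being proposed. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def unwrap_outer_glossterm_links(fragment):
    """Drop an outer <a class="glossterm"> that only wraps <em><a>...</a></em>."""
    # Wrap in a dummy root so fragments with surrounding text can be parsed.
    root = ET.fromstring(f"<root>{fragment}</root>")
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if (
                child.tag == "a"
                and child.get("class") == "glossterm"
                and child.find("em/a") is not None
            ):
                inner = child.find("em")
                inner.tail = child.tail  # keep the text that followed the outer link
                parent[index] = inner
    serialized = ET.tostring(root, encoding="unicode")
    return serialized[len("<root>"):-len("</root>")]


# The nested output from the test case above.
nested = (
    '<a class="glossterm" href="#glossary-b"><em class="glossterm">'
    '<a class="glossterm" href="#glossary-b" title="B">B</a></em></a>'
)
print(unwrap_outer_glossterm_links(nested))
```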

[1] https://github.com/docbook/xslt10-stylesheets/issues/24
[2] https://github.com/docbook/xslt10-stylesheets/commit/c242ce2b8c1a5ebfdb2e719f788367bb1ddee8ea

-- 
Erik



Re: nested tags in glossary entries in html docs

From
Alvaro Herrera
Date:
On 2024-Apr-12, Erik Wienhold wrote:

> There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
> fix[2] landed in 1.79.2 (latest version on Arch,

Maybe one of these days we should get going with the migration to
Docbook 5.x that Jürgen Purtz proposed.

https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de

In the meantime, if anyone wants to suggest an XSLT patch to carry in our
local definition, we could try that.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)



Re: nested tags in glossary entries in html docs

From
Jürgen Purtz
Date:

On 25.04.24 11:24, Alvaro Herrera wrote:
> On 2024-Apr-12, Erik Wienhold wrote:
>
> > There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
> > fix[2] landed in 1.79.2 (latest version on Arch,
> Maybe one of these days we should get going with the migration to
> Docbook 5.x that Jürgen Purtz proposed.
>
> https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
>
> In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> local definition, we could try that.

Great recommendation. I may have time in the second half of June to
pursue the migration further. There is a performance problem, which
possibly results from our XSLT script that optimizes the transformation
speed and works well in 4.x.

Jürgen Purtz

@Anton: AFAIK the script was developed by your (former?) colleague
Alexander Lakhin.

Re: nested tags in glossary entries in html docs

From
Jürgen Purtz
Date:


On 25.04.24 13:24, Jürgen Purtz wrote:
> On 25.04.24 11:24, Alvaro Herrera wrote:
> > On 2024-Apr-12, Erik Wienhold wrote:
> >
> > > There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
> > > fix[2] landed in 1.79.2 (latest version on Arch,
> > Maybe one of these days we should get going with the migration to
> > Docbook 5.x that Jürgen Purtz proposed.
> >
> > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> >
> > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > local definition, we could try that.
>
> Great recommendation. I may have time in the second half of June to
> pursue the migration further. There is a performance problem, which
> possibly results from our XSLT script that optimizes the
> transformation-speed and works well in 4.x.
>
> Jürgen Purtz
>
> @Anton: AFAIK the script was developed by your (former?) colleague
> Alexander Lakhin.

... or do we have a problem with the fact that our XML files are not
well-formed? Some of them contain more than one root element:

xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)

J. Purtz

Re: nested tags in glossary entries in html docs

From
Erik Wienhold
Date:
On 2024-04-25 15:40 +0200, Jürgen Purtz wrote:
> On 25.04.24 13:24, Jürgen Purtz wrote:
> > 
> > On 25.04.24 11:24, Alvaro Herrera wrote:
> > > On 2024-Apr-12, Erik Wienhold wrote:
> > > 
> > > > There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
> > > > fix[2] landed in 1.79.2 (latest version on Arch,
> > > Maybe one of these days we should get going with the migration to
> > > Docbook 5.x that Jürgen Purtz proposed.
> > > 
> > > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> > > 
> > > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > > local definition, we could try that.
> > > 
> > Great recommendation. I may have time in the second half of June to
> > pursue the migration further. There is  a performance problem, which
> > possibly results from our XSLT script that optimizes the
> > transformation-speed and works well in 4.x.
> > 
> ... or do we have a problem with the fact that our xml files are not
> well-formed? Some of them contain more than one root-element:
> 
> xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)

No, those files are not processed as standalone documents but are
transcluded into postgres-full.xml from which postgres.html is then
generated.  And postgres-full.xml is well-formed according to xmllint.
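
The distinction can be illustrated with a short Python sketch. The fragment below is hypothetical, and wrapping it in a parent element merely stands in for the real inclusion step, which is not shown in this thread. A file with two top-level elements fails a standalone well-formedness check, while the combined document passes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragment with two top-level elements, like a file that is
# only ever meant to be included in a larger document.
fragment = "<sect1><title>One</title></sect1><sect1><title>Two</title></sect1>"

# Simplified stand-in for the transclusion into one big document.
combined = f"<book>{fragment}</book>"

print(is_well_formed(fragment))  # False: more than one root element
print(is_well_formed(combined))  # True: a single root contains both sections
```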

-- 
Erik



Re: nested tags in glossary entries in html docs

From
Alvaro Herrera
Date:
On 2024-Apr-25, Jürgen Purtz wrote:

> Great recommendation. I may have time in the second half of June to
> pursue the migration further. There is  a performance problem, which
> possibly results from our XSLT script that optimizes the
> transformation-speed and works well in 4.x.

Maybe a way to study this is to time a run with those speedups removed
and see if the timing with DocBook 5.2 matches.  If it does, that's a
sign that forward-porting the speedup tweaks may be worthwhile.
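
A rough sketch of such a comparison in Python, taking the best of a few wall-clock runs for each configuration. The two commands are placeholders so that the sketch runs anywhere. The real documentation build invocations, with and without the speedups, would have to be substituted, and they are not given in this thread.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare(baseline_argv, candidate_argv, repeats=3):
    """Return (baseline seconds, candidate seconds, candidate/baseline ratio)."""
    baseline = time_command(baseline_argv, repeats)
    candidate = time_command(candidate_argv, repeats)
    return baseline, candidate, candidate / baseline


# Placeholder commands (assumptions); replace them with the two builds to compare.
baseline_cmd = [sys.executable, "-c", "pass"]
candidate_cmd = [sys.executable, "-c", "sum(range(100000))"]
print(compare(baseline_cmd, candidate_cmd))
```

A ratio close to 1 would suggest the two configurations take about the same time.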

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)



Re: nested tags in glossary entries in html docs

From
Alexander Lakhin
Date:
Hello,

25.04.2024 12:24, Alvaro Herrera wrote:
> On 2024-Apr-12, Erik Wienhold wrote:
>
>> There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
>> fix[2] landed in 1.79.2 (latest version on Arch,
> Maybe one of these days we should get going with the migration to
> Docbook 5.x that Jürgen Purtz proposed.
>
> https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
>
> In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> local definition, we could try that.

Please try the attached patch, which adds <xsl:template match="glossterm"
name="glossterm">, borrowed from /usr/share/xml/docbook/stylesheet/
docbook-xsl/xhtml/inline.xsl (I have docbook-xsl 1.79.2 installed), to our
local stylesheet-html-common.xsl.

I applied the modification from [1] (in two places) and it looks like the
nested <a> issue is gone.

[1] https://github.com/docbook/xslt10-stylesheets/pull/72/commits/62144252364492aecd71a3c8d5e6e1624af84785

Best regards,
Alexander

Re: nested tags in glossary entries in html docs

From
Erik Wienhold
Date:
On 2024-04-25 22:00 +0200, Alexander Lakhin wrote:
> 25.04.2024 12:24, Alvaro Herrera wrote:
> > On 2024-Apr-12, Erik Wienhold wrote:
> > 
> > > There's this bug[1] in the DocBook XSLT stylesheets.  Looks like the
> > > fix[2] landed in 1.79.2 (latest version on Arch,
> > Maybe one of these days we should get going with the migration to
> > Docbook 5.x that Jürgen Purtz proposed.
> > 
> > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> > 
> > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > local definition, we could try that.
> 
> Please try the attached patch, which adds <xsl:template match="glossterm"
> name="glossterm">, borrowed from /usr/share/xml/docbook/stylesheet/
> docbook-xsl/xhtml/inline.xsl (I have docbook-xsl 1.79.2 installed), to our
> local stylesheet-html-common.xsl.
> 
> I applied the modification from [1] (in two places) and it looks like the
> nested <a> issue is gone.
> 
> [1] https://github.com/docbook/xslt10-stylesheets/pull/72/commits/62144252364492aecd71a3c8d5e6e1624af84785

It works.

There are already a couple of upstream fixes copied into our
stylesheets, with links to bug reports.  So I also created one for
reference, with my upthread test case:

https://github.com/docbook/xslt10-stylesheets/issues/267

-- 
Erik