Thread: nested <a> tags in glossary entries in html docs
Hello,

In REL_13_STABLE and above, the generated HTML is broken: every link to the glossary produces nested <a href="..."> tags. Somehow, this results in duplicated <a> tags on https://www.postgresql.org/docs/

Found by tab-navigating https://www.postgresql.org/docs/16/rowtypes.html where we see (spacing added to avoid line wraps):

    ... create a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN"</a><a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a> over the composite type ...

So, an empty <a>, and then the real <a>. This results in stopping twice on the "domain" link (right before it, and then on the word "domain" itself) while tab-navigating.

If I build the docs from source with "make html" (in REL_13_STABLE..master), I see nested <a>'s instead:

    create a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN"><em class="glossterm"><a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a> over the composite type

I guess the docs are processed additionally before getting to https://www.postgresql.org/docs/ (html sanitizer/beautifier?), which fixes the nested <a>'s in passing but produces duplicated <a>'s instead.

It seems to affect all glossary links: after "make html",

    grep '</a></em></a>' doc/src/sgml/html/*.html

results in 254 to 329 matches (in versions 13 to master).

REL_12_STABLE is not affected: the glossary was only introduced in 13. The problem seems to have been introduced in

    commit 347d2b07fcc250680f75b5f89ba49d4805782c6b
    Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
    Date: Fri Apr 3 19:23:20 2020

        Add a glossary to the documentation

Not sure about how to fix this (don't really know docbook).

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
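The grep above depends on the exact '</a></em></a>' byte sequence. As a cross-check that does not depend on that pattern, here is a minimal Python sketch that reports every <a> opened while another <a> is still open. The class and function names are made up for this illustration and are not part of the PostgreSQL build; it only assumes the standard-library html.parser module.

```python
from html.parser import HTMLParser


class NestedAnchorFinder(HTMLParser):
    """Record every <a> start tag that opens while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.open_anchors = 0  # number of <a> elements currently open
        self.nested = []       # (line, column, href) for each nested <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.open_anchors > 0:
                line, column = self.getpos()
                self.nested.append((line, column, dict(attrs).get("href")))
            self.open_anchors += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors > 0:
            self.open_anchors -= 1


def find_nested_anchors(html_text):
    """Return a list of (line, column, href) tuples, one per nested <a>."""
    finder = NestedAnchorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.nested


# The fragment produced by a local "make html", as quoted in the report.
SAMPLE = (
    'create a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN">'
    '<em class="glossterm"><a class="glossterm" '
    'href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a>'
    ' over the composite type'
)

for line, column, href in find_nested_anchors(SAMPLE):
    print(f"nested <a> at line {line}, column {column}: {href}")
```

To scan a build, one would read each file under doc/src/sgml/html/ and pass its text to find_nested_anchors(); an empty list means no nested links were found in that file.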
On 2024-04-12 18:29 +0200, Anton Voloshin wrote:
> In REL_13_STABLE and above, generated HTML have a broken HTML: nested <a
> href="..."> tags for all links to glossary. Somehow, this results in
> duplicated <a> tags on the https://www.postgresql.org/docs/
>
> Found by tab-navigating https://www.postgresql.org/docs/16/rowtypes.html
> where we see (spacing added to avoid line wraps):
> ... create a <a class="glossterm"
> href="glossary.html#GLOSSARY-DOMAIN"</a><a class="glossterm"
> href="glossary.html#GLOSSARY-DOMAIN"
> title="Domain">domain</a> over the composite type ...
>
> So, empty <a>, and then the real <a>. This resulted in stopping twice on the
> "domain" link (right before, and then on the
> "domain" word itself) while tab-navigating.

There's this bug[1] in the DocBook XSLT stylesheets. Looks like the fix[2] landed in 1.79.2 (latest version on Arch, matching the latest snapshot on GitHub from 2020-06-03) because I can see the change in /usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/html/inline.xsl and /usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/xhtml/inline.xsl.

But I still get those nested <a> with a simple test:

    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <book>
      <title>Test</title>
      <glosslist>
        <glossentry id="glossary-a">
          <glossterm>A</glossterm>
          <glossdef>
            <para>
              <glossterm linkend="glossary-b">B</glossterm>
            </para>
          </glossdef>
        </glossentry>
        <glossentry id="glossary-b">
          <glossterm>B</glossterm>
          <glossdef>
            <para>
              Lorem ipsum…
            </para>
          </glossdef>
        </glossentry>
      </glosslist>
    </book>

Generating the XHTML with

    xsltproc --nonet /usr/share/xml/docbook/xsl-stylesheets-1.79.2-nons/xhtml/docbook.xsl test.sgml | grep '</a></em></a>'

gives me

    <a class="glossterm" href="#glossary-b"><em class="glossterm"><a class="glossterm" href="#glossary-b" title="B">B</a></em></a>

> Not sure about how to fix this (don't really know docbook).

My XSLT skills are quite rusty, but maybe it's possible to omit the outer <a class="glossterm"> and just emit <em class="glossterm"> and its child <a> in our stylesheets.

[1] https://github.com/docbook/xslt10-stylesheets/issues/24
[2] https://github.com/docbook/xslt10-stylesheets/commit/c242ce2b8c1a5ebfdb2e719f788367bb1ddee8ea

--
Erik
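To make the suggested target markup concrete, the following is a small Python sketch that takes the nested fragment from the test above and drops the outer <a>, leaving <em class="glossterm"> with its child <a>. This is only an illustration of the desired output shape, not the stylesheet change itself; the function name is hypothetical, and it assumes the fragment is well-formed XHTML so that xml.etree.ElementTree can parse it.

```python
import xml.etree.ElementTree as ET


def unwrap_outer_glossterm_link(fragment):
    """Return the inner <em> when the fragment is <a><em><a>..</a></em></a>.

    Any other fragment is returned unchanged.
    """
    root = ET.fromstring(fragment)
    is_nested = (
        root.tag == "a"
        and len(root) == 1
        and root[0].tag == "em"
        and root[0].find("a") is not None
    )
    if is_nested:
        # Serialize only the <em> child, which still contains the real link.
        return ET.tostring(root[0], encoding="unicode")
    return fragment


# Output of the xsltproc test case quoted above.
NESTED = (
    '<a class="glossterm" href="#glossary-b"><em class="glossterm">'
    '<a class="glossterm" href="#glossary-b" title="B">B</a></em></a>'
)

print(unwrap_outer_glossterm_link(NESTED))
```

A real fix belongs in the XSLT template that emits the link, so that the outer <a> is never generated in the first place; post-processing like this would only hide the problem.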
On 2024-Apr-12, Erik Wienhold wrote:
> There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
> fix[2] landed in 1.79.2 (latest version on Arch,

Maybe one of these days we should get going with the migration to Docbook 5.x that Jürgen Purtz proposed.

https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de

In the meantime, if anyone wants to suggest an XSLT patch to carry in our local definition, we could try that.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've been told never to return" (www.actsofgord.com)
On 25.04.24 11:24, Alvaro Herrera wrote:
> On 2024-Apr-12, Erik Wienhold wrote:
> > There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
> > fix[2] landed in 1.79.2 (latest version on Arch,
> Maybe one of these days we should get going with the migration to
> Docbook 5.x that Jürgen Purtz proposed.
>
> https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
>
> In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> local definition, we could try that.

Great recommendation. I may have time in the second half of June to pursue the migration further. There is a performance problem, which possibly results from our XSLT script that optimizes the transformation-speed and works well in 4.x.

Jürgen Purtz

@Anton: AFAIK the script was developed by your (former?) colleague Alexander Lakhin.
On 25.04.24 13:24, Jürgen Purtz wrote:
> On 25.04.24 11:24, Alvaro Herrera wrote:
> > On 2024-Apr-12, Erik Wienhold wrote:
> > > There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
> > > fix[2] landed in 1.79.2 (latest version on Arch,
> > Maybe one of these days we should get going with the migration to
> > Docbook 5.x that Jürgen Purtz proposed.
> >
> > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> >
> > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > local definition, we could try that.
>
> Great recommendation. I may have time in the second half of June to
> pursue the migration further. There is a performance problem, which
> possibly results from our XSLT script that optimizes the
> transformation-speed and works well in 4.x.
>
> Jürgen Purtz
>
> @Anton: AFAIK the script was developed by your (former?) colleague Alexander Lakhin.

... or do we have a problem with the fact that our xml files are not well-formed? Some of them contain more than one root-element:

    xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)

J. Purtz
On 2024-04-25 15:40 +0200, Jürgen Purtz wrote:
> On 25.04.24 13:24, Jürgen Purtz wrote:
> >
> > On 25.04.24 11:24, Alvaro Herrera wrote:
> > > On 2024-Apr-12, Erik Wienhold wrote:
> > >
> > > > There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
> > > > fix[2] landed in 1.79.2 (latest version on Arch,
> > > Maybe one of these days we should get going with the migration to
> > > Docbook 5.x that Jürgen Purtz proposed.
> > >
> > > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> > >
> > > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > > local definition, we could try that.
> > >
> > Great recommendation. I may have time in the second half of June to
> > pursue the migration further. There is a performance problem, which
> > possibly results from our XSLT script that optimizes the
> > transformation-speed and works well in 4.x.
> >
> ... or do we have a problem with the fact that our xml files are not
> well-formed? Some of them contain more than one root-element:
>
> xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)

No, those files are not processed as standalone documents but are transcluded into postgres-full.xml from which postgres.html is then generated. And postgres-full.xml is well-formed according to xmllint.

--
Erik
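The distinction can be shown with a minimal Python sketch: a file with two top-level elements fails a standalone well-formedness check, yet the same text is fine once it sits inside a parent element. The element names and sample text are invented for this illustration, and the string concatenation only imitates transclusion; it is not how the PostgreSQL build assembles postgres-full.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of an included file: two sibling top-level elements.
FRAGMENT = (
    "<sect1><title>One</title></sect1>\n"
    "<sect1><title>Two</title></sect1>\n"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Parsed on its own, the second root element is rejected by the parser.
standalone_ok = is_well_formed(FRAGMENT)

# Placed inside a parent element, the same text is well-formed.
included_ok = is_well_formed("<chapter>" + FRAGMENT + "</chapter>")

print("standalone:", standalone_ok)
print("included:  ", included_ok)
```

This mirrors why running xmllint over the individual .sgml files reports extra content while the combined document passes.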
On 2024-Apr-25, Jürgen Purtz wrote:
> Great recommendation. I may have time in the second half of June to
> pursue the migration further. There is a performance problem, which
> possibly results from our XSLT script that optimizes the
> transformation-speed and works well in 4.x.

Maybe a way to study this is to time a run with those speedups removed and see if the timing with DocBook 5.2 matches. If it does, that's a sign that forward-porting the speedup tweaks may be worthwhile.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions" (T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Hello,

25.04.2024 12:24, Alvaro Herrera wrote:
> On 2024-Apr-12, Erik Wienhold wrote:
>
>> There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
>> fix[2] landed in 1.79.2 (latest version on Arch,
> Maybe one of these days we should get going with the migration to
> Docbook 5.x that Jürgen Purtz proposed.
>
> https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
>
> In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> local definition, we could try that.

Please try the attached patch, which adds <xsl:template match="glossterm" name="glossterm">, borrowed from /usr/share/xml/docbook/stylesheet/docbook-xsl/xhtml/inline.xsl (I have docbook-xsl 1.79.2 installed), to our local stylesheet-html-common.xsl.

I applied the modification from [1] (in two places) and it looks like the nested <a> issue is gone.

[1] https://github.com/docbook/xslt10-stylesheets/pull/72/commits/62144252364492aecd71a3c8d5e6e1624af84785

Best regards,
Alexander
Attachment
On 2024-04-25 22:00 +0200, Alexander Lakhin wrote:
> 25.04.2024 12:24, Alvaro Herrera wrote:
> > On 2024-Apr-12, Erik Wienhold wrote:
> >
> > > There's this bug[1] in the DocBook XSLT stylesheets. Looks like the
> > > fix[2] landed in 1.79.2 (latest version on Arch,
> > Maybe one of these days we should get going with the migration to
> > Docbook 5.x that Jürgen Purtz proposed.
> >
> > https://postgr.es/m/21ed3fd9-9020-4b53-b04f-a08a831b6085@purtz.de
> >
> > In the meantime, if anyone wants to suggest a XSLT patch to carry in our
> > local definition, we could try that.
>
> Please try the attached patch, which adds <xsl:template match="glossterm"
> name="glossterm">, borrowed from /usr/share/xml/docbook/stylesheet/
> docbook-xsl/xhtml/inline.xsl (I have docbook-xsl 1.79.2 installed), to our
> local stylesheet-html-common.xsl.
>
> I applied the modification from [1] (in two places) and it looks like the
> nested <a> issue is gone.
>
> [1] https://github.com/docbook/xslt10-stylesheets/pull/72/commits/62144252364492aecd71a3c8d5e6e1624af84785

It works. There are already a couple of upstream fixes copied into our stylesheets, with links to bug reports. So I also created one for reference, with my upthread test case: https://github.com/docbook/xslt10-stylesheets/issues/267

--
Erik