Thread: Docbook 5.x

Docbook 5.x

From
Jürgen Purtz
Date:
Hi,
currently we use DocBook V4.2 for the PostgreSQL manuals. I suggest an upgrade to DocBook 5.x. This sounds simple, but it will be a long process with many sub-tasks.

Rationale:
  • Sooner or later we MUST migrate as the 4.x series is outdated: V4.2 dates back to 2002. The 4.x series has not been actively developed since 2006. See: http://www.docbook.org/tdg5/en/html/ch01.html "In October 2006, the DocBook Technical Committee released DocBook V4.5, the last release planned in the 4.x series."
  • V5.0 has been available since 2009. See: http://www.docbook.org/tdg5/en/html/ch01.html: "DocBook V5.0 became an official Committee Specification in June 2009 and became an official OASIS Standard in October 2009."
  • Currently the technical committee is at the third Candidate Release for V5.1.

PROs:
  • The formal part of the migration is supported by existing tools: http://docbook.org/docs/howto/#convert4to5 (nevertheless some scripts written by ourselves will be necessary).
  • The normative schema for DocBook 5.x is written in RELAX NG. Additionally the technical committee converts this normative schema to an XSD schema and to a DTD, which are not normative but very close to the RELAX NG schema and will be sufficient for most applications. Hence, we have the choice between three schema syntaxes and everybody can use their favourite one.
  • Our source file format will switch from SGML to XML. This implies that we have access to all XML features like XLink, XPath, XSLT, XSL-FO, SVG, MathML, namespaces, ...
CONs:
  • The migration from 4.x to 5.x implies major changes at three different levels.
    • DocBook structure: Previously it was defined in SGML syntax (DTD). Now it is defined in the RELAX NG schema language plus Schematron rules.
    • DocBook files: Previously we used SGML syntax for our files. We must convert them to valid XML syntax, e.g. by closing tags that SGML allowed us to omit.
    • Tools and style sheets: All tools which operate at the native SGML level (editors, converters, ...) must be replaced by XML-conforming tools. As valid XML implicitly conforms to valid SGML syntax, this step may be accomplished by reconfiguring some of the tools, e.g. .emacs.
What I have done so far is:
  • Conversion of the SGML files to valid XML syntax with a Perl script. I failed to use 'osx' or 'spam'.
  • Conversion of these XML files to DocBook 5.x format using xsltproc and DocBook's XSLT migration scripts.
  • Creation of HTML files using xsltproc and DocBook's XSLT scripts.
  • Creation of FO files using xsltproc and DocBook's XSLT scripts.
  • Creation of PDF files using fop.
  • The conversions need less than 10 minutes on an Intel i5 processor.
This is a very first raw round-trip with one output file per SGML file and output type. Not yet supported: entities (__gt__ is used as a surrogate), <![CDATA[ and similar SGML constructs, PostgreSQL-specific style sheets, and the Makefile; additional errors also occur. I attach one file of every new format for the chapter "Advanced Features": xml (the new source), html, fo, pdf.
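
The xsltproc and fop steps listed above can be sketched as a small Python helper that only builds the command lines. This is a rough illustration, not the script actually used: the stylesheet paths and the output directory are assumptions that depend on the local DocBook installation, and the initial SGML --> XML step (the Perl script) is left out.

```python
from pathlib import Path

# Hypothetical stylesheet locations; adjust them to the local installation.
DB4_UPGRADE_XSL = "/usr/share/xml/docbook5/db4-upgrade.xsl"
HTML_XSL = "/usr/share/xml/docbook/stylesheet/docbook-xsl-ns/html/docbook.xsl"
FO_XSL = "/usr/share/xml/docbook/stylesheet/docbook-xsl-ns/fo/docbook.xsl"


def pipeline_commands(db4_xml, out_dir="out"):
    """Return the commands for one file: db4 XML -> db5 XML -> HTML, FO -> PDF."""
    stem = Path(db4_xml).stem
    out = Path(out_dir)
    db5 = (out / (stem + ".db5.xml")).as_posix()
    html = (out / (stem + ".html")).as_posix()
    fo = (out / (stem + ".fo")).as_posix()
    pdf = (out / (stem + ".pdf")).as_posix()
    return [
        # DocBook 4 markup -> DocBook 5 markup (migration stylesheet)
        ["xsltproc", "--output", db5, DB4_UPGRADE_XSL, db4_xml],
        # DocBook 5 -> HTML
        ["xsltproc", "--output", html, HTML_XSL, db5],
        # DocBook 5 -> XSL-FO
        ["xsltproc", "--output", fo, FO_XSL, db5],
        # XSL-FO -> PDF
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


if __name__ == "__main__":
    for command in pipeline_commands("advanced.xml"):
        print(" ".join(command))
```

Each returned list could be passed to subprocess.run(); building the commands separately keeps the sketch easy to inspect before anything is executed.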

Any ideas or suggestions? Shall we continue along this path? Does anybody have more experience with SGML --> XML conversions or DocBook 4.x --> 5.x conversions?

Kind regards
Jürgen Purtz

Re: Docbook 5.x

From
Simon Riggs
Date:
On 20 April 2016 at 15:30, Jürgen Purtz <juergen@purtz.de> wrote:
 
What I have done so far is:
  • Conversion of the SGML files to valid XML syntax with a Perl script. I failed to use 'osx' or 'spam'.
  • Conversion of these XML files to DocBook 5.x format using xsltproc and DocBook's XSLT migration scripts.
  • Creation of HTML files using xsltproc and DocBook's XSLT scripts.
  • Creation of FO files using xsltproc and DocBook's XSLT scripts.
  • Creation of PDF files using fop.
  • The conversions need less than 10 minutes on an Intel i5 processor.
So you believe you have/can convert between the two formats accurately, so we can change things in a single commit?

What verification is offered? Possible?

And that is ready to go now? Will you post your perl script, or the patch? Other projects use the same file formats, e.g. Slony, XL etc

If an automatic migration is possible do we need to change at all?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Docbook 5.x

From
Jürgen Purtz
Date:

On 20.04.2016 20:41, Simon Riggs wrote:
On 20 April 2016 at 15:30, Jürgen Purtz <juergen@purtz.de> wrote:
 
What I have done so far is:
  • Conversion of the SGML files to valid XML syntax with a Perl script. I failed to use 'osx' or 'spam'.
  • Conversion of these XML files to DocBook 5.x format using xsltproc and DocBook's XSLT migration scripts.
  • Creation of HTML files using xsltproc and DocBook's XSLT scripts.
  • Creation of FO files using xsltproc and DocBook's XSLT scripts.
  • Creation of PDF files using fop.
  • The conversions need less than 10 minutes on an Intel i5 processor.
So you believe you have/can convert between the two formats accurately, so we can change things in a single commit?

What verification is offered? Possible?

And that is ready to go now? Will you post your perl script, or the patch? Other projects use the same file formats, e.g. Slony, XL etc

If an automatic migration is possible do we need to change at all?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Hi,

so far I have only done a first raw round-trip to check that there is no showstopper for my plans. If we find a consensus in the community that this work is valuable for the PostgreSQL documentation, I will continue to work on it in the near future. To answer your questions:
  • "do we need to change at all?". This question has to be discussed in the community. I tried to use the recommended tools like 'osx' and 'spam' and failed (not entirely, but in details like newline processing). This may be my fault, or it may result from the fact that we still use SGML instead of XML. But over time this task will get harder and harder: SGML knowledge gets lost, SGML tools are no longer actively developed, XML moves forward, ...
  • Currently I don't see any showstopper. Therefore I believe that the conversion from DocBook 4 to 5 is manageable. The plan is that we will have one XML file in db5 format for every SGML file in db4 format.
  • To keep the repository history continuous we should do something like 'git mv file.sgml file.xml', put the new content into 'file.xml' and 'git commit'. Additionally, the newlines must be preserved during all conversion steps.
  • Maybe some very individual (manual) steps are necessary, but it should be possible to script these as well. Therefore the conversion should run fast, and a single commit should cover the complete documentation.
  • There are no special "Postgres" tasks in the Perl script or anywhere else. It depends on DocBook only. Therefore other projects can use it in the same way. Of course I will publish all sources.
  • Currently I try to generate well-formed XML. Validation against the DocBook 5 schema will follow.
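
Two of the points above (well-formed XML as the first goal, and newlines kept through every conversion step) can be checked mechanically. The following is only a sketch of such a check, not part of the conversion script; the function name and the messages are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_conversion(sgml_text, xml_text):
    """Return a list of problems found in a converted file (empty if none)."""
    problems = []
    # Well-formedness only; this does not validate against the DocBook schema.
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        problems.append("not well-formed: %s" % exc)
    # The conversion is supposed to keep the line structure of the source.
    if sgml_text.count("\n") != xml_text.count("\n"):
        problems.append("line count differs between source and result")
    return problems


if __name__ == "__main__":
    source = "<para>first\n<para>second\n"
    result = "<sect1><para>first</para>\n<para>second</para></sect1>\n"
    print(check_conversion(source, result))
```

Comparing line counts is a weak test (it would not notice lines being moved), but it is cheap and catches the newline problems mentioned for 'osx' and 'spam'.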


Alexander Law posted additional suggestions and questions:

Hello Jürgen,

Please look at the discussion that we had some time ago:
http://www.postgresql.org/message-id/56337365.2080104@postgrespro.ru

And we (postgrespro) still have plans to migrate to XML as soon as we get documentation translated.
We had no issues with SGML->XML conversion, "make postgres.xml" creates XML (with entities and the like), which we use.

When you are talking about "conversion of html, fo, pdf, ..." do you mean using docs/sgml/Makefile or some other scripts?

As to conversion SGML to XML, we need to decide whether to generate a single XML, or a set of XMLs (corresponding to current SGMLs).
In the latter case - how to include XML-fragments into the main document (as entities or with xi:include)?

Please, can you explain what are "Docbooks xslt-migration scripts"?
Is Docbook 4.x incompatible with Docbook 5.x and we need to convert it additionally?


Best regards,
Alexander

-----
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


My answers:

  • DocBook 4 and 5 are not compatible. There are new elements; others are gone and have been replaced by more generic ones. But the DocBook project offers XSLT stylesheets to convert DocBook 4 XML files to DocBook 5 XML files.
  • There are pros and cons to using postgres.xml as a starting point. PRO: well-formed (and valid?) XML format. Entities are preserved. No more "<![CDATA[", "<![%include" and similar SGML constructs. CON: Only one file. Ugly line break algorithm.
  • Currently I don't use the existing Makefile. I start Perl, xsltproc and fop with a different script. If I continue to work on this, I will have to change the Makefile.
  • "how to include XML-fragments into the main document (as entities or with xi:include) ?". As described above, I prefer one file per existing SGML file. But some of those SGML files have more than one root element. In such situations (and without further processing) the resulting XML files will be fragments. In general it will be more "DocBook 5 compliant" to use xi:include instead of entities.
  • "Docbooks xslt-migration scripts": see: http://docbook.org/docs/howto/#convert4to5
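
To illustrate the xi:include variant mentioned above, here is a small sketch using Python's standard library. The file name, the titles and the in-memory "file system" are invented for the example; in the real build, xmllint or xsltproc would resolve the includes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DB_NS = "http://docbook.org/ns/docbook"

# Main document that pulls in one chapter per file via xi:include.
MAIN = """<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Example</title>
  <xi:include href="advanced.xml"/>
</book>"""

# Stand-in for the files on disk (hypothetical content).
FILES = {
    "advanced.xml": """<chapter xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Advanced Features</title>
</chapter>""",
}


def load(href, parse, encoding=None):
    """Loader callback: return a parsed element for XML, raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def assemble(main_xml):
    """Parse the main document and replace every xi:include by its target."""
    root = ET.fromstring(main_xml)
    ElementInclude.include(root, loader=load)
    return root


if __name__ == "__main__":
    book = assemble(MAIN)
    print([t.text for t in book.iter("{%s}title" % DB_NS)])
```

Unlike SGML entities, each included file here must be a well-formed document with a single root element, which is why files with more than one root element need extra treatment.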

Kind regards
Jürgen Purtz


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Jürgen,

Please look at the discussion that we had some time ago:
http://www.postgresql.org/message-id/56337365.2080104@postgrespro.ru

And we (postgrespro) still have plans to migrate to XML as soon as we get documentation translated.
We had no issues with SGML->XML conversion, "make postgres.xml" creates XML (with entities and the like), which we use.

When you are talking about "conversion of html, fo, pdf, ..." do you mean using docs/sgml/Makefile or some other scripts?

As to conversion SGML to XML, we need to decide whether to generate a single XML, or a set of XMLs (corresponding to current SGMLs).
In the latter case - how to include XML-fragments into the main document (as entities or with xi:include)?

Please, can you explain what are "Docbooks xslt-migration scripts"?
Is Docbook 4.x incompatible with Docbook 5.x and we need to convert it additionally?


Best regards,
Alexander

-----
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company








Re: Docbook 5.x

From
Jürgen Purtz
Date:
Hello,

the conversion of PostgreSQL documentation from Docbook 4.x to 5.x consists of the following steps:
  1. pure sgml --> xml conversion (done, Perl script)
  2. 4.x markup --> 5.x markup (done, Docbook standard migration script)
  3. post-processing of 5.x files (done, Perl: xi:include, entities, ...)
  4. generate the complete file postgres_all.xml with xmllint (done)
  5. generate online documentation (html, man, text)
  6. generate print documentation (rtf, pdf)
  7. adapt the Makefile to the new situation
After step 3 we have well-formed XML files, most of which are valid against the DocBook 5.0 XSD. Currently the following non-valid situations occur:
  • a lot of unknown xref targets, as the target exists in a different file
  • 4 remaining sgml-entities: standalone-xxx and include-xxx
  • some markup which is not valid in 5.x, mostly involving <synopsis> and <function>. This must be resolved manually (5.x offers comprehensive possibilities for very detailed markup with <funcsynopsis> and <cmdsynopsis>)
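
The first item (xref targets that live in a different file) can be separated from genuinely missing targets by collecting all xml:id values across the files before checking. A minimal sketch, assuming DocBook 5 files with xml:id attributes and <xref linkend="...">; the function name and the sample IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
DB = "{http://docbook.org/ns/docbook}"


def unresolved_xrefs(documents):
    """documents maps file name -> XML text; returns sorted (file, linkend) pairs
    whose target is not defined in any of the given files."""
    roots = {name: ET.fromstring(text) for name, text in documents.items()}
    # Pass 1: every xml:id in every file is a possible target.
    known = set()
    for root in roots.values():
        for elem in root.iter():
            if XML_ID in elem.attrib:
                known.add(elem.attrib[XML_ID])
    # Pass 2: report xrefs that point to none of them.
    missing = []
    for name, root in roots.items():
        for xref in root.iter(DB + "xref"):
            target = xref.get("linkend")
            if target not in known:
                missing.append((name, target))
    return sorted(missing)


if __name__ == "__main__":
    docs = {
        "a.xml": '<chapter xmlns="http://docbook.org/ns/docbook" xml:id="tutorial-advanced">'
                 '<para>See <xref linkend="tutorial-sql"/> and <xref linkend="no-such-id"/>.</para>'
                 '</chapter>',
        "b.xml": '<chapter xmlns="http://docbook.org/ns/docbook" xml:id="tutorial-sql"><para/></chapter>',
    }
    print(unresolved_xrefs(docs))
```

Validating each file in isolation reports both xrefs in a.xml; checking across all files leaves only the one that is really missing.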
Steps 5 and 6 imply the replacement of our DSSSL script with XSLT scripts. I guess that this is much more difficult and lengthy than everything else I have done in this project so far. Furthermore I don't have any knowledge of DSSSL. This is the reason why I am writing this mail. Is there anybody out there who can support me with the DSSSL --> XSLT conversion, or who can at least answer questions like:
  • Is our file stylesheet.dsl written from scratch - or is it derived from any docbook 1/2/3/4.x generic stylesheet?
  • Which person has developed this file?
  • What is the role of the *.xsl files in the sgml-directory and how do they collaborate with stylesheet.dsl?
Regards, Jürgen




Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:
> Hi,
> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
> with many sub-tasks.

Yes, agreed.  The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones.  If this is not fixed, let's forget
about this whole thing until it is.  So, would you time the process
using both toolchains and report back?
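
A comparison like the one requested here could be scripted roughly as follows. This is only a sketch: the helper name is invented, and the command passed in (for example a make target in the documentation directory) is an assumption about the local build setup.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        cwd=cwd,
        check=True,  # fail loudly if the build itself fails
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Harmless stand-in command; for a real measurement pass the build
    # command of each toolchain instead, e.g. ["make", "html"] with cwd
    # set to the documentation directory (both are assumptions).
    elapsed = time_command([sys.executable, "-c", "pass"])
    print("%.3f s" % elapsed)
```

For a fair comparison both toolchains should start from a clean tree, since leftover intermediate files can shorten a second run.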

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
"Joshua D. Drake"
Date:
On 05/03/2016 12:34 PM, Alvaro Herrera wrote:
> Jürgen Purtz wrote:
>> Hi,
>> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
>> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
>> with many sub-tasks.
>
> Yes, agreed.  The killer objection placed last time was that it took
> something like 10x longer to generate the HTML using the XML-based
> toolchain than the SGML-based ones.  If this is not fixed, let's forget
> about this whole thing until it is.  So, would you time the process
> using both toolchains and report back?

IIRC:

TGL submitted a patch for the openjade bug way back when that caused
that issue.

TGL, do you know what happened there?

Sincerely,

JD




--
Command Prompt, Inc.                  http://the.postgres.company/
                         +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


Re: Docbook 5.x

From
Oleg Bartunov
Date:


On Tue, May 3, 2016 at 10:34 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Jürgen Purtz wrote:
> Hi,
> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
> with many sub-tasks.

Yes, agreed.  The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones.  If this is not fixed, let's forget
about this whole thing until it is.  So, would you time the process
using both toolchains and report back?

As stated in http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru the XML performance may be greatly improved. Alexander, what is the current state of your patch? How slow is XML compared to SGML?
 


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Re: Docbook 5.x

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On 05/03/2016 12:34 PM, Alvaro Herrera wrote:
>> Yes, agreed.  The killer objection placed last time was that it took
>> something like 10x longer to generate the HTML using the XML-based
>> toolchain than the SGML-based ones.  If this is not fixed, let's forget
>> about this whole thing until it is.  So, would you time the process
>> using both toolchains and report back?

> IIRC:
> TGL submitted a patch for the openjade bug way back when that caused
> that issue.

I think you're thinking of this:
http://www.postgresql.org/message-id/24388.1166800682@sss.pgh.pa.us

I do not recall just when/how that got resolved upstream, or if they
ever even responded to me.  But it must have been resolved, because the
performance before that was patched was untenable even then, and would be
far more so now considering how much our docs have grown since 2006.
I have not heard anyone complaining lately that PDF output takes three
days to build.

In short, I doubt that that's relevant anymore.  If it was, it would
certainly not be favorable to the XML toolchain.

BTW, the thread that that message is embedded in is pretty relevant,
because it was all about yet another lets-switch-to-XML proposal...

            regards, tom lane


Re: Docbook 5.x

From
Tom Lane
Date:
I wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> IIRC:
>> TGL submitted a patch for the openjade bug way back when that caused
>> that issue.

> I think you're thinking of this:
> http://www.postgresql.org/message-id/24388.1166800682@sss.pgh.pa.us

> I do not recall just when/how that got resolved upstream, or if they
> ever even responded to me.  But it must have been resolved, because the
> performance before that was patched was untenable even then, and would be
> far more so now considering how much our docs have grown since 2006.

Actually, further digging suggests that Peter found a way to hack our
stylesheets to avoid that openjade bug:

http://www.postgresql.org/message-id/200612100315.47269.peter_e@gmx.net
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=465269b8a

So it's possible that the openjade bug is still there, but has been
defanged for our purposes.  In any case, there's still little reason
to think that it would apply to a different toolchain.

            regards, tom lane


Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 5/3/16 4:13 PM, Oleg Bartunov wrote:
> As it stated in
> http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru
> the xml performance may be greatly improved. Alexander, what is current
> state of art of your patch ? How slow is xml in compare to sgml ?

Please make sure the patch is registered in the next commit fest.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Jürgen Purtz
Date:
Hello,

I measured the following elapsed times on an Intel i5 processor:
  1. generate all HTML files with dsl script (make html): 0:48 min.
  2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
  3. generate all HTML files with xslt script in the new environment (pure DocBook 5): 4:07 min.
  4. Generating different things via dsl scripts in the new environment may be possible. But the changelog of the DocBook 5 dsl scripts shows that the last modification occurred in 2004, so this way is a dead end.
There is one principal difference and a lot of minor differences between 2 and 3. Solution 2 is based on an XML file and XSLT scripts which are based on DocBook 4. The principal difference is that in 3 everything is DocBook 5 compliant: there are only DocBook 5 XML and XSLT files (my workflow is: db4 --> xml --> db5 -- (db5 xslt) --> html). The minor differences concern the fact that there are currently errors in my XML files and that I made only a few parameter changes to the DocBook 5 standard XSLT files, with no optimization at all.

I used the following tools: perl, xmllint and xsltproc. osx and OpenJade are obsolete in the new environment (so far; there is much more work to do).
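
For orientation, the three measured times can be turned into slowdown factors relative to the dsl build. The snippet below only does the arithmetic on the figures given above; the labels are shortened for the example.

```python
def to_seconds(mmss):
    """Convert a 'M:SS' string such as '16:01' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Elapsed times as reported in this mail.
TIMINGS = {
    "dsl (make html)": "0:48",
    "xslt, DocBook 4 (make xslthtml)": "16:01",
    "xslt, pure DocBook 5": "4:07",
}


def slowdowns(timings, baseline):
    """Return each timing divided by the baseline timing, rounded to one decimal."""
    base = to_seconds(timings[baseline])
    return {name: round(to_seconds(t) / base, 1) for name, t in timings.items()}


if __name__ == "__main__":
    for name, factor in slowdowns(TIMINGS, "dsl (make html)").items():
        print("%-35s %5.1fx" % (name, factor))
```

With these numbers the existing xslt build is about 20 times slower than the dsl build, and the unoptimized DocBook 5 build about 5 times slower.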

Jürgen Purtz





Re: Docbook 5.x

From
Tom Lane
Date:
Jürgen Purtz <juergen@purtz.de> writes:
> I measured following elapsed times on an Intel i5 processor:

>  1. generate all HTML files with dsl script (make html): 0:48 min.
>  2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
>  3. generate all HTML files with xslt script in the new environment
>     (pure Docbook5): 4:07 min.
>  4. Generating different things via dsl scripts in the new environment
>     may be possible. But the changelog of the Docbook5 dsl scripts
>     shows, that the last modification occurred in 2004 - this way is a
>     dead end.

Ouch.  What about output to PDF?  While we don't care as much about
that as HTML for day-to-day use, it has to be feasible (ie, not hours).

            regards, tom lane


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Jürgen,

As was stated in the aforementioned thread, solution 2 can be much (8x) faster with some XSLT optimizations, but I think we should now outline a roadmap before we start to prepare patches and so on.
Maybe we should convert to XML with DocBook 4 as a first step?
Then, once we get everything stabilized, we can upgrade to DocBook 5.
Shouldn't we decompose the conversion procedure, so that we could perform a fully automatic conversion without any manual changes, and then fix the non-valid situations you described before?

And one more question: is conversion to DocBook 5 your final goal? Or do you have any further plans regarding the documentation, such as translating it into German?

Best regards,
Alexander


On 04.05.2016 17:44, Jürgen Purtz wrote:
Hello,

I measured the following elapsed times on an Intel i5 processor:
  1. generate all HTML files with dsl script (make html): 0:48 min.
  2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
  3. generate all HTML files with xslt script in the new environment (pure DocBook 5): 4:07 min.
  4. Generating different things via dsl scripts in the new environment may be possible. But the changelog of the DocBook 5 dsl scripts shows that the last modification occurred in 2004, so this way is a dead end.
There is one principal difference and a lot of minor differences between 2 and 3. Solution 2 is based on an XML file and XSLT scripts which are based on DocBook 4. The principal difference is that in 3 everything is DocBook 5 compliant: there are only DocBook 5 XML and XSLT files (my workflow is: db4 --> xml --> db5 -- (db5 xslt) --> html). The minor differences concern the fact that there are currently errors in my XML files and that I made only a few parameter changes to the DocBook 5 standard XSLT files, with no optimization at all.

I used the following tools: perl, xmllint and xsltproc. osx and OpenJade are obsolete in the new environment (so far; there is much more work to do).

Jürgen Purtz






Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:

> I measured following elapsed times on an Intel i5 processor:
>
> 1. generate all HTML files with dsl script (make html): 0:48 min.
> 2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
> 3. generate all HTML files with xslt script in the new environment
>    (pure Docbook5): 4:07 min.
> 4. Generating different things via dsl scripts in the new environment
>    may be possible. But the changelog of the Docbook5 dsl scripts
>    shows, that the last modification occurred in 2004 - this way is a
>    dead end.

Thanks.

The dsl toolchain has a "make html" format which creates the index and a
"make draft" that doesn't.  You timed the former only.  What's the
timing for an equivalent of "make draft" in the xslt chain?  If it
exists and is short enough, it seems acceptable to me that the complete
(with index) build takes ~4x as long as today; the draft timing is more
critical, I would think.

Man pages are already generated using xslt, so I suppose that wouldn't
change.  PDF creation timing is also critical.

FWIW, in my laptop "make draft" takes 1m18.788s and a "make html"
takes 1m26.676s.  So it's just 8 seconds to generate the SGML file for
the index, and no reruns required ... hmm.  I think I'm gonna forget
about "make draft" in the future.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alvaro Herrera
Date:
Alexander Law wrote:
> Hello Jürgen,
>
> As was stated in the aforementioned thread, solution 2 can be much (8x)
> faster with some xslt optimizations, but I think now we should outline some
> roadmap before we start to prepare patches and so.

Can the Docbook5 build be sped up with similar hacks?

If the stylesheet tweaks you did are universally useful, why not
contribute them back to upstream Docbook?

> Maybe we should convert to XML with DocBook4 at first step?
> Then, once we get everything stabilized, we can upgrade to DocBook5.

Not sure there's much point in having an intermediate step in the
repository that makes the doc build so much slower.  I'd rather go to
Docbook5 straight away.

> Shouldn't we decompose the conversion procedure, so we could perform fully
> automatic conversion without any manual changes, and then fix non-valid
> situations, you described before?

I don't think so -- this means leaving a state in the repo in which the
docs don't actually build.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Jürgen Purtz
Date:
On 04.05.2016 16:51, Tom Lane wrote:
> Ouch.  What about output to PDF?  While we don't care as much about
> that as HTML for day-to-day use, it has to be feasible (ie, not hours).
>
>             regards, tom lane

So far I have only tested fop on single files (the converted sgml
files). This works within seconds, and in my very first mail from
2016-04-20 I added the results for the 'advanced.xml' file. When I try
to convert the complete 'postgres_all.xml' file, fop crashes after some
minutes. As fop is a Java application, it is possible that the assigned
heap is too small (-Xms -Xmx, ...), or that the crash comes from some
other Java-specific issue. I will work on this in the next days.
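
If the crash is indeed a heap problem, one way to test that is to start fop with a larger -Xmx value. The sketch below assumes the standard fop launcher script, which to my knowledge passes the FOP_OPTS environment variable on to the JVM; the 2g value and the file names are placeholders, not measured requirements.

```python
import os


def fop_invocation(fo_file, pdf_file, max_heap="2g", base_env=None):
    """Return (command, environment) for running fop with a larger Java heap.

    Assumption: the fop launcher script reads JVM options from FOP_OPTS.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["FOP_OPTS"] = "-Xmx" + max_heap
    command = ["fop", "-fo", fo_file, "-pdf", pdf_file]
    return command, env


if __name__ == "__main__":
    command, env = fop_invocation("postgres_all.fo", "postgres_all.pdf")
    print(env["FOP_OPTS"], " ".join(command))
```

The pair can be handed to subprocess.run(command, env=env); if the run still fails with a much larger heap, the cause is probably not memory.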

Jürgen Purtz



Re: Docbook 5.x

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> The dsl toolchain has a "make html" format which creates the index and a
> "make draft" that doesn't.  You timed the former only.  What's the
> timing for an equivalent of "make draft" in the xslt chain?  If it
> exists and is short enough, it seems acceptable to me that the complete
> (with index) build takes ~4x as long as today; the draft timing is more
> critical, I would think.

I would object to that; I don't ever use "make draft", in part because
I frequently want to look at whether the index entries look sensible.
Also, as you noted, the time savings is pretty minimal at present.

            regards, tom lane


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Alvaro,
04.05.2016 18:21, Alvaro Herrera wrote:
> Alexander Law wrote:
>> Hello Jürgen,
>>
>> As was stated in the aforementioned thread, solution 2 can be much (8x)
>> faster with some xslt optimizations, but I think now we should outline some
>> roadmap before we start to prepare patches and so.
> Can the Docbook5 build be sped up with similar hacks?
>
> If the stylesheet tweaks you did are universally useful, why not
> contribute them back to upstream Docbook?
I can't guarantee that these tweaks will work for all the DocBook
documents, though I've made sure that the result is the same for the
postgresql doc html's (as I stated in
http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru).

>
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
> Not sure there's much point in having an intermediate step in the
> repository that makes the doc build so much slower.  I'd rather go to
> Docbook5 straight away.
>
>> Shouldn't we decompose the conversion procedure, so we could perform fully
>> automatic conversion without any manual changes, and then fix non-valid
>> situations, you described before?
> I don't think so -- this means leaving a state in the repo in which the
> docs don't actually build.
>
I mean we could build the docs just as we do now (as DocBook4). So we
can continue to use the existing toolchain (and Makefile, as it can
generate html and pdf from XML) and just change the source format for now.



Re: Docbook 5.x

From
Jürgen Purtz
Date:
On 04.05.2016 17:08, Alexander Law wrote:
> As was stated in the aforementioned thread, solution 2 can be much
> (8x) faster with some xslt optimizations, but I think now we should
> outline some roadmap before we start to prepare patches and so.
> Maybe we should convert to XML with DocBook4 at first step?
> Then, once we get everything stabilized, we can upgrade to DocBook5.
> Shouldn't we decompose the conversion procedure, so we could perform
> fully automatic conversion without any manual changes, and then fix
> non-valid situations, you described before?

Hello Alexander,

I haven't seen your xslt optimization so far. What have you done? Where
can I find the optimized script or a description?

"Divide and conquer" is a good strategy and people use it in many cases.
As you have stated, there are two major steps: from db4-sgml to db4-xml
and from there to db5-xml. In parallel with the second one we have to
migrate from dsl scripts to db5-xslt scripts. Your idea to go step by
step and stabilise at the intermediate level is good in general, but in
this case it may be unnecessary. The first step is very small: it
consists mainly of eliminating shorttags and empty elements, a purely
formal change without risk. If we stopped at this point, people would
be forced to switch their environment (eg .emacs) from db4-sgml to
db4-xml, and after the second step again to db5-xml. This is possible,
but changing twice will possibly bring more confusion than benefit. The
real challenge is the second step, as it implies some manual
modifications (entities, markup that is not valid against the
db5-schema) and a switch to a different output chain. Maybe we can live
for a while with some files which are not valid against the db5-schema,
as long as the output chain produces correct results.

Jürgen Purtz




Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:

> The real challenge is the second
> step as it implies some manual modifications (entities, non-valid markup in
> sense of db5-schema) and a switch to a different output chain. Maybe we can
> live for a while with some files, which are not valid against db5-schema -
> as far as the output chain produces correct results.

Speaking of entities, I noticed you changed some entities such as
&mdash; to __mdash__ and such.  That's not acceptable.  What is
Docbook5's accepted way to enter such characters?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
04.05.2016 19:52, Jürgen Purtz wrote:
> On 04.05.2016 17:08, Alexander Law wrote:
>> As was stated in the aforementioned thread, solution 2 can be much
>> (8x) faster with some xslt optimizations, but I think now we should
>> outline some roadmap before we start to prepare patches and so.
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
>> Shouldn't we decompose the conversion procedure, so we could perform
>> fully automatic conversion without any manual changes, and then fix
>> non-valid situations, you described before?
>
> Hello Alexander,
>
> I havn't seen your xslt optimization so far. What have you done? Where
> can I find the optimized script or a description?
It was attached to the message:
http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru
> "Divide and conquer" is a good strategy and people use it in many
> cases. As you have stated, there are two major steps: from db4-sgml to
> db4-xml and from there to db5-xml. In parallel to the second one we
> shall migrate from dsl scripts to db5-xslt scripts. Your idea to go
> step by step and stabilise at the intermediate level is good in
> general. But in this case it may be unnecessary. The first step is
> very small. It consists mainly of the elimination of shorttags and
> empty elements. This is a pure formal act without risk. If we would
> stop at this point, people are forced to switch their environment, eg
> .emacs from db4-sgml to db4-xml - and after the second step to
> db5-xml. This is possible - but the twice changing will bring
> (possibly) more confusion than advantages. The real challenge is the
> second step as it implies some manual modifications (entities,
> non-valid markup in sense of db5-schema) and a switch to a different
> output chain. Maybe we can live for a while with some files, which are
> not valid against db5-schema - as far as the output chain produces
> correct results.
By first step I mean the SGML->XML conversion (while staying with DocBook4).
It's not small, IMHO, as it involves replacing all of the doc/sgml contents
(with doc/xml or similar) and updating the makefile (replacing the 'html'
target with 'xslthtml' and so on).
It can (and should), however, be performed as one commit using a single
conversion script.
The advantage of "baby steps" for me is the opportunity to play safe.
In fact it's a question of balance between the amount of redundant work
and the manageability of the conversion.
IMO, the amount of redundant work is not that large, as:
1) we have to convert sgml->xml anyway
2) we can use the existing makefile targets
3) we have to update the documentation on documentation ([1], [2]) anyway.

[1] http://www.postgresql.org/docs/9.5/interactive/docguide-build.html
[2] http://www.postgresql.org/docs/9.5/interactive/docguide-authoring.html

Alexander Lakhin


Re: Docbook 5.x

From
Jürgen Purtz
Date:
Hello Alvaro,

yes, character entities, or rather their values, must be kept (what you
have seen is an intermediate state). We will use utf-8, so every
possible Unicode code point can be used directly. But we do not use only
character entities; there are also parameter entities and external
entities. The external entities will be replaced by XInclude
(xi:include). Finally, there are 4 parameter entities for which I
currently have no solution: %standalone-ignore; %standalone-include;
%include-index; %include-xslt-index; . But they should not be a
show-stopper.
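
For the character-entity part, a minimal Python sketch of the replacement could look like the following. It is only an illustration: the function name expand_char_entities is made up, and it assumes that Python's html.entities table is an adequate stand-in for the ISO entity sets that DocBook 4 declares (it knows mdash and other common names, but not all of them). The five entities predefined by XML, and any name the table does not know (for example a project-specific &version;), are left untouched.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A general entity reference such as &mdash;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_char_entities(text):
    """Replace known character entities by the Unicode character itself."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep unknown or predefined references
        return chr(name2codepoint[name])

    return ENTITY_REF.sub(replace, text)


if __name__ == "__main__":
    print(expand_char_entities("rows &mdash; see &version; &amp; more"))
```

The result has to be written out as utf-8, otherwise the inserted characters are lost again.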

Jürgen Purtz


On 04.05.2016 19:12, Alvaro Herrera wrote:
> Jürgen Purtz wrote:
>
>> The real challenge is the second
>> step as it implies some manual modifications (entities, non-valid markup in
>> sense of db5-schema) and a switch to a different output chain. Maybe we can
>> live for a while with some files, which are not valid against db5-schema -
>> as far as the output chain produces correct results.
> Speaking of entities, I noticed you changed some entities such as
> &mdash; to __mdash__ and such.  That's not acceptable.  What is
> Docbook5's accepted way to enter such characters?
>



Re: Docbook 5.x

From
Jürgen Purtz
Date:
Alvaro,

the advantage of draft mode is smaller than 15% in the three
environments. In Docbook5 it reduces the elapsed time from 4:07 to 4:02.

Jürgen Purtz



On 04.05.2016 17:18, Alvaro Herrera wrote:
> The dsl toolchain has a "make html" format which creates the index and a
> "make draft" that doesn't.  You timed the former only.  What's the
> timing for an equivalent of "make draft" in the xslt chain?  If it
> exists and is short enough, it seems acceptable to me that the complete
> (with index) build takes ~4x as long as today; the draft timing is more
> critical, I would think.



Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:
> Hello Alvaro,
>
> yes, character entities respectively their values must be kept (what you
> have seen is an intermediate state). We will use utf-8, so every possible
> Unicode code point can be used directly. But we use not only character
> entities, there are also parameter entities and external entities. The
> external entities will be replaced by xi:XInclude.

OK.  Currently we have no non-ASCII chars in our source code, so this
would be without precedent, but I think all modern tools should cope.
The only pain point may be Tom Lane's mail client, which is unique in
still using us-ascii encoding.

> At last there are 4 parameter entities, to whom I actually have no
> solution: %standalone-ignore; %standalone-include; %include-index;
> %include-xslt-index; . But they should not be a show-stopper.

Hmm?  I think we use these entities to generate text files that are
distributed in the tarball.  How would we generate these files in the
Docbook5 XML world?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 5/4/16 11:08 AM, Alexander Law wrote:
> As was stated in the aforementioned thread, solution 2 can be much (8x)
> faster with some xslt optimizations, but I think now we should outline
> some roadmap before we start to prepare patches and so.
> Maybe we should convert to XML with DocBook4 at first step?
> Then, once we get everything stabilized, we can upgrade to DocBook5.
> Shouldn't we decompose the conversion procedure, so we could perform
> fully automatic conversion without any manual changes, and then fix
> non-valid situations, you described before?

I think the process should be something like this:

- Apply your XSLT performance patch.  The patch should be submitted to
the next commit fest.

- Wait a while to make sure everyone is happy with the performance.
Keep tweaking if necessary.

- Port all DSSSL customizations to XSLT.  Manually evaluate output for
quality.

- Switch to XSLT build for official HTML documentation. [milestone 1]

- Convert sources to XML. (There could be substeps here.) [milestone 2]

- Then consider upgrading to DocBook 5. [milestone 3]

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Done (previous patch cleaned).
This patch optimizes XSL transformations contained in docbook-xsl (1.78.1).

Tested with 9.5.2
time make html
real    1m21.989s
user    1m21.392s
sys     0m0.484s

1) time make xslthtml
(before the patch)
real    29m19.904s
user    29m18.804s
sys     0m0.888s

2) time make xslthtml
(after the patch)
real    3m8.483s
user    3m7.556s
sys     0m0.864s
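
To put the three "real" timings above into relation, here is a small Python sketch that converts them to seconds and computes the ratios. The helper name to_seconds is made up for the illustration; the input values are exactly the figures reported above.

```python
import re


def to_seconds(stamp):
    """Convert a value printed by 'time', such as '29m19.904s', to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


dsssl = to_seconds("1m21.989s")         # make html (DSSSL)
xslt_before = to_seconds("29m19.904s")  # make xslthtml, before the patch
xslt_after = to_seconds("3m8.483s")     # make xslthtml, after the patch

speedup = xslt_before / xslt_after      # gain from the patch
vs_dsssl = xslt_after / dsssl           # remaining gap to the DSSSL build

if __name__ == "__main__":
    print("patch speed-up: %.1fx" % speedup)
    print("patched XSLT vs. DSSSL: %.1fx slower" % vs_dsssl)
```

With these figures the patch gives roughly a 9x speed-up, and the patched XSLT build is still a bit more than twice as slow as the DSSSL build.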

To make sure that the result of the transformation is the same:
After 1):
mv html html.xslt1
After 2):
mv html html.xslt2
for f in *.xslt*/*.html; do
  sed -e 's/id=\"\(ftn\.\)\?id[a-z][0-9]\+\"/id=\"id\"/g' -i "$f"
  sed -e 's/href=\"[^#]*#\(ftn\.\)\?id[a-z][0-9]\+\"/href=\"#\"/g' -i "$f"
done
diff -u -r html.xslt1 html.xslt2
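
The normalization step can also be sketched in Python, which is handy if the comparison is to become part of a script. This is only an illustration of the same idea as the sed expressions above, not part of the patch; the function name normalize_generated_ids is made up, and the href pattern is slightly tightened so that it cannot run across a closing quote.

```python
import re

# Auto-generated anchors look like id="idp12345" or id="ftn.idm42" and
# differ from run to run, so they are replaced by a fixed placeholder.
GENERATED_ID = re.compile(r'id="(ftn\.)?id[a-z][0-9]+"')
GENERATED_HREF = re.compile(r'href="[^#"]*#(ftn\.)?id[a-z][0-9]+"')


def normalize_generated_ids(html):
    """Blank out generated ids and the links pointing to them."""
    html = GENERATED_ID.sub('id="id"', html)
    return GENERATED_HREF.sub('href="#"', html)


if __name__ == "__main__":
    run1 = '<a id="idp1234" href="xfunc.html#ftn.idm42">note</a>'
    run2 = '<a id="idp9999" href="xfunc.html#ftn.idm7">note</a>'
    # Both runs compare equal once the generated ids are normalized.
    print(normalize_generated_ids(run1) == normalize_generated_ids(run2))
```

Hand-written ids such as id="xfunc" are not touched, so real differences between the two outputs still show up in the diff.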

Best regards,
Alexander


04.05.2016 03:44, Peter Eisentraut wrote:
> On 5/3/16 4:13 PM, Oleg Bartunov wrote:
>> As it stated in
>> http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru
>> the xml performance may be greatly improved. Alexander, what is current
>> state of art of your patch ? How slow is xml in compare to sgml ?
>
> Please make sure the patch is registered in the next commit fest.
>


Attachment

Re: Docbook 5.x

From
Alexander Law
Date:
Hello Peter,
I fully support your plan and am going to proceed this way.

Best regards,
Alexander

05.05.2016 04:09, Peter Eisentraut wrote:
> On 5/4/16 11:08 AM, Alexander Law wrote:
>> As was stated in the aforementioned thread, solution 2 can be much (8x)
>> faster with some xslt optimizations, but I think now we should outline
>> some roadmap before we start to prepare patches and so.
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
>> Shouldn't we decompose the conversion procedure, so we could perform
>> fully automatic conversion without any manual changes, and then fix
>> non-valid situations, you described before?
>
> I think the process should be something like this:
>
> - Apply your XSLT performance patch.  The patch should be submitted to
> the next commit fest.
>
> - Wait a while to make sure everyone is happy with the performance.
> Keep tweaking if necessary.
>
> - Port all DSSSL customizations to XSLT.  Manually evaluate output for
> quality.
>
> - Switch to XSLT build for official HTML documentation. [milestone 1]
>
> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>
> - Then consider upgrading to DocBook 5. [milestone 3]
>



Re: Docbook 5.x

From
Jürgen Purtz
Date:
Hi,

after some manual interventions (which must become part of an algorithm)
it's possible to create the complete PDF file for the db4 production
chain: *.sgml -- (perl) --> *.xml and after that step: postgres.xml --
(evaluate XInclude with xmllint) --> postgres_all.xml -- (xsltproc using
standard db4 stylesheet) --> postgres_all.fo -- (fop) -->
postgres_all.pdf. The fo/pdf generation takes about 1 minute.
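
The "evaluate XInclude" step of this chain can be illustrated with Python's standard library instead of xmllint. This is a sketch under assumptions: the files live in an in-memory dictionary rather than on disk, the element content is invented, and the function name resolve_includes is made up. Only the XInclude namespace and the xi:include element are taken from the standard.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def resolve_includes(master_xml, files):
    """Return the parsed master document with all xi:include elements expanded.

    'files' maps an href to the XML text of the included file.
    """

    def loader(href, parse, encoding=None):
        if parse == "xml":
            return ET.fromstring(files[href])
        return files[href]  # parse="text": hand back the raw text

    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=loader)
    return root


if __name__ == "__main__":
    master = (
        "<book xmlns:xi='http://www.w3.org/2001/XInclude'>"
        "<xi:include href='advanced.xml'/>"
        "</book>"
    )
    chapters = {
        "advanced.xml": "<chapter id='advanced'><title>Advanced Features</title></chapter>",
    }
    book = resolve_includes(master, chapters)
    print(ET.tostring(book, encoding="unicode"))
```

The expanded tree corresponds to what is called postgres_all.xml above: one document without any remaining include elements.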

Currently the layout of the resulting pdf file differs from the original
one, as I used only the standard scripts without any adaptation to the
PostgreSQL styles. Additionally I'm missing some of the links.

Jürgen Purtz


On 04.05.2016 17:30, Jürgen Purtz wrote:
> On 04.05.2016 16:51, Tom Lane wrote:
>> Ouch.  What about output to PDF?  While we don't care as much about
>> that as HTML for day-to-day use, it has to be feasible (ie, not hours).
>>
>>             regards, tom lane
>
> Actually I made tests using fop on single files (the converted sgml
> files). This works within seconds and in my very first mail from
> 2016-04-20 I added the results for the 'advanced.xml' file. When I try
> to convert the complete 'postgres_all.xml' file, fop crashes after
> some minutes. As fop is a Java application, it is possible that the
> assigned main memory is short (-Xms -Xmx, ...) - or it comes from some
> other Java specific issues. I will work on this in the next days.
>
> Jürgen Purtz
>



Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:

> after some manual interventions (which must become part of an algorithm)
> it's possible to created the complete PDF file for the db4 production chain:
> *.sgml -- (perl) --> *.xml and after that step: postgres.xml -- (evaluate
> XInclude with xmllint) --> postgres_all.xml -- (xsltproc using standard db4
> stylesheet) --> postgres_all.fo -- (fop) --> postgres_all.pdf. The fo/pdf
> generation takes about 1 minute.

Uh?  So generating the PDF is much quicker than generating the HTML?
That's strange, but if true, then I'm happy about it.

> Actually the layout of the resulting pdf file differs from the original one
> as I used only standard scripts without any adoption to the PostgreSQL
> styles.

Surely this can be sorted out.

> Additionally I'm missing some of the links.

Not sure what you mean here.

Did you figure a way to generate the INSTALL file?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Jürgen Purtz
Date:
Is everybody happy with the performance patch?
Is anybody working on the DSSSL --> XSLT conversion? We should avoid
parallel work.

Jürgen Purtz


On 05.05.2016 03:09, Peter Eisentraut wrote:
> On 5/4/16 11:08 AM, Alexander Law wrote:
>> As was stated in the aforementioned thread, solution 2 can be much (8x)
>> faster with some xslt optimizations, but I think now we should outline
>> some roadmap before we start to prepare patches and so.
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
>> Shouldn't we decompose the conversion procedure, so we could perform
>> fully automatic conversion without any manual changes, and then fix
>> non-valid situations, you described before?
>
> I think the process should be something like this:
>
> - Apply your XSLT performance patch.  The patch should be submitted to
> the next commit fest.
>
> - Wait a while to make sure everyone is happy with the performance.
> Keep tweaking if necessary.
>
> - Port all DSSSL customizations to XSLT.  Manually evaluate output for
> quality.
>
> - Switch to XSLT build for official HTML documentation. [milestone 1]
>
> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>
> - Then consider upgrading to DocBook 5. [milestone 3]
>



Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 5/13/16 3:38 AM, Jürgen Purtz wrote:
> Is everybody happy with the performance patch?
> Is anybody working on the DSSSL --> XSLT conversion? We should avoid
> parallel work.

The performance patch is in the next commit fest for review.

The DSSSL -> XSLT conversion basically just needs someone to go through
the existing code and output and identify necessary improvements.

Most people right now are working on getting the current release out, so
I don't expect much to happen here for a while.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Jürgen,

We started to work on the conversion:
http://www.postgresql.org/message-id/56337365.2080104@postgrespro.ru
And now we (at PostgresPro) are going to continue this work.
I think we should do it together to avoid redundant work.

Best regards,
Alexander

13.05.2016 21:01, Peter Eisentraut wrote:
> On 5/13/16 3:38 AM, Jürgen Purtz wrote:
>> Is everybody happy with the performance patch?
>> Is anybody working on the DSSSL --> XSLT conversion? We should avoid
>> parallel work.
>
> The performance patch is in the next commit fest for review.
>
> The DSSSL -> XSLT conversion basically just needs someone to go
> through the existing code and output and identify necessary improvements.
>
> Most people right now are working on getting the current release out,
> so I don't expect much to happen here for a while.
>



Re: Docbook 5.x

From
Jürgen Purtz
Date:
We maintain two css files: docs.css (for online, complex) and
stylesheet.css (local, simple). Why do we need stylesheet.css? Because
of the references to some png-files in docs.css? Because of Lynx? For
easier comparisons?

Kind regards, Jürgen Purtz


On 13.05.2016 20:01, Peter Eisentraut wrote:
> On 5/13/16 3:38 AM, Jürgen Purtz wrote:
>> Is everybody happy with the performance patch?
>> Is anybody working on the DSSSL --> XSLT conversion? We should avoid
>> parallel work.
>
> The performance patch is in the next commit fest for review.
>
> The DSSSL -> XSLT conversion basically just needs someone to go
> through the existing code and output and identify necessary improvements.
>
> Most people right now are working on getting the current release out,
> so I don't expect much to happen here for a while.
>



Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 5/22/16 3:48 PM, Jürgen Purtz wrote:
> We maintain two css files: docs.css (for online, complex) and
> stylesheet.css (local, simple). Why do we need stylesheet.css?

So you can read the documentation locally without referencing online files.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 5/5/16 12:45 AM, Alexander Law wrote:
> Done (previous patch cleaned).
> This patch optimizes XSL transformations contained in docbook-xsl (1.78.1).

I have looked through this patch, and it's awesome.  I have tweaked it a
bit more along the lines you guys have started, and now the build time
is pretty much the same as with DSSSL.  Attached is my final patch,
which I plan to commit as soon as the new branch opens.

(I only have Alexander Lakhin as credit right now.  Please let me know
if anyone else contributed.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Docbook 5.x

From
Alexander Law
Date:
Hello Peter,

Thanks for the improvements!
I checked the outputs (with the attached script) and found that there is
a small difference with the improved patch.
For example, look at xtypes.html:

- <link rel="home" href="index.html" title="PostgreSQL 9.6beta1
Documentation" /><link rel="up" href="extend.html"
title="Chapter 35. Extending SQL" /><link rel="prev" href="xaggr.html"
title="35.10. User-defined Aggregates" /><link rel="next"
href="xoper.html" title="35.12. User-defined Operators" /><link
rel="copyright" href="legalnotice.html" title="Legal Notice" /></head>

+ <link rel="prev" href="xaggr.html" title="35.10. User-defined
Aggregates" /><link rel="next" href="xoper.html"
title="35.12. User-defined Operators" /></head>

It is caused by <xsl:template name="html.head">.
Leaving aside the question whether the links "home", "up" and
"copyright" are needed, maybe it's better to split the commit into two?
The first to speed up the conversion while making sure that the output is
the same, and the second to change the html.head output format.

Best regards,
Alexander


03.06.2016 22:31, Peter Eisentraut wrote:
> On 5/5/16 12:45 AM, Alexander Law wrote:
>> Done (previous patch cleaned).
>> This patch optimizes XSL transformations contained in docbook-xsl
>> (1.78.1).
>
> I have looked through this patch, and it's awesome.  I have tweaked it
> a bit more along the lines you guys have started, and now the build
> time is pretty much the same as with DSSSL. Attached is my final
> patch, which I plan to commit as soon as the new branch opens.
>
> (I only have Alexander Lakhin as credit right now.  Please let me know
> if anyone else contributed.)
>


Attachment

Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 6/4/16 10:28 AM, Alexander Law wrote:
> It caused by <xsl:template name="html.head">.
> Leaving aside the question whether the links "home", "up" and
> "copyright" are needed, maybe it's better to split the commit to two?
> First to speed up the conversion while making sure that the output is
> the same, and the second to change the html.head output format.

I did that intentionally, but I agree that it might be better to split
this off into a separate commit.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Peter,

In that case can we postpone the second commit until step 3 in your plan
of migration to XML is done?
I mean "- Port all DSSSL customizations to XSLT.  Manually evaluate
output for quality. ";
If we do not change contents/formatting until the migration to XSLT is
done, we can ensure by automatic means that the output stays the same.
(See my letter:
https://www.postgresql.org/message-id/56337365.2080104%40postgrespro.ru)

Best regards,
Alexander



06.06.2016 15:28, Peter Eisentraut wrote:
> On 6/4/16 10:28 AM, Alexander Law wrote:
>> It caused by <xsl:template name="html.head">.
>> Leaving aside the question whether the links "home", "up" and
>> "copyright" are needed, maybe it's better to split the commit to two?
>> First to speed up the conversion while making sure that the output is
>> the same, and the second to change the html.head output format.
>
> I did that intentionally, but I agree that it might be better to split
> this off into a separate commit.
>



Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 6/6/16 1:49 PM, Alexander Law wrote:
> In that case can we postpone the second commit until step 3 in your plan
> of migration to XML is done?
> I mean "- Port all DSSSL customizations to XSLT.  Manually evaluate
> output for quality. ";

It is part of the performance work.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Jürgen Purtz
Date:

On 05.05.2016 03:09, Peter Eisentraut wrote:
> I think the process should be something like this:
>
> - Apply your XSLT performance patch.  The patch should be submitted to the next commit fest.
>
> - Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
>
> - Port all DSSSL customizations to XSLT.  Manually evaluate output for quality.
>
> - Switch to XSLT build for official HTML documentation. [milestone 1]
>
> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>
> - Then consider upgrading to DocBook 5. [milestone 3]

Alexander and I continue to work on this path. In the meantime we have reached a state where the xml files are well formed and valid against the docbook 4 dtd - each single file as well as the big postgres_all.xml file. Thanks to Alexander's performance patch all XSLT processes run very fast (the slowest is fo+pdf with 6:30 min).

On this basis I am currently working on the HTML generation. But in contrast to the previous steps (where we create identical copies of the sgml files) the new css file is very different from the old one. This results from the following:
  • The XSLT process generates other HTML elements and other classes in comparison to the dsssl process.
  • XML files are case sensitive. All object names (id, ulink, linkend, zone, ...) are now lower case.
  • Sometimes the order of elements changed.
  • As the previous css file was constructed (some years ago) from three different css files, it contains redundant and sometimes contradictory information. I did a complete review.
To get feedback from the community I have published the resulting postgres_all.html and its pgdoc_online.css file. Please refer to https://github.com/JuergenPurtz/pgdoc_db5/blob/master/postgresql-9.5.3/doc/src/db4_xml/postgres_all.html and pgdoc_online.css, respectively, to get the files. Please compare the html file with pages you are familiar with. And remember: the look-and-feel is similar, but far from identical.
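
The case-sensitivity point in the list above can be illustrated with a short Python sketch. It is not the conversion script itself: the function name lowercase_id_attrs is made up, and it assumes that only the attributes id, linkend and zone carry object names that need folding.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to carry object names that must match case-sensitively.
ID_LIKE_ATTRS = ("id", "linkend", "zone")


def lowercase_id_attrs(root):
    """Fold id-like attribute values to lower case, in place."""
    for elem in root.iter():
        for name in ID_LIKE_ATTRS:
            if name in elem.attrib:
                elem.set(name, elem.get(name).lower())
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<sect1 id="SQL-SELECT"><para>See <xref linkend="SQL-Select"/>.</para></sect1>'
    )
    lowercase_id_attrs(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

After folding, a reference and its target match again even if the SGML source spelled them with different capitalization.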

Jürgen Purtz

Re: Docbook 5.x

From
Alexander Law
Date:
Hello,

I have made some progress with the "Port all DSSSL customizations to XSLT" step. I still think that we should avoid manual comparison of old and new outputs when we can align them and have all differences observable, countable and manageable.
I have developed XSLTs that allow us to do that. Please look at the patch (with all the XSLTs) and the comparison script attached.
(There are several dozen changes in the XSLTs and some in the script. Some differences still remain, but they are observable and can be eliminated too.)

Btw, can I hope to get some feedback on my previous letter regarding error fixes (https://www.postgresql.org/message-id/5736C475.3030405%40gmail.com)?

Best regards,
Alexander


19.06.2016 23:22, Jürgen Purtz wrote:

> On 05.05.2016 03:09, Peter Eisentraut wrote:
>> I think the process should be something like this:
>>
>> - Apply your XSLT performance patch.  The patch should be submitted to the next commit fest.
>>
>> - Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
>>
>> - Port all DSSSL customizations to XSLT.  Manually evaluate output for quality.
>>
>> - Switch to XSLT build for official HTML documentation. [milestone 1]
>>
>> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>>
>> - Then consider upgrading to DocBook 5. [milestone 3]
>
> Alexander and I continue to work on this path. In the meanwhile we have reached a state where xml files are well formed and valid against docbook 4 dtd - each single file as well as the big postgres_all.xml file. Thanks to Alexander's performance patch all XSLT processes run very fast (the slowest is fo+pdf with 6:30 min).
>
> On this basis I actually work on the HTML generation. But in opposite to the previous steps (where we create identical copies of the sgml files) the new css file is very different from the old one. This results from the following:
>   • The XSLT process generates other HTML elements and other classes in comparison to the dsssl process.
>   • XML files are case sensitive. All object names (id, ulink, linkend, zone, ...) are now lower case.
>   • Sometimes the order of elements changed.
>   • As the previous css file was constructed (some years ago) from three different css files, he contains redundant and sometimes contradictory information. I did a complete review.
> To get a feedback from the community I have published the resulting postgres_all.html and its pgdoc_online.css file. Please refer to https://github.com/JuergenPurtz/pgdoc_db5/blob/master/postgresql-9.5.3/doc/src/db4_xml/postgres_all.html respective pgdoc_online.css to get the files. Please compare the html file with pages you are familiar with. And remember: the look-and-feel is similar, but far from identical.
>
> Jürgen Purtz


Attachment

Re: Docbook 5.x

From
Jürgen Purtz
Date:

On 05.05.2016 03:09, Peter Eisentraut wrote:
> I think the process should be something like this:
>
> - Apply your XSLT performance patch.  The patch should be submitted to the next commit fest.
>
> - Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
>
> - Port all DSSSL customizations to XSLT.  Manually evaluate output for quality.
>
> - Switch to XSLT build for official HTML documentation. [milestone 1]
>
> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>
> - Then consider upgrading to DocBook 5. [milestone 3]

During the step from DocBook 4 to DocBook 5 [M3] we will face the problem that there are incompatibilities in the DocBook structure. Our source is affected by:
  1. It's no longer possible to use <option> and <optional> in a recursive fashion.
  2. The content model of <literal>, <function>, <command> and similar elements has changed.
In my opinion the first issue is not acceptable and the second one may need some DocBook redesign as well as manual changes in our source code. The DocBook TC has accepted these concerns and is working on a solution. But as they are in the last phase of bringing 5.1 to an official OASIS standard, we probably have to wait (a long time) until 5.2. You can follow the discussion here: https://lists.oasis-open.org/archives/docbook/201606/msg00007.html and in a direct mail from Bob Stayton:

-----
Hi,
I'm investigating this issue.  I'm trying to track down the history and reasoning for this change, but have not yet completed that research.  It looks like a mistake to me to limit the content model so much when we were consciously trying to maintain backwards compatibility unless there were good reasons not to.

Bob Stayton
Sagehill Enterprises
bobs@sagehill.net
-----


Jürgen Purtz

Btw: Since 16 June 2016 Norman Walsh is no longer chairman of the TC. Bob now holds this position, and Norman has become a regular member of the TC.

Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 6/6/16 8:28 AM, Peter Eisentraut wrote:
> On 6/4/16 10:28 AM, Alexander Law wrote:
>> It caused by <xsl:template name="html.head">.
>> Leaving aside the question whether the links "home", "up" and
>> "copyright" are needed, maybe it's better to split the commit to two?
>> First to speed up the conversion while making sure that the output is
>> the same, and the second to change the html.head output format.
>
> I did that intentionally, but I agree that it might be better to split
> this off into a separate commit.

I have committed the first part of this, as discussed.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Great!

The next items in our plan were:

- Wait a while to make sure everyone is happy with the performance. Keep
tweaking if necessary.
- Port all DSSSL customizations to XSLT.  Manually evaluate output for
quality.

Should we now compare DSSSL outputs with XSLT?
I had some success with it before. See my letter:
https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
differences and to decide which customizations to keep.

Best regards,
Alexander


18.08.2016 20:56, Peter Eisentraut wrote:
> On 6/6/16 8:28 AM, Peter Eisentraut wrote:
>> On 6/4/16 10:28 AM, Alexander Law wrote:
>>> It caused by <xsl:template name="html.head">.
>>> Leaving aside the question whether the links "home", "up" and
>>> "copyright" are needed, maybe it's better to split the commit to two?
>>> First to speed up the conversion while making sure that the output is
>>> the same, and the second to change the html.head output format.
>> I did that intentionally, but I agree that it might be better to split
>> this off into a separate commit.
> I have committed the first part of this, as discussed.
>



Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 8/19/16 9:14 AM, Alexander Law wrote:
> The next items in our plan were:

An immediate problem is that the patched stuff no longer works with
older stylesheets (1.76.1?).  I'm glad to leave older stuff behind for a
400% speedup, but we need to analyze the exact effect and possibly
document it or work around it.

> - Wait a while to make sure everyone is happy with the performance. Keep
> tweaking if necessary.
> - Port all DSSSL customizations to XSLT.  Manually evaluate output for
> quality.
>
> Should we now compare DSSSL outputs with XSLT?
> I had some success with it before. See my letter:
> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
> differences and to decide which customizations to keep.

It looks like the idea there is to whack the XSLT stylesheets until the
output looks exactly like the DSSSL output?  I'm not sure that's
terribly useful.  It would probably be a lot of work, which we'll just
end up removing eventually.  I'd rather just fix any formatting issues
we find and move forward.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello,
>> The next items in our plan were:
> An immediate problem is that the patched stuff no longer works with
> older stylesheets (1.76.1?).  I'm glad to leave older stuff behind for a
> 400% speedup, but we need to analyze the exact effect and possibly
> document it or work around it.
Please consider committing the attached patch. The commented-out call-template
does nothing, so we can just customize our customized templates further
to support 1.76.
I've just performed the build with Ubuntu 13.04/docbook 1.76.1 - it works.
(In case the call-template generated something it would appear in
bookindex.html only.)
>
>> - Wait a while to make sure everyone is happy with the performance. Keep
>> tweaking if necessary.
>> - Port all DSSSL customizations to XSLT.  Manually evaluate output for
>> quality.
>>
>> Should we now compare DSSSL outputs with XSLT?
>> I had some success with it before. See my letter:
>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
>> differences and to decide which customizations to keep.
> It looks like the idea there is to whack the XSLT stylesheets until the
> output looks exactly like the DSSSL output?  I'm not sure that's
> terribly useful.  It would probably be a lot of work, which we'll just
> end up removing eventually.  I'd rather just fix any formatting issues
> we find and move forward.
That work is done already, and its results are countable and observable
differences. (See comments in the xslt.)
For example, with DSSSL we don't get a chapter TOC when the chapter
contains only one sect1 (with XSLT we get the TOC with the one item).
We also had subtoc for sect1/refentry and sect1/simplesect, but with
XSLT it's absent.
So if all such differences are not important, let's move forward.

Best regards,
Alexander


Attachment

Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 8/23/16 10:23 AM, Alexander Law wrote:
>> An immediate problem is that the patched stuff no longer works with
>> > older stylesheets (1.76.1?).  I'm glad to leave older stuff behind for a
>> > 400% speedup, but we need to analyze the exact effect and possibly
>> > document it or work around it.
> Please consider committing attached patch. Commented-out call-template
> does nothing so we can just customize our customized templates further
> to support 1.76.

pushed, thanks

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello, Peter.
>>> Should we now compare DSSSL outputs with XSLT?
>>> I had some success with it before. See my letter:
>>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
>>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
>>> differences and to decide which customizations to keep.
>> It looks like the idea there is to whack the XSLT stylesheets until the
>> output looks exactly like the DSSSL output?  I'm not sure that's
>> terribly useful.  It would probably be a lot of work, which we'll just
>> end up removing eventually.  I'd rather just fix any formatting issues
>> we find and move forward.
> That work is done already, and its results are countable and
> observable differences. (See comments in the xslt.)
> For example, with DSSSL we don't get a chapter TOC when the chapter
> contains only one sect1 (with XSLT we get the TOC with the one item).
> We also had subtoc for sect1/refentry and sect1/simplesect, but with
> XSLT it's absent.
> So if all such differences are not important, let's move forward.
>
Please look at the
http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download
where I have gathered together all the significant differences, that we
have between DSSSL and XSLT outputs.
I have marked red the differences that I would consider as negative.
Let's decide which ones are acceptable and which we need to eliminate.

Best regards,
Alexander



Re: Docbook 5.x

From
Jürgen Purtz
Date:
On 14.09.2016 13:18, Alexander Law wrote:
> Hello, Peter.
>>>> Should we now compare DSSSL outputs with XSLT?
>>>> I had some success with it before. See my letter:
>>>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
>>>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
>>>> differences and to decide which customizations to keep.
>>> It looks like the idea there is to whack the XSLT stylesheets until the
>>> output looks exactly like the DSSSL output?  I'm not sure that's
>>> terribly useful.  It would probably be a lot of work, which we'll just
>>> end up removing eventually.  I'd rather just fix any formatting issues
>>> we find and move forward.
>> That work is done already, and its results are countable and
>> observable differences. (See comments in the xslt.)
>> For example, with DSSSL we don't get a chapter TOC when the chapter
>> contains only one sect1 (with XSLT we get the TOC with the one item).
>> We also had subtoc for sect1/refentry and sect1/simplesect, but with
>> XSLT it's absent.
>> So if all such differences are not important, let's move forward.
>>
> Please look at the
> http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download
> where I have gathered together all the significant differences, that
> we have between DSSSL and XSLT outputs.
> I have marked red the differences that I would consider as negative.
> Let's decide which ones are acceptable and which we need to eliminate.
>
> Best regards,
> Alexander
>
>
>

Hello Alexander,

great job!

In my opinion most of the differences are not only acceptable but even
better. Here are some additional notes:

For me the following topics are ok: 18, 19, 22, 37.

If possible we should invest some more effort in solutions for: 1 (up +
home not only in footer but also in header), 8, 11, 20, 30.

Topics 9 and 22 seem to be identical.

Kind regards, Jürgen



Re: Docbook 5.x

From
Alexander Law
Date:
Hello Jürgen,
14.09.2016 16:05, Jürgen Purtz wrote:
> For me the following topics are ok: 18, 19, 22, 37.
The problem with 37 is that such flat numbering is present only in
refentry. Other sections (sect1, sect2) have independent numbering.
So you can't find "Table 236", but you can find "Table 27.2. Collected
Statistics Views" in monitoring-stats.html, for example. So it's
inconsistent at least.
> If possible we shall invest some more effort in solutions for: 1 (up +
> home not only in footer but also in header), 8, 11, 20, 30.
What is marked "+" in the "XSLT alignment exists" column is already
solved. I mean that I've developed the XSLT templates that eliminate the
indicated differences.
I presented a patch with all the templates before (for 9.5) and adapted
it for the master branch now.
> Topics 9 and 22 seem to be identical.
Yes, that difference required changes in two places and I duplicated the
topic erroneously.

Best regards,
Alexander


Re: Docbook 5.x

From
Alvaro Herrera
Date:
Jürgen Purtz wrote:

> Hello Alexander,
>
> great job!
>
> In my opinion most of the differences are not only acceptable but even
> better.

Agreed. But there are a few that merit a fix.  There are a few that look
like a matter of style (XSL output looks odd), another few are not important.
But I think a missing TOC, for example, is a problem.

1 is a customization we went to great lengths to add, so it's a must fix I
think.

I think 12, 14, 15 merit more research too.

What's up with 30?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Alvaro,

14.09.2016 18:30, Alvaro Herrera wrote:
> What's up with 30?

ecpg.sgml contains "<remark>The scope of the allocated descriptor is
WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL.

Maybe we can just remove such remarks. I found only two of them; the other
one is in dml.sgml:

<chapter id="dml">
  <title>Data Manipulation</title>
  <remark>
   This chapter is still quite incomplete.
  </remark>

Best regards,
Alexander



Re: Docbook 5.x

From
Tom Lane
Date:
Alexander Law <exclusion@gmail.com> writes:
> Hello Alvaro,
> 14.09.2016 18:30, Alvaro Herrera wrote:
>> What's up with 30?

> ecpg.sgml contains "<remark>The scope of the allocated descriptor is
> WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL.

> Maybe we can just remove such remarks. I found only two of them, other
> one in dml.sgml:

> <chapter id="dml">
>   <title>Data Manipulation</title>
>   <remark>
>    This chapter is still quite incomplete.
>   </remark>

Change them to SGML comments?  Although actually the DML one should
be removed, I think it's ancient and obsolete.

            regards, tom lane


Re: Docbook 5.x

From
Alvaro Herrera
Date:
Alexander Law wrote:

> 14.09.2016 18:30, Alvaro Herrera wrote:
> >What's up with 30?
>
> ecpg.sgml contains "<remark>The scope of the allocated descriptor is
> WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL.
>
> Maybe we can just remove such remarks.

I added Michael on CC.  Maybe he can clarify what the scope of the
descriptor is, so that we can add some proper sentence, and remove the
<remark> tag.


It's strange that the xslt prints the <remark> text, when the historical
behavior was to ignore it.  Maybe OASIS changed their mind as to what it
meant.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
14.09.2016 19:41, Alvaro Herrera wrote:
> Alexander Law wrote:
>
>> 14.09.2016 18:30, Alvaro Herrera wrote:
>>> What's up with 30?
>> ecpg.sgml contains "<remark>The scope of the allocated descriptor is
>> WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL.
>>
>> Maybe we can just remove such remarks.
> I added Michael on CC.  Maybe he can clarify what the scope of the
> descriptor is, so that we can add some proper sentence, and remove the
> <remark> tag.
>
>
> It's strange that the xslt prints the <remark> text, when the historical
> behavior was to ignore it.  Maybe OASIS changed their mind as to what it
> meant.
>
In fact it's controlled by show.comments variable in XSLT, which is
defined in doc/src/sgml/stylesheet-common.xsl as:

<xsl:param name="show.comments">
   <xsl:choose>
     <xsl:when test="contains($pg.version, 'devel')">1</xsl:when>
     <xsl:otherwise>0</xsl:otherwise>
   </xsl:choose>
</xsl:param>

And it's just inconsistent with the DSSSL definition
(define %show-comments% draft-mode). So we can just ignore the
difference. Though the question, which has been open since 2003, could
finally be answered.
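The effect of the show.comments parameter quoted above can be sketched in a few lines. This is only an illustration of the XSLT test contains($pg.version, 'devel'); the function name show_comments is made up for this sketch, and the DSSSL side (draft-mode) is not modelled.

```python
def show_comments(pg_version: str) -> bool:
    """Mirror the XSLT test contains($pg.version, 'devel')."""
    # <remark> elements are rendered only when the version string
    # marks a development build.
    return "devel" in pg_version


print(show_comments("10devel"))  # development build: remarks are rendered
print(show_comments("9.6.1"))    # release build: remarks are suppressed
```

So the remark in ecpg.sgml shows up in XSLT builds of a development tree, while a release build would hide it.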




Re: Docbook 5.x

From
Alexander Law
Date:
Hello,

I've modified the XSLs to eliminate most undesired differences between
`make html` and `make xslthtml` outputs. The patch is attached.
See http://oc.postgrespro.ru/index.php/s/Gj2PGZ9IHUbDC5t/download
(eliminated differences marked cyan).
What should we do next to finish with the "Port all DSSSL customizations
to XSLT" item in our plan?

Best regards,
Alexander

14.09.2016 18:30, Alvaro Herrera wrote:
> I think 12, 14, 15 merit more research too.
>
> What's up with 30?
>


Attachment

Re: Docbook 5.x

From
Jürgen Purtz
Date:
The Docbook TC is discussing the mid-term plans for Docbook 6, see:
https://lists.oasis-open.org/archives/docbook/201610/msg00005.html .
Just as with Docbook 5, the normative standard will be written in
RelaxNG+Schematron. In the past they generated dtd and xsd files out of
RelaxNG. But for Docbook 6 they are considering dropping the xsd version;
the dtd will survive as it can be generated automatically.

Does anybody need the xsd version in the long term? xmllint is able
to validate against RelaxNG.

Kind regards, Jürgen




Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 9/14/16 7:18 AM, Alexander Law wrote:
> Please look at the
> http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download
> where I have gathered together all the significant differences, that we
> have between DSSSL and XSLT outputs.
> I have marked red the differences that I would consider as negative.
> Let's decide which ones are acceptable and which we need to eliminate.

Thank you for making that list.  I have a similar list that is not quite
the same, so together we'll probably find all the problems.  I have
checked both lists and most of the issues are not terribly critical.

I have now committed fixes for what I think were the major missing
usability issues: header customization and index letter links.  I would
be comfortable with switching the default build to XSLT now and work out
the remaining issues on the fly.  What do you think?

(I also have a similar list for switching the PDF build from jadetex to
fop.  We are also in pretty good shape there, but I have not finished
the evaluation fully.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I have now committed fixes for what I think were the major missing
> usability issues: header customization and index letter links.  I would
> be comfortable with switching the default build to XSLT now and work out
> the remaining issues on the fly.  What do you think?

What does that imply in terms of changing build dependencies for people
who want to build the docs?

            regards, tom lane


Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 11/8/16 10:02 AM, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> I have now committed fixes for what I think were the major missing
>> usability issues: header customization and index letter links.  I would
>> be comfortable with switching the default build to XSLT now and work out
>> the remaining issues on the fly.  What do you think?
>
> What does that imply in terms of changing build dependencies for people
> who want to build the docs?

I will write a full explanation to hackers before making any change.
But the short answer is, it's the same tools we use for building the man
pages now, so if you can run make man or make world, you're set.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Alexander Law
Date:
Hello Peter,

I think it's time to switch to XSLT. We at Postgres Pro already build
(converted to XML) and publish the docs (for version 9.6, in English and
Russian) and we've got only one bug report related to the
header/navigation differences.
(See https://postgrespro.com/docs/postgresql/9.6/tutorial-concepts.html
vs https://postgrespro.com/docs/postgresql/9.5/tutorial-concepts.html).
There are other customizations that I would like to apply, but it could
be done later.
And as we should completely move away from DSSSL to migrate to XML, I
think the sooner we switch to XSLT, the better.

Best regards,
Alexander

08.11.2016 16:25, Peter Eisentraut wrote:
> I have now committed fixes for what I think were the major missing
> usability issues: header customization and index letter links.  I would
> be comfortable with switching the default build to XSLT now and work out
> the remaining issues on the fly.  What do you think?
>



Re: Docbook 5.x

From
Alexander Law
Date:
Hello Peter,

I saw that you committed the patch to switch the html build to XSLT by
default.
So it seems we can now continue the move to XML.
I'd suggest moving in several steps.
Please see the attached scripts.
The main script is 7_check_conversion.sh. It performs the whole conversion
and checks whether the html output is the same.
I suggest splitting the conversion into three commits.
Commit#0 is for manual corrections - it replaces "<" with "&lt;" and so
on in some sgml's and it doesn't affect the build or outputs.
It is needed only for the next step - automatic conversion. These changes
to sgml's are countable and observable.

Commit#1 performs conversion of all SGML's to make them compatible with
XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl script.)
These changes to sgml's are massive, but they are produced
automatically, so we need just to check the script and make sure that
the output is the same.
After that commit we can still use the SGML build.

And the last commit, commit#2, is for switching to XML. At that point
doc/src/sgml is renamed to doc/src/xml and the build environment is modified
and cleaned, but changes in sgml/xml are minimal, so we can observe and
check them.

After the commit#2 we get all our docs in XML (DocBook 4.2) and can
build it just as we did with 'make html/man/...' before.
Maybe the commit#2 should be applied later, but commits #0 and #1 are
not intrusive and can be applied anytime.
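The "check whether the html output is the same" step can be pictured with a small sketch. This is not the attached 7_check_conversion.sh; the function name differing_files and the directory names in the usage note are assumptions made for illustration only.

```python
import filecmp
from pathlib import Path


def differing_files(before: Path, after: Path) -> list[str]:
    """Return relative names of files that differ or exist on one side only."""
    names_before = {p.relative_to(before).as_posix()
                    for p in before.rglob("*") if p.is_file()}
    names_after = {p.relative_to(after).as_posix()
                   for p in after.rglob("*") if p.is_file()}
    # Files present in only one of the two trees count as differences.
    only_one_side = names_before ^ names_after
    # For files present in both trees, compare the contents byte by byte.
    changed = {name for name in names_before & names_after
               if not filecmp.cmp(before / name, after / name, shallow=False)}
    return sorted(only_one_side | changed)
```

Called as differing_files(Path("html-before"), Path("html-after")) after building the docs once from the original sources and once from the converted ones, an empty list would mean the conversion left the output unchanged.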

Best regards,
Alexander


Attachment

Re: Docbook 5.x

From
Jürgen Purtz
Date:
> Hello Peter,
>
> I saw that you committed the patch to switch the html build to XSLT by
> default.
> So It seems, now we can continue the move to XML.
> I'd suggest to move in several steps.
> Please see the attached scripts.
> The main script is 7_check_conversion.sh. It performs all the
> conversion and checks whether the html output is the same.
> I suggest to split conversion in three commits.
> Commit#0 is for manual corrections - it replaces "<" with "&lt;" and
> so on in some sgml's and it doesn't affect the build or outputs.
> It needed just for the next step - automatic conversion. These changes
> to sgml's are countable and observable.
>
> Commit#1 performs conversion of all SGML's to make them compatible
> with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl
> script.)
> These changes to sgml's are massive, but they are produced
> automatically, so we need just to check the script and make sure that
> the output is the same.
> After that commit we still can use SGML build.
>
> And the last commit, commit#2 is for switching to XML. At that point
> doc/src/sgml renamed to doc/src/xml, build environment modified and
> cleaned, but changes in sgml/xml are minimal, so we can observe and
> check them.
>
> After the commit#2 we get all our docs in XML (DocBook 4.2) and can
> build it just as we did with 'make html/man/...' before.
> Maybe the commit#2 should be applied later, but commits #0 and #1 are
> not intrusive and can be applied anytime.
>
> Best regards,
> Alexander
>

Hello,

I greatly welcome the next steps toward XML.

In addition to the submitted scripts I want to point out that we will
get validation error messages if the following three things coincide: a)
Docbook 4.2, b) use of XInclude, c) a different directory (eg: 'ref').
The xml:base attribute (to specify a different directory) was introduced
into the Docbook DTD with version 4.3. Older DTDs cannot use it without
manual changes to the DTD, please see:
http://www.sagehill.net/docbookxsl/ValidXinclude.html. To overcome this
shortcoming I suggest the use of Docbook 4.3 (or 4.5) - as a separate
commit or implicitly with commit #2. As far as I have seen, all
converted documents validate against 4.5.

Kind regards
Jürgen Purtz


Re: [DOCS] Docbook 5.x

From
Alexander Law
Date:
Hello Peter,

16.11.2016 14:30, Alexander Law wrote:
> So It seems, now we can continue the move to XML.
> I'd suggest to move in several steps.
> Please see the attached scripts.
> The main script is 7_check_conversion.sh. It performs all the
> conversion and checks whether the html output is the same.
> I suggest to split conversion in three commits.
> Commit#0 is for manual corrections - it replaces "<" with "&lt;" and
> so on in some sgml's and it doesn't affect the build or outputs.
> It needed just for the next step - automatic conversion. These changes
> to sgml's are countable and observable.
>
> Commit#1 performs conversion of all SGML's to make them compatible
> with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl
> script.)
> These changes to sgml's are massive, but they are produced
> automatically, so we need just to check the script and make sure that
> the output is the same.
> After that commit we still can use SGML build.
>
> And the last commit, commit#2 is for switching to XML. At that point
> doc/src/sgml renamed to doc/src/xml, build environment modified and
> cleaned, but changes in sgml/xml are minimal, so we can observe and
> check them.
>
> After the commit#2 we get all our docs in XML (DocBook 4.2) and can
> build it just as we did with 'make html/man/...' before.
> Maybe the commit#2 should be applied later, but commits #0 and #1 are
> not intrusive and can be applied anytime.
I've rebased previous patches for the current "10devel" version.
Will we continue the move to DocBook XML?
Are there any obstacles that may keep us from moving forward?

Best regards,
Alexander


Attachment

Re: [DOCS] Docbook 5.x

From
Jürgen Purtz
Date:
Hello Alexander,

On 28.02.2017 09:55, Alexander Law wrote:
> Hello Peter,
>
> 16.11.2016 14:30, Alexander Law wrote:
>> So It seems, now we can continue the move to XML.
>> I'd suggest to move in several steps.
>> Please see the attached scripts.
>> The main script is 7_check_conversion.sh. It performs all the
>> conversion and checks whether the html output is the same.
>> I suggest to split conversion in three commits.
>> Commit#0 is for manual corrections - it replaces "<" with "&lt;" and
>> so on in some sgml's and it doesn't affect the build or outputs.
>> It needed just for the next step - automatic conversion. These
>> changes to sgml's are countable and observable.
>>
>> Commit#1 performs conversion of all SGML's to make them compatible
>> with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl
>> script.)
>> These changes to sgml's are massive, but they are produced
>> automatically, so we need just to check the script and make sure that
>> the output is the same.
>> After that commit we still can use SGML build.
>>
>> And the last commit, commit#2 is for switching to XML. At that point
>> doc/src/sgml renamed to doc/src/xml, build environment modified and
>> cleaned, but changes in sgml/xml are minimal, so we can observe and
>> check them.
>>
>> After the commit#2 we get all our docs in XML (DocBook 4.2) and can
>> build it just as we did with 'make html/man/...' before.
>> Maybe the commit#2 should be applied later, but commits #0 and #1 are
>> not intrusive and can be applied anytime.
> I've rebased previous patches for the current "10devel" version.
> Will we continue move to DocBook.XML?
> Are there any obstacles that may keep us from moving forward?
>
> Best regards,
> Alexander
>
The time gap between commit#1 and commit#2 should be small, because in the
meantime people may add new empty elements and shorttags, which SGML permits
but XML does not.
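To make the two SGML constructs mentioned here concrete, the following is a minimal sketch of how they could be rewritten into XML. It is not the attached sgml2xml.pl; the function name, the regular expression and the small EMPTY_ELEMENTS set are assumptions for illustration, and it ignores corner cases such as ">" inside attribute values or markup inside comments.

```python
import re

# Hypothetical subset of DocBook elements that the DTD declares EMPTY.
EMPTY_ELEMENTS = {"xref", "co", "colspec", "spanspec", "footnoteref", "void"}

# Matches start tags, end tags and the SGML short end tag "</>".
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?([^<>]*)>")


def sgml_to_xml(text: str) -> str:
    """Close EMPTY elements and expand SGML short end tags "</>"."""
    open_elements = []  # stack of currently open element names

    def fix(match):
        closing, name, rest = match.groups()
        if closing:
            if not open_elements:
                return match.group(0)  # unbalanced input: leave untouched
            last = open_elements.pop()
            # A short end tag closes the most recently opened element.
            return f"</{last}>" if name is None else match.group(0)
        if name is None or rest.endswith("/"):
            # Comments, declarations, or tags already in XML empty form.
            return match.group(0)
        if name.lower() in EMPTY_ELEMENTS:
            return f"<{name}{rest}/>"
        open_elements.append(name)
        return match.group(0)

    return TAG.sub(fix, text)


print(sgml_to_xml('<para>See <xref linkend="dml"> and <literal>x</>.</para>'))
```

Any such construct added to the sources after the automatic conversion would have to be converted again before the switch to XML, which is why the gap between the two commits matters.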

The attached version of sgml2xml.pl is cleaned up by elimination of
unused variables and some modifications in the comments.

Kind regards, Jürgen




Attachment

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 2/28/17 03:55, Alexander Law wrote:
> I've rebased previous patches for the current "10devel" version.
> Will we continue move to DocBook.XML?
> Are there any obstacles that may keep us from moving forward?

We still haven't gotten rid of all the DSSSL use.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Docbook 5.x

From
Peter Eisentraut
Date:
On 2/28/17 03:55, Alexander Law wrote:
> I've rebased previous patches for the current "10devel" version.
> Will we continue move to DocBook.XML?

I'm moving this to the next commit fest.  The conversion from SGML to
XML will be a theme for the PG11 development cycle.  For PG10, we have
accomplished the conversion from DSSSL to XSLT.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 2/28/17 03:55, Alexander Law wrote:
> I've rebased previous patches for the current "10devel" version.
> Will we continue move to DocBook.XML?
> Are there any obstacles that may keep us from moving forward?

I have started working through these patches now.  I have committed the
escaping of < and & and will work through the rest slowly, to minimize
disruptions to other development.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello,

06.09.2017 18:54, Peter Eisentraut wrote:
>
> I have started working through these patches now. I have committed the 
> escaping of < and & and will work through the rest slowly, to minimize 
> disruptions to other development.
Great!

I have rebased all the remaining patches and updated scripts for the 
current master (see attachment).

Best regards,
Alexander

-- 
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Attachment

Re: [DOCS] Docbook 5.x

From
Thomas Munro
Date:
On Sat, Sep 9, 2017 at 12:30 AM, Alexander Lakhin <exclusion@gmail.com> wrote:
> Hello,
>
> 06.09.2017 18:54, Peter Eisentraut wrote:
>>
>>
>> I have started working through these patches now. I have committed the
>> escaping of < and & and will work through the rest slowly, to minimize
>> disruptions to other development.
>
> Great!
>
> I have rebased all the remaining patches and updated scripts for the current
> master (see attachment).

Hi Alexander,

In future versions of this patch set, if there is a dependency between
the patches would you mind indicating the order to apply them, perhaps
with prefixes like "0001-"?  That would be quite useful for humans who
aren't yet familiar enough with your patch set to guess the order, and
also for stupid patch testing robots:

== Fetched patches from message ID
f00bf53f-e6b5-a033-69be-0c63878f0d30%40gmail.com
== Applying on top of commit 3c435952176ae5d294b37e5963cd72ddb66edead
== Applying patches from tarball pg-doc.check.tar.bz2...
== Applying patch pg-doc.check/patches/sgml-xml/ecpg.patch...
== Applying patch pg-doc.check/patches/sgml-xml/func.patch...
== Applying patch
pg-doc.check/patches/sgml-xml/generate-errcodes-table.pl.patch...
== Applying patch pg-doc.check/patches/sgml-xml/pgtesttiming.patch...
== Applying patch pg-doc.check/patches/sgml-xml/release-10.patch...
1 out of 5 hunks FAILED -- saving rejects to file
doc/src/sgml/release-10.sgml.rej

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 9/8/17 08:30, Alexander Lakhin wrote:
>> I have started working through these patches now. I have committed the 
>> escaping of < and & and will work through the rest slowly, to minimize 
>> disruptions to other development.
> Great!
> 
> I have rebased all the remaining patches and updated scripts for the 
> current master (see attachment).

So, I've been looking at this profiling stuff, to replace the marked
sections in the installation instructions.  I found the overhead of that
a bit too much for building the full documentation, so I have come up
with the attached alternative solution.  What do you think?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [DOCS] Docbook 5.x

From
Jürgen Purtz
Date:
On 15.09.2017 19:32, Peter Eisentraut wrote:
> On 9/8/17 08:30, Alexander Lakhin wrote:
>>> I have started working through these patches now. I have committed the
>>> escaping of < and & and will work through the rest slowly, to minimize
>>> disruptions to other development.
>> Great!
>>
>> I have rebased all the remaining patches and updated scripts for the
>> current master (see attachment).
>
> So, I've been looking at this profiling stuff, to replace the marked
> sections in the installation instructions.  I found the overhead of that
> a bit too much for building the full documentation, so I have come up
> with the attached alternative solution.  What do you think?


I'm not happy with the 'particular conversions' part of 'standalone-profile.xsl'. It applies subsequent modifications, which are not very intuitive to a reader, eg:

<xsl:template match="phrase[@id='install-ldap-links']">
  <xsl:text>the documentation about client authentication and libpq</xsl:text>
</xsl:template>

This approach spreads the intended text over two very different files (in this example: 'installation.xml' and 'standalone-profile.xsl').

My suggestion is to keep the source code in one file in the same manner as with the SGML standalone-include/standalone-ignore mechanism. A generic xsl file shall create the extended output similar to 'standalone-profile.xsl'.


installation.xml:
support for authentication and connection parameter lookup (see
<phrase condition="standalone">the documentation about client authentication and libpq</phrase>
<phrase condition="default"><xref linkend="libpq-ldap"/> and <xref linkend="auth-ldap"/></phrase>
for more information). On Unix,

...


collectAll.xsl (similar to standalone-profile.xsl):
  <!-- parameters and variables -->
  <xsl:param name="pg.Standalone" select="'default'"/>
  <!-- <xsl:param name="pg.Standalone" select="'standalone'"/>  -->

  <!-- Process all nodes  -->
  <xsl:template match="*|@*|text()|processing-instruction()|comment()">
    <xsl:choose>
      <xsl:when test="(not (@condition) or @condition=$pg.Standalone )">
        <!-- copy nodes without a 'condition' attribute and such nodes, where 'condition' meets  the given criteria  -->
        <xsl:copy>
          <xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
        </xsl:copy>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

I'm sorry that I cannot deliver a patch at the moment, because I'm abroad with limited resources (but many challenges). I hope the idea is clear nonetheless. The attached collectAll.xsl file contains a more complex solution for the case where we have to deal with more than one include/ignore type, e.g. index generation.
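
For readers without an XSLT processor at hand, the filter idea behind collectAll.xsl can be sketched in Python. This is only an illustration under assumptions: the function name `profile` and the sample paragraph are made up for the example, and the standard library's ElementTree stands in for the real XSLT pipeline. Like the template above, it keeps every element that has no 'condition' attribute or whose 'condition' equals the requested value, and drops the rest.

```python
import xml.etree.ElementTree as ET


def profile(parent, wanted):
    """Remove descendants whose 'condition' attribute is set but differs from wanted."""
    previous = None  # last sibling that was kept
    for child in list(parent):
        condition = child.get("condition")
        if condition is not None and condition != wanted:
            # ElementTree stores the text that follows an element in its .tail;
            # hand it over to the preceding node so it is not lost.
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
        else:
            profile(child, wanted)
            previous = child
    return parent


# Hypothetical fragment modelled on the installation.xml example above.
SAMPLE = (
    "<para>see "
    '<phrase condition="standalone">the documentation about client '
    "authentication and libpq</phrase>"
    '<phrase condition="default"><xref linkend="libpq-ldap"/> and '
    '<xref linkend="auth-ldap"/></phrase>'
    " for more information</para>"
)

standalone_text = "".join(profile(ET.fromstring(SAMPLE), "standalone").itertext())
default_refs = [x.get("linkend") for x in profile(ET.fromstring(SAMPLE), "default").iter("xref")]

print(standalone_text)
print(default_refs)
```

The point of the design is that both variants of the text stay side by side in the source file, and a single generic pass selects one of them.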


Attachment

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 9/15/17 14:54, Jürgen Purtz wrote:
> My suggestion is to keep the source code in one file in the same manner
> as with the SGML standalone-include/standalone-ignore mechanism. A
> *generic* xsl file shall create the extended output similar to
> 'standalone-profile.xsl'.
> 
> 
> installation.xml:
> 
> support for authentication and connection parameter lookup (see
> <phrase condition="standalone">the documentation about client
> authentication and libpq</phrase>
> <phrase condition="default"><xref linkend="libpq-ldap"/> and<xref
> linkend="auth-ldap"/></phrase>
> for more information). On Unix,
> ...

That is what the standard DocBook profiling system does.  But that has
the disadvantage of imposing a performance penalty on building the
documentation.  Unless you have a way around that that I'm not seeing.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello,
15.09.2017 21:54, Jürgen Purtz wrote:
> On 15.09.2017 19:32, Peter Eisentraut wrote:
>> On 9/8/17 08:30, Alexander Lakhin wrote:
>>>> I have started working through these patches now. I have committed the
>>>> escaping of < and & and will work through the rest slowly, to minimize
>>>> disruptions to other development.
>>> Great!
>>>
>>> I have rebased all the remaining patches and updated scripts for the
>>> current master (see attachment).
>> So, I've been looking at this profiling stuff, to replace the marked
>> sections in the installation instructions.  I found the overhead of that
>> a bit too much for building the full documentation, so I have come up
>> with the attached alternative solution.  What do you think?
Peter, can you show what performance drop you see with the profiling (e.g. for HTML)?
I get the following numbers:
1. make html with profiling (import profile-chunk.xsl in stylesheet.xsl):
85.98user 0.76system 1:29.85elapsed

2. make html without profiling (import chunk.xsl in stylesheet.xsl):
77.36user 0.62system 1:21.28elapsed

3. Separate profiling (performed before making epub, as dbtoepub doesn't support profiling)
8.52user 0.22system 0:10.31elapsed

So I get a ~10% performance drop when making HTML. Are you concerned about the same overhead?
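
The ~10% figure can be checked directly from the timings above. A small Python sketch of the arithmetic (the helper names are illustrative and not part of any build script):

```python
def elapsed_seconds(stamp):
    """Convert an 'm:ss.ss' elapsed value, as printed by time, to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def overhead_percent(with_profiling, without_profiling):
    """Relative slowdown of the profiled build, in percent."""
    return (with_profiling - without_profiling) / without_profiling * 100.0


# User CPU time: 85.98 s with profiling versus 77.36 s without.
user_overhead = overhead_percent(85.98, 77.36)

# Wall-clock time: 1:29.85 with profiling versus 1:21.28 without.
elapsed_overhead = overhead_percent(elapsed_seconds("1:29.85"), elapsed_seconds("1:21.28"))

print(f"user: {user_overhead:.1f}%  elapsed: {elapsed_overhead:.1f}%")
```

Both ratios come out slightly above 10%, which is close to the cost of the separate profiling pass measured in item 3.
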
I would choose some standard way to have separate content in the same file, but if the overhead is not acceptable, and we're not going to extend the profiling usage, then we need to invent something that will complicate XML-related processing (I am thinking of translation, but other issues are possible too).

> I'm not happy with the 'particular conversions' part of 'standalone-profile.xsl'. It applies subsequent modifications, which are not very intuitive to a reader, e.g.:
>
> <xsl:template match="phrase[@id='install-ldap-links']">
>   <xsl:text>the documentation about client authentication and libpq</xsl:text>
> </xsl:template>
>
> This approach spreads the intended text over two very different files (in this example: 'installation.xml' and 'standalone-profile.xsl').
>
> My suggestion is to keep the source code in one file in the same manner as with the SGML standalone-include/standalone-ignore mechanism. A generic xsl file shall create the extended output similar to 'standalone-profile.xsl'.

Jürgen, this approach is implemented by applying profiling.xsl in the Makefile (for make postgres.epub). (See the Makefile in https://www.postgresql.org/message-id/attachment/54854/pg-doc.check.tar.bz2)

Best regards,
Alexander

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 9/18/17 10:38, Alexander Lakhin wrote:
> Peter, can you show what performance drop you see with the profiling
> (e.g. for HTML)?
> I get the following numbers:
> 1. make html with profiling (import profile-chunk.xsl in stylesheet.xsl):
> 85.98user 0.76system 1:29.85elapsed
> 
> 2. make html without profiling (import chunk.xsl in stylesheet.xsl):
> 77.36user 0.62system 1:21.28elapsed
> 
> 3. Separate profiling (performed before making epub, as dbtoepub doesn't
> support profiling)
> 8.52user 0.22system 0:10.31elapsed
> 
> So I get ~10% performance drop when making html. Are you concerned about
> the same overhead?

Yeah, that's about what I see.

> I would choose some standard way to have separate content in the same
> file, but if the overhead is not acceptable, and we're not going to
> extend the profiling usage, then we need to invent something that will
> complicate XML-related processing (I think about translation but the
> other issues are possible too).

It's only for the INSTALL file and won't get used anywhere else, so I
think it's OK to have a bit of an ad-hoc system.

> Jürgen, this approach implemented by applying profiling.xsl in Makefile
> (for make postgres.epub). (See Makefile in
> https://www.postgresql.org/message-id/attachment/54854/pg-doc.check.tar.bz2)

I hadn't even looked at that yet, but that kind of supports my point.
Injecting the profiling layer everywhere is going to be annoying.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello Peter,
19.09.2017 23:05, Peter Eisentraut wrote:
>> I would choose some standard way to have separate content in the same
>> file, but if the overhead is not acceptable, and we're not going to
>> extend the profiling usage, then we need to invent something that will
>> complicate XML-related processing (I think about translation but the
>> other issues are possible too).
> It's only for the INSTALL file and won't get used anywhere else, so I
> think it's OK to have a bit of an ad-hoc system.
Well, then I would suggest placing all the extra content in the
installation-single.xsl file and adding all the alternate text to
installation.xml.
I would like to place such alternate text in an attribute named
"standalonetext" or similar, but DocBook 4.x doesn't allow for extra
attributes, so we need to choose from:
http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes
So I decided to repurpose the xreflabel attribute. (I believe
that we could find something more appropriate after migrating from
DocBook 4.2 to 5.x.)

Please see patches/xml/installation.patch in the attachment (or 
installation.xml in doc/src/xml after ) for an example.
Makefile (in patches/xml/) is adjusted for the new approach too.
(Main output is slightly changed after switching from 
"profile-chunk.xsl" to "chunk.xsl", I'll fix it later if we choose this 
way.)
(Archive in the attachment is packed twice to avoid the automatic patch 
checking...)

Best regards,
Alexander

Attachment

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 9/20/17 09:00, Alexander Lakhin wrote:
> I would like to place such alternate text in an attribute 
> "standalonetext" or alike, but DocBook 4.x doesn't allow for extra 
> attributes, so we need to choose from: 
> http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes
> So I decided to make alternative use of xreflabel attribute. (I believe 
> that we could find something more appropriate after migrating from 
> Docbook 4.2 to 5.x).)

Yeah, but that's a bit of a hack isn't it?  I also don't see anything in
DocBook 5 that indicates to me that we could get rid of that hack then.

In the interest of moving things along, I have committed my patch and
will continue working on the rest of the patch set.

Improvements are welcome and can be submitted separately, but I think
it's hardly worth it because this stuff changes so rarely.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello,
27.09.2017 18:38, Peter Eisentraut wrote:
> On 9/20/17 09:00, Alexander Lakhin wrote:
>> I would like to place such alternate text in an attribute
>> "standalonetext" or alike, but DocBook 4.x doesn't allow for extra
>> attributes, so we need to choose from:
>> http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes
>> So I decided to make alternative use of xreflabel attribute. (I believe
>> that we could find something more appropriate after migrating from
>> Docbook 4.2 to 5.x).)
> Yeah, but that's a bit of a hack isn't it?  I also don't see anything in
> DocBook 5 that indicates to me that we could get rid of that hack then.
I found a more appropriate way to use an extra attribute:
http://www.sagehill.net/docbookxsl/AddProfileAtt.html
I've tested it with a "standalone" attribute and it works (for DocBook 4.2).
Maybe it will reduce the hackiness of the solution?
If it's not too late, I could prepare a new patch set today.
> In the interest of moving things along, I have committed my patch and
> will continue working on the rest of the patch set.
>
> Improvements are welcome and can be submitted separately, but I think
> it's hardly worth it because this stuff changes so rarely.
But with such an approach we would have to translate that .xsl too, which
seems rather strange, doesn't it?

------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
27.09.2017 19:11, Alexander Lakhin wrote:
>> In the interest of moving things along, I have committed my patch and
>> will continue working on the rest of the patch set.
>>
>> Improvements are welcome and can be submitted separately, but I think
>> it's hardly worth it because this stuff changes so rarely.
> But with such approach we should translate that .xsl too and it seems 
> rather strange, isn't it?
Improved patch attached.


Attachment

Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello Peter,

16.11.2017 00:06, Peter Eisentraut wrote:
> Here is the final patch set for the conversion.
Great!

I have some questions.
1. Can you share the exact scripts you use to generate 0002?
(We have been transforming the documentation to XML on the fly since 9.6, so it would be nice to adjust our scripts for previous versions too.)
2. Will you rename *.sgml to *.xml and move src/sgml to src/xml?
3. And I still think that hard-coding references and replacements in the standalone-profile is not good. Are you going to apply a less hackish approach?
(See https://www.postgresql.org/message-id/b4fd6932-8d44-3210-48ab-bbc393bf9a23%40gmail.com , patches/xml/installation.patch and patches/xml/installation-single.xsl)
4. BTW, when making postgres-A4.pdf, I get
[WARN] FOUserAgent - Destination: Unresolved ID reference "ecpg-type-timestamp-date" found.
It can be fixed by moving the id from the title tag to sect4:
-    <sect4>
-     <title id="ecpg-type-timestamp-date">timestamp, date</title>
+    <sect4 id="ecpg-type-timestamp-date">
+     <title>timestamp, date</title>
I wonder what the reasons were for setting the id on the title tag?

Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 11/16/17 01:00, Alexander Lakhin wrote:
> 1. Can you share the exact scripts you use to generate 0002?

I used the script you (or Jürgen) provided.

> 2. Will you rename *.sgml to *.xml and move src/sgml to src/xml?

Not right now, but we should have that discussion at some point over the
next few months.

> 3. And I still think that hard-coding references and replacements in the
> standalone-profile is not good. Are you going to apply less hackish
> approach?
> (See
> https://www.postgresql.org/message-id/b4fd6932-8d44-3210-48ab-bbc393bf9a23%40gmail.com
> , patches/xml/installation.patch and patches/xml/installation-single.xsl)

The patches you send me are very confusing.  If you can send me a
separate patch of what you have in mind and why and so on, we can
consider it separately, but right now I don't have much interest in
revisiting this.  What is committed works well, stays out of the way,
and causes no problems AFAICT.

> 4. BTW, when making postgres-A4.pdf, I get
> [WARN] FOUserAgent - Destination: Unresolved ID reference
> "ecpg-type-timestamp-date" found.
> It can be fixed by moving the id from the title tag to sect4:
> -    <sect4>
> -     <title id="ecpg-type-timestamp-date">timestamp, date</title>
> +    <sect4 id="ecpg-type-timestamp-date">
> +     <title>timestamp, date</title>
> I wonder, what were the reasons to set id for the title tag?

In some cases the ids on the title tags were necessary for the DSSSL
stylesheets.  They can be removed now with some care.  I have some
partial patches for that.
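
As an illustration only (this is not the partial patch mentioned above), moving such ids could be sketched in Python as follows. The sketch assumes a well-formed XML fragment without external entities, which the real source files may not be, and the function name `hoist_title_ids` is hypothetical. It moves an id from a title to its parent only when the parent has no id of its own, and leaves every other case for manual review.

```python
import xml.etree.ElementTree as ET


def hoist_title_ids(root):
    """Move the id of a <title> child to its parent element; return the moved ids."""
    moved = []
    for parent in root.iter():
        title = parent.find("title")  # direct <title> child, if any
        if title is None or "id" not in title.attrib:
            continue
        if "id" in parent.attrib:
            continue  # parent already carries an id: needs a human decision
        parent.set("id", title.attrib.pop("id"))
        moved.append(parent.get("id"))
    return moved


# Hypothetical fragment modelled on the ecpg example quoted above.
doc = ET.fromstring(
    '<sect3 id="ecpg-example"><title>Example</title>'
    '<sect4><title id="ecpg-type-timestamp-date">timestamp, date</title></sect4>'
    "</sect3>"
)
moved_ids = hoist_title_ids(doc)
print(moved_ids)
```

Any xref that points at a moved id would still need a look afterwards, since a link to a section may render differently from a link to its title.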

These particular warnings seem harmless.  I have found the following
reference:
<https://lists.oasis-open.org/archives/docbook-apps/201510/msg00052.html>

But I see the same warnings in the PG10 build, so this does not appear
to be a regression from the XML conversion.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 11/15/17 16:06, Peter Eisentraut wrote:
> Here is the final patch set for the conversion.

I have committed this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello,
23.11.2017 17:53, Peter Eisentraut wrote:
> On 11/15/17 16:06, Peter Eisentraut wrote:
>> Here is the final patch set for the conversion.
> I have committed this.
Great! Thank you for your work!
And in light of the possible need to convert older branches to XML too, maybe we should simplify INSTALL now.
Please consider applying the attached patch. It produces the same INSTALL and is much better in the following respects.

1. All the INSTALL content is placed in two files (installation.sgml and installation-single.xsl) instead of three (installation.sgml, standalone-install.xml, standalone-profile.xsl).
2. There are no unreadable and untranslatable (in context) constructions such as
<xsl:template match="xref[@linkend='plpython-python23']">
  <xsl:text>the </xsl:text><application>PL/Python</application><xsl:text> documentation</xsl:text>
</xsl:template>
(Sometimes translators need to replace larger fragments.)
3. It uses only XSLT (which we use already), no xi:include.
4. It doesn't generate the complete postgres.sgml just to process the installation section.

I understand that it will take some time to review, but I think it's justified for portability and supportability reasons.

Best regards,

------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [DOCS] Docbook 5.x

From
Peter Eisentraut
Date:
On 11/25/17 00:50, Alexander Lakhin wrote:
> Please, consider applying the attached patch. It produces the same
> INSTALL and is much better in the following aspects.
> 
> 1. All the INSTALL content is placed in two files (installation.sgml and
> installation-single.xsl) instead of three (installation.sgml,
> standalone-install.xml, standalone-profile.xsl).
> 2. There are no unreadable and untranslatable (in context) constructions
> such as
> <xsl:template match="xref[@linkend='plpython-python23']">
>   <xsl:text>the
> </xsl:text><application>PL/Python</application><xsl:text>
> documentation</xsl:text>
> </xsl:template>
> (Sometimes translators need to replace larger fragments.)
> 3. It uses only XSLT (which we use already), no xi:include.
> 4. It doesn't generate complete postgres.sgml to process only the
> installation section.

What is the standalonetext attribute?  I don't see that in the DTD.
Wouldn't that fail validation?

Also, the makefile fragments need to be written differently.  You can't
use pipes; that would not catch any errors.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [DOCS] Docbook 5.x

From
Alexander Lakhin
Date:
Hello Peter,
28.11.2017 19:02, Peter Eisentraut wrote:
>
> What is the standalonetext attribute?  I don't see that in the DTD.
> Wouldn't that fail validation?
I've added it to postgres.sgml as allowed by the DocBook standard:
http://www.sagehill.net/docbookxsl/AddProfileAtt.html
+<!ENTITY % local.effectivity.attrib "standalonetext CDATA #IMPLIED">

> Also, the makefile fragments need to be written differently.  You can't
> use pipes; that would not catch any errors.

I believe that the last xsltproc will fail if a previous command fails,
as it will not receive valid XML. For example, if sed fails with an
error, I get:
-:1: parser error : Document is empty
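
The underlying shell behaviour can be reproduced without the documentation toolchain. The Python sketch below is a generic illustration, not the Makefile fragment under discussion; it assumes a POSIX sh is available, and `false` and `cat` merely stand in for a failing first stage and a tolerant last stage. By default sh reports only the exit status of the last command in a pipeline, so whether an earlier failure is noticed depends on the last command rejecting its input, as xsltproc does with an empty document.

```python
import subprocess


def shell_status(command_line):
    """Run a command line under sh and return its exit status."""
    return subprocess.run(["sh", "-c", command_line]).returncode


# Pipeline: the first stage fails, but the last stage (cat) accepts empty
# input and succeeds, so the pipeline as a whole reports success.
piped = shell_status("false | cat")

# Separate steps joined with &&: the failure of the first step is reported.
sequential = shell_status("false && cat /dev/null")

print("pipeline status:", piped)
print("sequential status:", sequential)
```

Writing each step to an intermediate file and running the steps one after another makes every failure visible regardless of how the next tool reacts to bad input.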


Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company