Thread: Docbook 5.x
actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an upgrade to DocBook 5.x. This sounds simple, but it will be a long process with many sub-tasks.
Rationale:
- Sooner or later we MUST migrate, as the 4.x series is outdated: V4.2 dates back to 2002, and the 4.x series has not been actively developed since 2006. See http://www.docbook.org/tdg5/en/html/ch01.html: "In October 2006, the DocBook Technical Committee released DocBook V4.5, the last release planned in the 4.x series."
- V5.0 has been available since 2009. See http://www.docbook.org/tdg5/en/html/ch01.html: "DocBook V5.0 became an official Committee Specification in June 2009 and became an official OASIS Standard in October 2009."
- Currently the technical committee is at the third Candidate Release for V5.1.
PROs:
- The formal part of the migration is supported by existing tools: http://docbook.org/docs/howto/#convert4to5 (nevertheless, some scripts written by ourselves will be necessary).
- The normative schema for DocBook 5.x is written in RELAX NG. Additionally, the technical committee converts this normative schema to an XSD schema and to a DTD, which are not normative but very close to the RELAX NG schema and will suffice for most applications. Hence, we have the choice between three schema languages and everybody can use their favourite one.
- Our source file format will switch from SGML to XML. This implies that we have access to all XML features like XLink, XPath, XSLT, XSL-FO, SVG, MathML, namespaces, ...
CONs:
- The migration from 4.x to 5.x implies major changes at 3 different levels:
  - DocBook structure: Previously it was defined in SGML syntax (DTD). Now it is defined in the RELAX NG schema language plus Schematron rules.
  - DocBook files: Previously we used SGML syntax for our files. We must convert them to valid XML syntax, e.g. by eliminating tag omission.
  - Tools and style sheets: All tools which operate at the native SGML level (editors, conversions, ...) must be replaced by XML-conforming tools. As valid XML implicitly conforms to a valid SGML syntax, this step may be accomplished by reconfiguring some of the tools, e.g. .emacs.
What I have done so far is:
- Conversion of the sgml files to valid xml syntax with a Perl script. I failed to use 'osx' or 'spam'.
- Conversion of these xml files to DocBook 5.x format using xsltproc and DocBook's XSLT migration scripts.
- Creation of html files using xsltproc and DocBook's XSLT scripts.
- Creation of fo files using xsltproc and DocBook's XSLT scripts.
- Creation of pdf files using fop.
- The conversions need less than 10 minutes on an Intel i5 processor.
Any ideas or suggestions? Shall we continue along this path? Does anybody have more experience with SGML-->XML conversions or DocBook 4.x --> 5.x conversions?
Kind regards
Jürgen Purtz
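The five steps described above can be sketched as a small driver that only assembles the command lines. This is an illustration, not the script used for the round trip: the Perl script name `sgml2xml.pl` and its arguments are hypothetical (the script has not been posted), and the stylesheet paths are placeholders that depend on the local DocBook installation. The `-o` option of xsltproc and the `-fo`/`-pdf` options of fop are the usual ones, but check them against the installed versions.

```python
from pathlib import Path

# Placeholder paths (assumptions); adjust to the local DocBook installation.
MIGRATION_XSL = "docbook5/tools/db4-upgrade.xsl"  # DocBook 4 -> 5 migration stylesheet
HTML_XSL = "docbook-xsl-ns/html/docbook.xsl"
FO_XSL = "docbook-xsl-ns/fo/docbook.xsl"


def build_commands(sgml_file: str) -> list[list[str]]:
    """Return the command lines for one source file, in execution order."""
    stem = Path(sgml_file).stem
    db4_xml = f"{stem}.db4.xml"  # well-formed XML, still DocBook 4 markup
    db5_xml = f"{stem}.xml"      # DocBook 5 markup
    return [
        # Hypothetical Perl script and interface: SGML syntax -> XML syntax.
        ["perl", "sgml2xml.pl", sgml_file, db4_xml],
        ["xsltproc", "-o", db5_xml, MIGRATION_XSL, db4_xml],
        ["xsltproc", "-o", f"{stem}.html", HTML_XSL, db5_xml],
        ["xsltproc", "-o", f"{stem}.fo", FO_XSL, db5_xml],
        ["fop", "-fo", f"{stem}.fo", "-pdf", f"{stem}.pdf"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("advanced.sgml"):
        print(" ".join(cmd))
```

Running each list through `subprocess.run(cmd, check=True)` would stop the chain at the first failing step, which is useful while the intermediate files still contain errors.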
On 20.04.2016 20:41, Simon Riggs wrote:
On 20 April 2016 at 15:30, Jürgen Purtz <juergen@purtz.de> wrote:
> What I have done so far is:
> - Conversion of sgml files to valid xml syntax with a perl skript. I failed to use 'osx' or 'spam'.
> - Conversion of these xml files to Docbook5.x format using xsltproc and Docbooks xslt-migration skripts.
> - Creation of html files using xsltproc and Docbooks xslt skripts.
> - Creation of fo files using xsltproc and Docbooks xslt skripts.
> - Creation of pdf files using fop.
> - The conversions needs less than 10 minutes on a Intel i5 processor.
So you believe you have/can convert between the two formats accurately, so we can change things in a single commit?
What verification is offered? Possible?
And that is ready to go now? Will you post your perl script, or the patch? Other projects use the same file formats, e.g. Slony, XL etc.
If an automatic migration is possible do we need to change at all?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
so far I have done only a first rough round trip to check that there is no showstopper for my plans. If we find a consensus in the community that this work is valuable for the postgres documentation, I will continue to work on it in the near future. To answer your questions:
- "do we need to change at all?". This question has to be discussed in the community. I tried to use the recommended tools like 'osx' and 'spam' and failed (not completely, but in details like newline processing). This may be my fault, or it results from the fact that we still use sgml instead of xml. But over time this task will get harder and harder: sgml knowledge gets lost, sgml tools are no longer actively developed, xml moves forward, ...
- Currently I don't see any showstopper. Therefore I believe that the conversion from Docbook 4 to 5 is manageable. The plan is that we will have one xml file in db5 format for every sgml file in db4 format.
- To keep the repository history continuous, we shall do something like 'git mv file.sgml file.xml', put the new content into 'file.xml' and 'git commit'. Additionally, the newlines must be kept during all conversion steps.
- Maybe some very individual (manual) steps are necessary, but it should be possible to script these as well. Therefore the conversion shall run fast and a single commit shall cover the complete documentation.
- There are no special "Postgres" tasks in the Perl script or in any other place. It depends on docbook only. Therefore other projects can use it in the same way. Of course I will publish all sources.
- Currently I am trying to generate well-formed xml. Validation against the Docbook 5 schema will follow.
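A well-formedness check of the kind mentioned in the last point can be sketched with the Python standard library. This only illustrates the distinction between "well-formed" and "valid"; it is not the Perl script from this thread, and the sample fragments are made up. Validation against the DocBook 5 RELAX NG schema is a separate step that this sketch does not perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (says nothing about schema validity)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# SGML allows an empty element without an end tag; XML does not.
sgml_style = "<para>See <xref linkend='tutorial-join'>.</para>"
xml_style = "<para>See <xref linkend='tutorial-join'/>.</para>"

if __name__ == "__main__":
    print(is_well_formed(sgml_style))
    print(is_well_formed(xml_style))
```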
Alexander Law posted additional suggestions and questions:
Hello Jürgen,
Please look at the discussion that we had some time ago:
http://www.postgresql.org/message-id/56337365.2080104@postgrespro.ru
And we (postgrespro) still have plans to migrate to XML as soon as we get documentation translated.
We had no issues with SGML->XML conversion, "make postgres.xml" creates XML (with entities and alike), which we use.
When you are talking about "conversion of html, fo, pdf, ...", do you mean using docs/sgml/Makefile or some other scripts?
As to conversion SGML to XML, we need to decide whether to generate a single XML, or a set of XMLs (corresponding to current SGMLs).
In the latter case - how to include XML-fragments into the main document (as entities or with xi:include)?
Please, can you explain what are "Docbooks xslt-migration scripts"?
Is Docbook 4.x incompatible with Docbook 5.x and we need to convert it additionally?
Best regards,
Alexander
-----
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
My answers:
- Docbook 4 and 5 are not compatible. There are new elements; others have gone and are replaced by more generic ones. But the Docbook project offers XSLT stylesheets to convert Docbook 4 xml files to Docbook 5 xml files.
- There are pros and cons to using postgres.xml as a starting point. PRO: well-formed (and valid?) xml format. Entities are preserved. No more "<![CDATA[", "<![%include" and similar sgml constructs. CON: Only one file. Ugly line break algorithm.
- Currently I don't use the existing Makefile. I start Perl, xsltproc and fop with a different script. If I continue to work on this, I will have to change the Makefile.
- "how to include XML-fragments into the main document (as entities or with xi:include) ?". As described above, I prefer one file per existing sgml file. But some of those sgml files have more than one root element. In such situations (and without further processing) the resulting xml files will be fragments. In general it will be more "Docbook 5 compliant" to use xi:include instead of entities.
- "Docbooks xslt-migration scripts": see: http://docbook.org/docs/howto/#convert4to5
Kind regards
Jürgen Purtz
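The xi:include variant discussed above can be sketched as follows: a small main document that pulls in one XML file per former sgml file. The file names are placeholders, and the sketch assumes every included file has a single root element; it does not address the files with more than one root element mentioned above. The two namespace URIs are the standard DocBook 5 and XInclude namespaces.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
XI_NS = "http://www.w3.org/2001/XInclude"


def build_master(chapter_files: list[str]) -> str:
    """Build a DocBook 5 <book> that references each chapter file via xi:include."""
    ET.register_namespace("", DB_NS)    # serialize DocBook as the default namespace
    ET.register_namespace("xi", XI_NS)
    book = ET.Element(f"{{{DB_NS}}}book", {"version": "5.0"})
    for name in chapter_files:
        ET.SubElement(book, f"{{{XI_NS}}}include", {"href": name})
    return ET.tostring(book, encoding="unicode")


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(build_master(["intro.xml", "advanced.xml"]))
```

A processor with XInclude support, for example `xmllint --xinclude`, can then resolve the references into a single file.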
On 20.04.2016 17:30, Jürgen Purtz wrote:
> Hi,
> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an upgrade to DocBook 5.x. This sounds simple, but it will be a long process with many sub-tasks.
> Rationale:
> - Sooner or later we MUST migrate, as the 4.x series is outdated: V4.2 dates back to 2002, and the 4.x series has not been actively developed since 2006. See http://www.docbook.org/tdg5/en/html/ch01.html: "In October 2006, the DocBook Technical Committee released DocBook V4.5, the last release planned in the 4.x series."
> - V5.0 has been available since 2009. See http://www.docbook.org/tdg5/en/html/ch01.html: "DocBook V5.0 became an official Committee Specification in June 2009 and became an official OASIS Standard in October 2009."
> - Currently the technical committee is at the third Candidate Release for V5.1.
> PROs:
> - The formal part of the migration is supported by existing tools: http://docbook.org/docs/howto/#convert4to5 (nevertheless, some scripts written by ourselves will be necessary).
> - The normative schema for DocBook 5.x is written in RELAX NG. Additionally, the technical committee converts this normative schema to an XSD schema and to a DTD, which are not normative but very close to the RELAX NG schema and will suffice for most applications. Hence, we have the choice between three schema languages and everybody can use their favourite one.
> - Our source file format will switch from SGML to XML. This implies that we have access to all XML features like XLink, XPath, XSLT, XSL-FO, SVG, MathML, namespaces, ...
> CONs:
> - The migration from 4.x to 5.x implies major changes at 3 different levels:
>   - DocBook structure: Previously it was defined in SGML syntax (DTD). Now it is defined in the RELAX NG schema language plus Schematron rules.
>   - DocBook files: Previously we used SGML syntax for our files. We must convert them to valid XML syntax, e.g. by eliminating tag omission.
>   - Tools and style sheets: All tools which operate at the native SGML level (editors, conversions, ...) must be replaced by XML-conforming tools. As valid XML implicitly conforms to a valid SGML syntax, this step may be accomplished by reconfiguring some of the tools, e.g. .emacs.
> What I have done so far is:
> This is a very first rough round trip with one output file per sgml file and output type. Not supported: entities (__gt__ as a surrogate), <![CDATA and similar SGML constructs, PostgreSQL-specific style sheets, Makefile; additional errors occur, ... I append one file of every new format for the chapter "Advanced Features": xml (the new source), html, fo, pdf.
> - Conversion of the sgml files to valid xml syntax with a Perl script. I failed to use 'osx' or 'spam'.
> - Conversion of these xml files to DocBook 5.x format using xsltproc and DocBook's XSLT migration scripts.
> - Creation of html files using xsltproc and DocBook's XSLT scripts.
> - Creation of fo files using xsltproc and DocBook's XSLT scripts.
> - Creation of pdf files using fop.
> - The conversions need less than 10 minutes on an Intel i5 processor.
> Any ideas or suggestions? Shall we go further on this way? Has anybody more experiences in SGML-->XML conversions or Docbook 4.x --> 5.x conversions?
> Kind regards
> Jürgen Purtz
The conversion of the PostgreSQL documentation from Docbook 4.x to 5.x consists of the following steps:
- pure sgml --> xml conversion (done, Perl script)
- 4.x markup --> 5.x markup (done, Docbook standard migration script)
- post-processing of 5.x files (done, Perl: xi:include, entities, ...)
- generate the complete file postgres_all.xml with xmllint (done)
- generate online documentation (html, man, text)
- generate print documentation (rtf, pdf)
- adapt the Makefile to the new situation
Open problems:
- a lot of unknown xref targets, as the target exists in a different file
- 4 remaining sgml entities: standalone-xxx and include-xxx
- some markup which is not valid in 5.x, mostly with <synopsis> and <function>. This must be resolved manually (5.x offers comprehensive possibilities for very detailed markup with <funcsynopsis> and <cmdsynopsis>)
Questions:
- Is our file stylesheet.dsl written from scratch, or is it derived from any docbook 1/2/3/4.x generic stylesheet?
- Who developed this file?
- What is the role of the *.xsl files in the sgml directory and how do they collaborate with stylesheet.dsl?
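The "unknown xref targets" problem mentioned above can be narrowed down by separating targets that merely live in another file from targets that exist nowhere. The following is a sketch under the assumption that the converted files use DocBook 5 conventions (`xml:id` on the target, `linkend` on `<xref>`, DocBook namespace on all elements); the sample documents are invented.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def classify_xrefs(files: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each unresolved-in-its-own-file xref to 'cross_file' or 'missing'."""
    roots = {name: ET.fromstring(text) for name, text in files.items()}
    ids_by_file = {
        name: {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
        for name, root in roots.items()
    }
    all_ids = set().union(*ids_by_file.values())
    report = {"cross_file": [], "missing": []}
    for name, root in roots.items():
        for xref in root.iter(DB + "xref"):
            target = xref.get("linkend")
            if target in ids_by_file[name]:
                continue  # resolvable inside the same file
            kind = "cross_file" if target in all_ids else "missing"
            report[kind].append((name, target))
    return report


if __name__ == "__main__":
    sample = {
        "a.xml": '<chapter xmlns="http://docbook.org/ns/docbook" xml:id="a">'
                 '<para><xref linkend="b"/><xref linkend="nowhere"/></para></chapter>',
        "b.xml": '<chapter xmlns="http://docbook.org/ns/docbook" xml:id="b"/>',
    }
    print(classify_xrefs(sample))
```

Cross-file targets should disappear once all files are assembled into one document; only the "missing" ones need manual attention.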
Jürgen Purtz wrote:
> Hi,
> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
> with many sub-tasks.
Yes, agreed. The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones. If this is not fixed, let's forget
about this whole thing until it is. So, would you time the process
using both toolchains and report back?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/03/2016 12:34 PM, Alvaro Herrera wrote:
> Jürgen Purtz wrote:
>> Hi,
>> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
>> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
>> with many sub-tasks.
>
> Yes, agreed. The killer objection placed last time was that it took
> something like 10x longer to generate the HTML using the XML-based
> toolchain than the SGML-based ones. If this is not fixed, let's forget
> about this whole thing until it is. So, would you time the process
> using both toolchains and report back?
IIRC: TGL submitted a patch for the openjade bug way back when that caused that issue. TGL, do you know what happened there?
Sincerely,
JD
--
Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On 05/03/2016 12:34 PM, Alvaro Herrera wrote:
>> Yes, agreed. The killer objection placed last time was that it took
>> something like 10x longer to generate the HTML using the XML-based
>> toolchain than the SGML-based ones. If this is not fixed, let's forget
>> about this whole thing until it is. So, would you time the process
>> using both toolchains and report back?
> IIRC:
> TGL submitted a patch for the openjade bug way back when that caused
> that issue.
I think you're thinking of this:
http://www.postgresql.org/message-id/24388.1166800682@sss.pgh.pa.us
I do not recall just when/how that got resolved upstream, or if they ever even responded to me. But it must have been resolved, because the performance before that was patched was untenable even then, and would be far more so now considering how much our docs have grown since 2006. I have not heard anyone complaining lately that PDF output takes three days to build.
In short, I doubt that that's relevant anymore. If it was, it would certainly not be favorable to the XML toolchain.
BTW, the thread that that message is embedded in is pretty relevant, because it was all about yet another lets-switch-to-XML proposal...
regards, tom lane
I wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> IIRC:
>> TGL submitted a patch for the openjade bug way back when that caused
>> that issue.
> I think you're thinking of this:
> http://www.postgresql.org/message-id/24388.1166800682@sss.pgh.pa.us
> I do not recall just when/how that got resolved upstream, or if they
> ever even responded to me. But it must have been resolved, because the
> performance before that was patched was untenable even then, and would be
> far more so now considering how much our docs have grown since 2006.
Actually, further digging suggests that Peter found a way to hack our stylesheets to avoid that openjade bug:
http://www.postgresql.org/message-id/200612100315.47269.peter_e@gmx.net
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=465269b8a
So it's possible that the openjade bug is still there, but has been defanged for our purposes. In any case, there's still little reason to think that it would apply to a different toolchain.
regards, tom lane
On 5/3/16 4:13 PM, Oleg Bartunov wrote:
> As it stated in
> http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru
> the xml performance may be greatly improved. Alexander, what is current
> state of art of your patch ? How slow is xml in compare to sgml ?
Please make sure the patch is registered in the next commit fest.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I measured the following elapsed times on an Intel i5 processor:
1. generate all HTML files with dsl script (make html): 0:48 min.
2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
3. generate all HTML files with xslt script in the new environment (pure Docbook5): 4:07 min.
4. Generating different things via dsl scripts in the new environment may be possible. But the changelog of the Docbook5 dsl scripts shows that the last modification occurred in 2004, so this way is a dead end.
I used the following tools: perl, xmllint and xsltproc. osx and OpenJade are obsolete in the new environment (so far; there is much more work to do).
Jürgen Purtz
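To put the three measurements above in relation to each other, the elapsed times can be converted to seconds and compared with the dsl baseline. The numbers are the ones reported in the message; the labels are shortened for the example.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' elapsed time such as '16:01' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


timings = {
    "dsl (make html)": "0:48",
    "xslt, DocBook 4 (make xslthtml)": "16:01",
    "xslt, pure DocBook 5": "4:07",
}

if __name__ == "__main__":
    baseline = to_seconds(timings["dsl (make html)"])
    for name, elapsed in timings.items():
        secs = to_seconds(elapsed)
        print(f"{name}: {secs} s, {secs / baseline:.1f}x the dsl build")
```

On these figures the existing xslt build takes about 20 times as long as the dsl build, and the pure DocBook 5 build about 5 times as long.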
On Tue, May 3, 2016 at 10:34 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Jürgen Purtz wrote:
>> Hi,
>> actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
>> upgrade to DocBook 5.x. This sounds simple, but it will be a long process
>> with many sub-tasks.
> Yes, agreed. The killer objection placed last time was that it took
> something like 10x longer to generate the HTML using the XML-based
> toolchain than the SGML-based ones. If this is not fixed, let's forget
> about this whole thing until it is. So, would you time the process
> using both toolchains and report back?
As stated in http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru, the xml performance may be greatly improved. Alexander, what is the current state of your patch? How slow is xml compared to sgml?
Jürgen Purtz <juergen@purtz.de> writes:
> I measured following elapsed times on an Intel i5 processor:
> 1. generate all HTML files with dsl script (make html): 0:48 min.
> 2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
> 3. generate all HTML files with xslt script in the new environment
>    (pure Docbook5): 4:07 min.
> 4. Generating different things via dsl scripts in the new environment
>    may be possible. But the changelog of the Docbook5 dsl scripts
>    shows, that the last modification occurred in 2004 - this way is a
>    dead end.
Ouch. What about output to PDF? While we don't care as much about that as HTML for day-to-day use, it has to be feasible (ie, not hours).
regards, tom lane
As was stated in the aforementioned thread, solution 2 can be much (8x) faster with some xslt optimizations, but I think we should now outline a roadmap before we start to prepare patches and so on.
Maybe we should convert to XML with DocBook4 as a first step?
Then, once we get everything stabilized, we can upgrade to DocBook5.
Shouldn't we decompose the conversion procedure, so that we could perform a fully automatic conversion without any manual changes, and then fix the non-valid situations you described before?
And one more question: is conversion to DocBook5 your final goal? Or do you have further plans regarding the documentation, such as translating it into German?
Best regards,
Alexander
On 04.05.2016 17:44, Jürgen Purtz wrote:
> Hello,
> I measured the following elapsed times on an Intel i5 processor:
> 1. generate all HTML files with dsl script (make html): 0:48 min.
> 2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
> 3. generate all HTML files with xslt script in the new environment (pure Docbook5): 4:07 min.
> 4. Generating different things via dsl scripts in the new environment may be possible. But the changelog of the Docbook5 dsl scripts shows that the last modification occurred in 2004, so this way is a dead end.
> There is one principal difference and a lot of minor differences between 2 and 3. Solution 2 is based on an xml file and xslt scripts which are based on Docbook4. The basic difference to 3 is that in 3 everything is Docbook5 compliant: there are only Docbook5 xml and xslt files (as my workflow is: db4 --> xml --> db5 -- (db5 xslt) --> html). The minor differences concern the fact that currently there are errors in my xml files and that I made only a few parameterisations to the Docbook5 standard xslt files, with no optimization at all.
> I used the following tools: perl, xmllint and xsltproc. osx and OpenJade are obsolete in the new environment (so far; there is much more work to do).
> Jürgen Purtz
Jürgen Purtz wrote:
> I measured following elapsed times on an Intel i5 processor:
>
> 1. generate all HTML files with dsl script (make html): 0:48 min.
> 2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
> 3. generate all HTML files with xslt script in the new environment
>    (pure Docbook5): 4:07 min.
> 4. Generating different things via dsl scripts in the new environment
>    may be possible. But the changelog of the Docbook5 dsl scripts
>    shows, that the last modification occurred in 2004 - this way is a
>    dead end.
Thanks.
The dsl toolchain has a "make html" format which creates the index and a "make draft" that doesn't. You timed the former only. What's the timing for an equivalent of "make draft" in the xslt chain? If it exists and is short enough, it seems acceptable to me that the complete (with index) build takes ~4x as long as today; the draft timing is more critical, I would think.
Man pages are already generated using xslt, so I suppose that wouldn't change.
PDF creation timing is also critical.
FWIW, in my laptop "make draft" takes 1m18.788s and a "make html" takes 1m26.676s. So it's just 8 seconds to generate the SGML file for the index, and no reruns required ... hmm. I think I'm gonna forget about "make draft" in the future.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alexander Law wrote:
> Hello Jürgen,
>
> As was stated in the aforementioned thread, solution 2 can be much (8x)
> faster with some xslt optimizations, but I think now we should outline some
> roadmap before we start to prepare patches and so.
Can the Docbook5 build be sped up with similar hacks?
If the stylesheet tweaks you did are universally useful, why not contribute them back to upstream Docbook?
> Maybe we should convert to XML with DocBook4 at first step?
> Then, once we get everything stabilized, we can upgrade to DocBook5.
Not sure there's much point in having an intermediate step in the repository that makes the doc build so much slower. I'd rather go to Docbook5 straight away.
> Shouldn't we decompose the conversion procedure, so we could perform fully
> automatic conversion without any manual changes, and then fix non-valid
> situations, you described before?
I don't think so -- this means leaving a state in the repo in which the docs don't actually build.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04.05.2016 16:51, Tom Lane wrote:
> Ouch. What about output to PDF? While we don't care as much about
> that as HTML for day-to-day use, it has to be feasible (ie, not hours).
>
> regards, tom lane
So far I have run tests using fop on single files (the converted sgml files). This works within seconds, and in my very first mail from 2016-04-20 I attached the results for the 'advanced.xml' file. When I try to convert the complete 'postgres_all.xml' file, fop crashes after some minutes. As fop is a Java application, it is possible that the assigned main memory is too small (-Xms -Xmx, ...), or the crash comes from some other Java-specific issue. I will work on this in the next days.
Jürgen Purtz
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> The dsl toolchain has a "make html" format which creates the index and a
> "make draft" that doesn't. You timed the former only. What's the
> timing for an equivalent of "make draft" in the xslt chain? If it
> exists and is short enough, it seems acceptable to me that the complete
> (with index) build takes ~4x as long as today; the draft timing is more
> critical, I would think.
I would object to that; I don't ever use "make draft", in part because I frequently want to look at whether the index entries look sensible. Also, as you noted, the time savings is pretty minimal at present.
regards, tom lane
Hello Alvaro,
04.05.2016 18:21, Alvaro Herrera wrote:
> Alexander Law wrote:
>> Hello Jürgen,
>>
>> As was stated in the aforementioned thread, solution 2 can be much (8x)
>> faster with some xslt optimizations, but I think now we should outline some
>> roadmap before we start to prepare patches and so.
> Can the Docbook5 build be sped up with similar hacks?
>
> If the stylesheet tweaks you did are universally useful, why not
> contribute them back to upstream Docbook?
I can't guarantee that these tweaks will work for all DocBook documents, though I've made sure that the result is the same for the postgresql doc html's (as I stated in http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru).
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
> Not sure there's much point in having an intermediate step in the
> repository that makes the doc build so much slower. I'd rather go to
> Docbook5 straight away.
>
>> Shouldn't we decompose the conversion procedure, so we could perform fully
>> automatic conversion without any manual changes, and then fix non-valid
>> situations, you described before?
> I don't think so -- this means leaving a state in the repo in which the
> docs don't actually build.
I mean we could build the docs just as we do now (as DocBook4). So we can continue to use the existing toolchain (and the Makefile, as it can generate html and pdf from XML), and just change the format for now.
On 04.05.2016 17:08, Alexander Law wrote:
> As was stated in the aforementioned thread, solution 2 can be much
> (8x) faster with some xslt optimizations, but I think now we should
> outline some roadmap before we start to prepare patches and so.
> Maybe we should convert to XML with DocBook4 at first step?
> Then, once we get everything stabilized, we can upgrade to DocBook5.
> Shouldn't we decompose the conversion procedure, so we could perform
> fully automatic conversion without any manual changes, and then fix
> non-valid situations, you described before?
Hello Alexander,
I haven't seen your xslt optimization so far. What have you done? Where can I find the optimized script or a description?
"Divide and conquer" is a good strategy and people use it in many cases. As you have stated, there are two major steps: from db4-sgml to db4-xml and from there to db5-xml. In parallel to the second one we shall migrate from dsl scripts to db5-xslt scripts. Your idea to go step by step and stabilise at the intermediate level is good in general. But in this case it may be unnecessary. The first step is very small. It consists mainly of the elimination of shorttags and empty elements. This is a purely formal act without risk. If we stopped at this point, people would be forced to switch their environment, e.g. .emacs, from db4-sgml to db4-xml, and after the second step to db5-xml. This is possible, but changing twice will (possibly) bring more confusion than advantages. The real challenge is the second step, as it implies some manual modifications (entities, markup that is not valid in the sense of the db5 schema) and a switch to a different output chain. Maybe we can live for a while with some files which are not valid against the db5 schema, as long as the output chain produces correct results.
Jürgen Purtz
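The "empty elements" part of the first step can be illustrated with a small text substitution that turns SGML-style empty elements into XML-style self-closing ones. This is only a sketch and not the Perl script discussed in this thread: the list of element names is an illustrative subset of DocBook 4 elements declared EMPTY, the regular expression assumes that attribute values contain no '<' or '>', and shorttags such as `</>` are not handled at all.

```python
import re

# Illustrative subset of DocBook 4 elements that are declared EMPTY (assumption).
EMPTY_ELEMENTS = ("xref", "co", "colspec", "spanspec", "footnoteref",
                  "void", "varargs", "sbr", "anchor")

# Start tag of an empty element, with or without a trailing "/".
_EMPTY_RE = re.compile(r"<(%s)\b([^<>]*?)\s*/?>" % "|".join(EMPTY_ELEMENTS))


def close_empty_elements(text: str) -> str:
    """Rewrite <xref ...> as <xref .../>; tags that are already closed stay closed."""
    return _EMPTY_RE.sub(lambda m: f"<{m.group(1)}{m.group(2)}/>", text)


if __name__ == "__main__":
    print(close_empty_elements('<para>See <xref linkend="tutorial-join">.</para>'))
```

Because the substitution is idempotent, it can be re-run safely on files that were already converted.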
Jürgen Purtz wrote:
> The real challenge is the second
> step as it implies some manual modifications (entities, non-valid markup in
> sense of db5-schema) and a switch to a different output chain. Maybe we can
> live for a while with some files, which are not valid against db5-schema -
> as far as the output chain produces correct results.
Speaking of entities, I noticed you changed some entities such as &mdash; to __mdash__ and such. That's not acceptable. What is Docbook5's accepted way to enter such characters?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
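One way to handle named character entities in plain XML, independent of any DTD, is to replace them with numeric character references (or with the literal Unicode character). The sketch below shows that idea only; it is not what the conversion script in this thread does. It uses Python's HTML entity table as a stand-in for the ISO entity sets, which is an assumption: the two overlap for common names such as mdash and nbsp but are not identical. Unknown names, such as project-specific entities, are left untouched.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_char_refs(text: str) -> str:
    """Replace known named entities (e.g. &mdash;) by numeric references (&#8212;)."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined and unknown entities unchanged
        return f"&#{name2codepoint[name]};"
    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(entities_to_char_refs("a &mdash; b &amp; &version;"))
```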
04.05.2016 19:52, Jürgen Purtz wrote:
> On 04.05.2016 17:08, Alexander Law wrote:
>> As was stated in the aforementioned thread, solution 2 can be much
>> (8x) faster with some xslt optimizations, but I think now we should
>> outline some roadmap before we start to prepare patches and so.
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
>> Shouldn't we decompose the conversion procedure, so we could perform
>> fully automatic conversion without any manual changes, and then fix
>> non-valid situations, you described before?
>
> Hello Alexander,
>
> I havn't seen your xslt optimization so far. What have you done? Where
> can I find the optimized script or a description?

It was attached to the message:
http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru

> "Divide and conquer" is a good strategy and people use it in many
> cases. As you have stated, there are two major steps: from db4-sgml to
> db4-xml and from there to db5-xml. In parallel to the second one we
> shall migrate from dsl scripts to db5-xslt scripts. Your idea to go
> step by step and stabilise at the intermediate level is good in
> general. But in this case it may be unnecessary. The first step is
> very small. It consists mainly of the elimination of shorttags and
> empty elements. This is a pure formal act without risk. If we would
> stop at this point, people are forced to switch their environment, eg
> .emacs from db4-sgml to db4-xml - and after the second step to
> db5-xml. This is possible - but the twice changing will bring
> (possibly) more confusion than advantages. The real challenge is the
> second step as it implies some manual modifications (entities,
> non-valid markup in sense of db5-schema) and a switch to a different
> output chain. Maybe we can live for a while with some files, which are
> not valid against db5-schema - as far as the output chain produces
> correct results.

By first step I mean the SGML->XML conversion (while staying with DocBook4). It's not small, IMHO, as it involves replacing all the doc/sgml contents (with doc/xml or alike) and updating the makefile (replacing the 'html' target with 'xslthtml' and so on), though it can (and should) be performed as one commit using one conversion script.

The advantage of "baby steps" for me is an opportunity to play safe. In fact it's a question of balance between the amount of redundant work and the manageability of the conversion. IMO, the amount of redundant work is not so large, as:
1) we have to convert sgml->xml anyway
2) we can use the existing makefile targets
3) we have to update the documentation on documentation ([1], [2]) anyway.

[1] http://www.postgresql.org/docs/9.5/interactive/docguide-build.html
[2] http://www.postgresql.org/docs/9.5/interactive/docguide-authoring.html

Alexander Lakhin
Hello Alvaro,

yes, the character entities, or rather their values, must be kept (what you have seen is an intermediate state). We will use UTF-8, so every possible Unicode code point can be used directly. But we use not only character entities; there are also parameter entities and external entities. The external entities will be replaced by XInclude (xi:include) elements. Finally, there are 4 parameter entities for which I currently have no solution: %standalone-ignore; %standalone-include; %include-index; %include-xslt-index;. But they should not be a show-stopper.

Jürgen Purtz

On 04.05.2016 19:12, Alvaro Herrera wrote:
> Jürgen Purtz wrote:
>
>> The real challenge is the second
>> step as it implies some manual modifications (entities, non-valid markup in
>> sense of db5-schema) and a switch to a different output chain. Maybe we can
>> live for a while with some files, which are not valid against db5-schema -
>> as far as the output chain produces correct results.
> Speaking of entities, I noticed you changed some entities such as
> &mdash; to __mdash__ and such. That's not acceptable. What is
> Docbook5's accepted way to enter such characters?
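A minimal sketch of the character-entity step described above, assuming the entity names used in the sources are covered by Python's HTML entity table (`html.entities.name2codepoint`). That table is only a stand-in for the ISO entity sets DocBook 4 declares, and the function name `replace_character_entities` is hypothetical. The five entities predefined by XML and any unknown names (for example general entities defined by the project) are left untouched.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity reference such as &mdash; (numeric references are not matched).
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_character_entities(text: str) -> str:
    """Replace known named character entities by the literal Unicode character.

    Assumption: name2codepoint (HTML 4 names) approximates the ISO entity
    sets; names missing from it are returned unchanged.
    """
    def _sub(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep the reference as it is
        return chr(name2codepoint[name])

    return _ENTITY.sub(_sub, text)
```

The result would then be written out as UTF-8, so that the literal characters survive in the XML sources.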
Alvaro, the advantage of draft mode is smaller than 15% in the three environments. In Docbook5 it reduces the elapsed time from 4:07 to 4:02. Jürgen Purtz On 04.05.2016 17:18, Alvaro Herrera wrote: > The dsl toolchain has a "make html" format which creates the index and a > "make draft" that doesn't. You timed the former only. What's the > timing for an equivalent of "make draft" in the xslt chain? If it > exists and is short enough, it seems acceptable to me that the complete > (with index) build takes ~4x as long as today; the draft timing is more > critical, I would think.
Jürgen Purtz wrote: > Hello Alvaro, > > yes, character entities respectively their values must be kept (what you > have seen is an intermediate state). We will use utf-8, so every possible > Unicode code point can be used directly. But we use not only character > entities, there are also parameter entities and external entities. The > external entities will be replaced by xi:XInclude. OK. Currently we have no non-ASCII chars in our source code, so this would be without precedent, but I think all modern tools should cope. The only pain point may be Tom Lane's mail client, which is unique in still using us-ascii encoding. > At last there are 4 parameter entities, to whom I actually have no > solution: %standalone-ignore; %standalone-include; %include-index; > %include-xslt-index; . But they should not be a show-stopper. Hmm? I think we use these entities to generate text files that are distributed in the tarball. How would we generate these files in the Docbook5 XML world? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/4/16 11:08 AM, Alexander Law wrote: > As was stated in the aforementioned thread, solution 2 can be much (8x) > faster with some xslt optimizations, but I think now we should outline > some roadmap before we start to prepare patches and so. > Maybe we should convert to XML with DocBook4 at first step? > Then, once we get everything stabilized, we can upgrade to DocBook5. > Shouldn't we decompose the conversion procedure, so we could perform > fully automatic conversion without any manual changes, and then fix > non-valid situations, you described before? I think the process should be something like this: - Apply your XSLT performance patch. The patch should be submitted to the next commit fest. - Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary. - Port all DSSSL customizations to XSLT. Manually evaluate output for quality. - Switch to XSLT build for official HTML documentation. [milestone 1] - Convert sources to XML. (There could be substeps here.) [milestone 2] - Then consider upgrading to DocBook 5. [milestone 3] -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Done (previous patch cleaned).
This patch optimizes XSL transformations contained in docbook-xsl (1.78.1).

Tested with 9.5.2:

time make html
real 1m21.989s
user 1m21.392s
sys 0m0.484s

1) time make xslthtml (before the patch)
real 29m19.904s
user 29m18.804s
sys 0m0.888s

2) time make xslthtml (after the patch)
real 3m8.483s
user 3m7.556s
sys 0m0.864s

To make sure that the result of the transformation is the same:

After 1): mv html html.xslt1
After 2): mv html html.xslt2

for f in *.xslt*/*.html; do
  sed -e 's/id=\"\(ftn\.\)\?id[a-z][0-9]\+\"/id=\"id\"/g' -i "$f"
  sed -e 's/href=\"[^#]*#\(ftn\.\)\?id[a-z][0-9]\+\"/href=\"#\"/g' -i "$f"
done
diff -u -r html.xslt1 html.xslt2

Best regards,
Alexander

04.05.2016 03:44, Peter Eisentraut wrote:
> On 5/3/16 4:13 PM, Oleg Bartunov wrote:
>> As it stated in
>> http://www.postgresql.org/message-id/562E061B.1090809@postgrespro.ru
>> the xml performance may be greatly improved. Alexander, what is current
>> state of art of your patch ? How slow is xml in compare to sgml ?
>
> Please make sure the patch is registered in the next commit fest.
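The normalization used in the comparison above can be sketched in Python as follows. This is only an illustration that mirrors the two sed expressions; the function name `normalize_generated_ids` is hypothetical and not part of the patch. The idea is that auto-generated ids differ between two stylesheet runs, so they are replaced by a constant before diffing.

```python
import re

# Auto-generated anchors look like id="idp12345" or id="ftn.idm678".
_GENERATED_ID = re.compile(r'id="(?:ftn\.)?id[a-z][0-9]+"')

# Links to such anchors; [^#\n]* keeps the match on one line, as sed does.
_GENERATED_HREF = re.compile(r'href="[^#\n]*#(?:ftn\.)?id[a-z][0-9]+"')


def normalize_generated_ids(html: str) -> str:
    """Replace generated ids and links to them by constant placeholders."""
    html = _GENERATED_ID.sub('id="id"', html)
    return _GENERATED_HREF.sub('href="#"', html)
```

Two builds can then be compared by normalizing each file and diffing the results. Stable, hand-written ids such as `id="xtypes"` are not touched, so real differences remain visible.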
Attachment
Hello Peter,

I fully support your plan and am going to move this way.

Best regards,
Alexander

05.05.2016 04:09, Peter Eisentraut wrote:
> On 5/4/16 11:08 AM, Alexander Law wrote:
>> As was stated in the aforementioned thread, solution 2 can be much (8x)
>> faster with some xslt optimizations, but I think now we should outline
>> some roadmap before we start to prepare patches and so.
>> Maybe we should convert to XML with DocBook4 at first step?
>> Then, once we get everything stabilized, we can upgrade to DocBook5.
>> Shouldn't we decompose the conversion procedure, so we could perform
>> fully automatic conversion without any manual changes, and then fix
>> non-valid situations, you described before?
>
> I think the process should be something like this:
>
> - Apply your XSLT performance patch. The patch should be submitted to
> the next commit fest.
>
> - Wait a while to make sure everyone is happy with the performance.
> Keep tweaking if necessary.
>
> - Port all DSSSL customizations to XSLT. Manually evaluate output for
> quality.
>
> - Switch to XSLT build for official HTML documentation. [milestone 1]
>
> - Convert sources to XML. (There could be substeps here.) [milestone 2]
>
> - Then consider upgrading to DocBook 5. [milestone 3]
Hi,

after some manual interventions (which must become part of an algorithm) it's possible to create the complete PDF file with the db4 production chain:

  *.sgml -- (perl) --> *.xml

and after that step:

  postgres.xml -- (evaluate XInclude with xmllint) --> postgres_all.xml
    -- (xsltproc using standard db4 stylesheet) --> postgres_all.fo
    -- (fop) --> postgres_all.pdf

The fo/pdf generation takes about 1 minute. Currently the layout of the resulting PDF file differs from the original one, as I used only standard scripts without any adaptation to the PostgreSQL styles. Additionally I'm missing some of the links.

Jürgen Purtz

On 04.05.2016 17:30, Jürgen Purtz wrote:
> On 04.05.2016 16:51, Tom Lane wrote:
>> Ouch. What about output to PDF? While we don't care as much about
>> that as HTML for day-to-day use, it has to be feasible (ie, not hours).
>>
>> regards, tom lane
>
> Actually I made tests using fop on single files (the converted sgml
> files). This works within seconds and in my very first mail from
> 2016-04-20 I added the results for the 'advanced.xml' file. When I try
> to convert the complete 'postgres_all.xml' file, fop crashes after
> some minutes. As fop is a Java application, it is possible that the
> assigned main memory is short (-Xms -Xmx, ...) - or it comes from some
> other Java specific issues. I will work on this in the next days.
>
> Jürgen Purtz
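The second half of the chain described above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the stylesheet path must be supplied by the caller because the thread does not say where the standard db4 FO stylesheet lives, and the Perl step (*.sgml to *.xml) is left out since that script is not shown in the thread.

```python
import subprocess
from typing import List


def pdf_pipeline_commands(stylesheet: str,
                          source: str = "postgres.xml",
                          stem: str = "postgres_all") -> List[List[str]]:
    """Build the command lines for XInclude resolution, XSL-FO and PDF."""
    merged, fo, pdf = f"{stem}.xml", f"{stem}.fo", f"{stem}.pdf"
    return [
        # Resolve all xi:include elements into one big document.
        ["xmllint", "--xinclude", "--output", merged, source],
        # Transform the merged document to XSL-FO.
        ["xsltproc", "--output", fo, stylesheet, merged],
        # Render the FO file to PDF.
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


def run_pipeline(stylesheet: str) -> None:
    """Run the steps in order and stop at the first failure."""
    for command in pdf_pipeline_commands(stylesheet):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it easy to print or inspect the steps before running them on the full document.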
Jürgen Purtz wrote: > after some manual interventions (which must become part of an algorithm) > it's possible to created the complete PDF file for the db4 production chain: > *.sgml -- (perl) --> *.xml and after that step: postgres.xml -- (evaluate > XInclude with xmllint) --> postgres_all.xml -- (xsltproc using standard db4 > stylesheet) --> postgres_all.fo -- (fop) --> postgres_all.pdf. The fo/pdf > generation takes about 1 minute. Uh? So generating the PDF is much quicker than generating the HTML? That's strange, but if true, then I'm happy about it. > Actually the layout of the resulting pdf file differs from the original one > as I used only standard scripts without any adoption to the PostgreSQL > styles. Surely this can be sorted out. > Additionally I'm missing some of the links. Not sure what you mean here. Did you figure a way to generate the INSTALL file? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Is everybody happy with the performance patch? Is anybody working on the DSSSL --> XSLT conversion? We should avoid parallel work. Jürgen Purtz On 05.05.2016 03:09, Peter Eisentraut wrote: > On 5/4/16 11:08 AM, Alexander Law wrote: >> As was stated in the aforementioned thread, solution 2 can be much (8x) >> faster with some xslt optimizations, but I think now we should outline >> some roadmap before we start to prepare patches and so. >> Maybe we should convert to XML with DocBook4 at first step? >> Then, once we get everything stabilized, we can upgrade to DocBook5. >> Shouldn't we decompose the conversion procedure, so we could perform >> fully automatic conversion without any manual changes, and then fix >> non-valid situations, you described before? > > I think the process should be something like this: > > - Apply your XSLT performance patch. The patch should be submitted to > the next commit fest. > > - Wait a while to make sure everyone is happy with the performance. > Keep tweaking if necessary. > > - Port all DSSSL customizations to XSLT. Manually evaluate output for > quality. > > - Switch to XSLT build for official HTML documentation. [milestone 1] > > - Convert sources to XML. (There could be substeps here.) [milestone 2] > > - Then consider upgrading to DocBook 5. [milestone 3] >
On 5/13/16 3:38 AM, Jürgen Purtz wrote: > Is everybody happy with the performance patch? > Is anybody working on the DSSSL --> XSLT conversion? We should avoid > parallel work. The performance patch is in the next commit fest for review. The DSSSL -> XSLT conversion basically just needs someone to go through the existing code and output and identify necessary improvements. Most people right now are working on getting the current release out, so I don't expect much to happen here for a while. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Jürgen,

We started to work on the conversion:
http://www.postgresql.org/message-id/56337365.2080104@postgrespro.ru
And now we (at PostgresPro) are going to continue this work. I think we should do it together to avoid redundant work.

Best regards,
Alexander

13.05.2016 21:01, Peter Eisentraut wrote:
> On 5/13/16 3:38 AM, Jürgen Purtz wrote:
>> Is everybody happy with the performance patch?
>> Is anybody working on the DSSSL --> XSLT conversion? We should avoid
>> parallel work.
>
> The performance patch is in the next commit fest for review.
>
> The DSSSL -> XSLT conversion basically just needs someone to go
> through the existing code and output and identify necessary improvements.
>
> Most people right now are working on getting the current release out,
> so I don't expect much to happen here for a while.
We maintain two css files: docs.css (for online, complex) and stylesheet.css (local, simple). Why do we need stylesheet.css? Because of the references to some png files in docs.css? Because of Lynx? For easier comparisons?

Kind regards,
Jürgen Purtz

On 13.05.2016 20:01, Peter Eisentraut wrote:
> On 5/13/16 3:38 AM, Jürgen Purtz wrote:
>> Is everybody happy with the performance patch?
>> Is anybody working on the DSSSL --> XSLT conversion? We should avoid
>> parallel work.
>
> The performance patch is in the next commit fest for review.
>
> The DSSSL -> XSLT conversion basically just needs someone to go
> through the existing code and output and identify necessary improvements.
>
> Most people right now are working on getting the current release out,
> so I don't expect much to happen here for a while.
On 5/22/16 3:48 PM, Jürgen Purtz wrote: > We maintain two css files: docs.css (for online, complex) and > stylesheet.css (local, simple). Why do we need stylesheet.css? So you can read the documentation locally without referencing online files. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/5/16 12:45 AM, Alexander Law wrote: > Done (previous patch cleaned). > This patch optimizes XSL transformations contained in docbook-xsl (1.78.1). I have looked through this patch, and it's awesome. I have tweaked it a bit more along the lines you guys have started, and now the build time is pretty much the same as with DSSSL. Attached is my final patch, which I plan to commit as soon as the new branch opens. (I only have Alexander Lakhin as credit right now. Please let me know if anyone else contributed.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello Peter,

Thanks for the improvements! I checked the outputs (with the attached script) and found that there is a small difference with the improved patch. For example, look at xtypes.html:

- <link rel="home" href="index.html" title="PostgreSQL 9.6beta1 Documentation" /><link rel="up" href="extend.html" title="Chapter 35. Extending SQL" /><link rel="prev" href="xaggr.html" title="35.10. User-defined Aggregates" /><link rel="next" href="xoper.html" title="35.12. User-defined Operators" /><link rel="copyright" href="legalnotice.html" title="Legal Notice" /></head>
+ <link rel="prev" href="xaggr.html" title="35.10. User-defined Aggregates" /><link rel="next" href="xoper.html" title="35.12. User-defined Operators" /></head>

It is caused by <xsl:template name="html.head">.
Leaving aside the question whether the links "home", "up" and "copyright" are needed, maybe it's better to split the commit in two? The first to speed up the conversion while making sure that the output is the same, and the second to change the html.head output format.

Best regards,
Alexander

03.06.2016 22:31, Peter Eisentraut wrote:
> On 5/5/16 12:45 AM, Alexander Law wrote:
>> Done (previous patch cleaned).
>> This patch optimizes XSL transformations contained in docbook-xsl
>> (1.78.1).
>
> I have looked through this patch, and it's awesome. I have tweaked it
> a bit more along the lines you guys have started, and now the build
> time is pretty much the same as with DSSSL. Attached is my final
> patch, which I plan to commit as soon as the new branch opens.
>
> (I only have Alexander Lakhin as credit right now. Please let me know
> if anyone else contributed.)
Attachment
On 6/4/16 10:28 AM, Alexander Law wrote: > It caused by <xsl:template name="html.head">. > Leaving aside the question whether the links "home", "up" and > "copyright" are needed, maybe it's better to split the commit to two? > First to speed up the conversion while making sure that the output is > the same, and the second to change the html.head output format. I did that intentionally, but I agree that it might be better to split this off into a separate commit. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Peter,

In that case can we postpone the second commit until step 3 in your plan of migration to XML is done? I mean "- Port all DSSSL customizations to XSLT. Manually evaluate output for quality."
If we do not change contents/formatting until the migration to xslt is done, we can ensure by automatic means that the output is the same. (See my letter: https://www.postgresql.org/message-id/56337365.2080104%40postgrespro.ru)

Best regards,
Alexander

06.06.2016 15:28, Peter Eisentraut wrote:
> On 6/4/16 10:28 AM, Alexander Law wrote:
>> It caused by <xsl:template name="html.head">.
>> Leaving aside the question whether the links "home", "up" and
>> "copyright" are needed, maybe it's better to split the commit to two?
>> First to speed up the conversion while making sure that the output is
>> the same, and the second to change the html.head output format.
>
> I did that intentionally, but I agree that it might be better to split
> this off into a separate commit.
On 6/6/16 1:49 PM, Alexander Law wrote: > In that case can we postpone the second commit until step 3 in your plan > of migration to XML is done? > I mean "- Port all DSSSL customizations to XSLT. Manually evaluate > output for quality. "; It is part of the performance work. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think the process should be something like this:
- Apply your XSLT performance patch. The patch should be submitted to the next commit fest.
- Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
- Port all DSSSL customizations to XSLT. Manually evaluate output for quality.
- Switch to XSLT build for official HTML documentation. [milestone 1]
- Convert sources to XML. (There could be substeps here.) [milestone 2]
- Then consider upgrading to DocBook 5. [milestone 3]
Alexander and I continue to work on this path. In the meantime we have reached a state where the xml files are well formed and valid against the DocBook 4 DTD, each single file as well as the big postgres_all.xml file. Thanks to Alexander's performance patch all XSLT processes run very fast (the slowest is fo+pdf with 6:30 min).
On this basis I am currently working on the HTML generation. But in contrast to the previous steps (where we create identical copies of the sgml files) the new css file is very different from the old one. This results from the following:
- The XSLT process generates other HTML elements and other classes in comparison to the dsssl process.
- XML files are case sensitive. All object names (id, ulink, linkend, zone, ...) are now lower case.
- Sometimes the order of elements changed.
- As the previous css file was constructed (some years ago) from three different css files, it contains redundant and sometimes contradictory information. I did a complete review.
Jürgen Purtz
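To illustrate the case-sensitivity point in the list above: lowercasing the reference names might look like the following sketch. It assumes the names sit in quoted `id`, `linkend` and `zone` attributes after the SGML to XML conversion; the function name is hypothetical, and a regular expression is only a rough stand-in for a real XML-aware rewrite.

```python
import re

# Quoted id / linkend / zone attributes (xml:id is matched as well).
_REF_ATTR = re.compile(r'\b(id|linkend|zone)="([^"]*)"')


def lowercase_reference_attributes(xml_text: str) -> str:
    """Lowercase the values of id, linkend and zone attributes."""
    return _REF_ATTR.sub(
        lambda m: f'{m.group(1)}="{m.group(2).lower()}"', xml_text)
```

Definitions and references have to be changed together, otherwise a `linkend` would no longer resolve to its `id` once the sources are processed as case-sensitive XML.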
I have some progress with the "Port all DSSSL customizations to XSLT" step. I still think that we should avoid manual comparison of old and new outputs when we can align them and have all differences observable, countable and manageable.
I have developed XSLTs that allow us to do it. Please look at the attached patch (with all the XSLTs) and the comparison script.
(There are several dozen changes in the XSLTs and some in the script. Some differences still remain, but they are observable and can be eliminated too.)
Btw, can I hope to get some feedback on my previous letter regarding error fixes (https://www.postgresql.org/message-id/5736C475.3030405%40gmail.com)?
Best regards,
Alexander
19.06.2016 23:22, Jürgen Purtz wrote:
On 05.05.2016 03:09, Peter Eisentraut wrote:
I think the process should be something like this:
- Apply your XSLT performance patch. The patch should be submitted to the next commit fest.
- Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
- Port all DSSSL customizations to XSLT. Manually evaluate output for quality.
- Switch to XSLT build for official HTML documentation. [milestone 1]
- Convert sources to XML. (There could be substeps here.) [milestone 2]
- Then consider upgrading to DocBook 5. [milestone 3]
Alexander and I continue to work on this path. In the meanwhile we have reached a state where xml files are well formed and valid against docbook 4 dtd - each single file as well as the big postgres_all.xml file. Thanks to Alexander's performance patch all XSLT processes run very fast (the slowest is fo+pdf with 6:30 min).
On this basis I actually work on the HTML generation. But in opposite to the previous steps (where we create identical copies of the sgml files) the new css file is very different from the old one. This results from the following:
- The XSLT process generates other HTML elements and other classes in comparison to the dsssl process.
- XML files are case sensitive. All object names (id, ulink, linkend, zone, ...) are now lower case.
- Sometimes the order of elements changed.
- As the previous css file was constructed (some years ago) from three different css files, he contains redundant and sometimes contradictory information. I did a complete review.
To get a feedback from the community I have published the resulting postgres_all.html and its pgdoc_online.css file. Please refer to https://github.com/JuergenPurtz/pgdoc_db5/blob/master/postgresql-9.5.3/doc/src/db4_xml/postgres_all.html respective pgdoc_online.css to get the files. Please compare the html file with pages you are familiar with. And remember: the look-and-feel is similar, but far from identical.
Jürgen Purtz
Attachment
I think the process should be something like this:
- Apply your XSLT performance patch. The patch should be submitted to the next commit fest.
- Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
- Port all DSSSL customizations to XSLT. Manually evaluate output for quality.
- Switch to XSLT build for official HTML documentation. [milestone 1]
- Convert sources to XML. (There could be substeps here.) [milestone 2]
- Then consider upgrading to DocBook 5. [milestone 3]
During the step from DocBook 4 to DocBook 5 [M3] we will face the problem that there are incompatibilities in the DocBook structure. Our source is affected by:
- It's no longer possible to use <option> and <optional> in a recursive fashion.
- The content model of <literal>, <function>, <command> and similar elements has changed.
-----
Hi,
I'm investigating this issue. I'm trying to track down the history and reasoning for this change, but have not yet completed that research. It looks like a mistake to me to limit the content model so much when we were consciously trying to maintain backwards compatibility unless there were good reasons not to.
Bob Stayton
Sagehill Enterprises
bobs@sagehill.net
-----
Jürgen Purtz
Btw: Since 16 June 2016 Norman Walsh is no longer chairman of the TC. Bob now holds this position, and Norman has become a 'normal' member of the TC.
On 6/6/16 8:28 AM, Peter Eisentraut wrote: > On 6/4/16 10:28 AM, Alexander Law wrote: >> It caused by <xsl:template name="html.head">. >> Leaving aside the question whether the links "home", "up" and >> "copyright" are needed, maybe it's better to split the commit to two? >> First to speed up the conversion while making sure that the output is >> the same, and the second to change the html.head output format. > > I did that intentionally, but I agree that it might be better to split > this off into a separate commit. I have committed the first part of this, as discussed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Great!

The next items in our plan were:
- Wait a while to make sure everyone is happy with the performance. Keep tweaking if necessary.
- Port all DSSSL customizations to XSLT. Manually evaluate output for quality.

Should we now compare DSSSL outputs with XSLT?
I had some success with it before. See my letter:
https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the differences and to decide which customizations to keep.

Best regards,
Alexander

18.08.2016 20:56, Peter Eisentraut wrote:
> On 6/6/16 8:28 AM, Peter Eisentraut wrote:
>> On 6/4/16 10:28 AM, Alexander Law wrote:
>>> It caused by <xsl:template name="html.head">.
>>> Leaving aside the question whether the links "home", "up" and
>>> "copyright" are needed, maybe it's better to split the commit to two?
>>> First to speed up the conversion while making sure that the output is
>>> the same, and the second to change the html.head output format.
>> I did that intentionally, but I agree that it might be better to split
>> this off into a separate commit.
> I have committed the first part of this, as discussed.
On 8/19/16 9:14 AM, Alexander Law wrote: > The next items in our plan were: An immediate problem is that the patched stuff no longer works with older stylesheets (1.76.1?). I'm glad to leave older stuff behind for a 400% speedup, but we need to analyze the exact effect and possibly document it or work around it. > - Wait a while to make sure everyone is happy with the performance. Keep > tweaking if necessary. > - Port all DSSSL customizations to XSLT. Manually evaluate output for > quality. > > Should we now compare DSSSL outputs with XSLT? > I had some success with it before. See my letter: > https://www.postgresql.org/message-id/57712848.7060306%40gmail.com > Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the > differences and to decide which customizations to keep. It looks like the idea there is to whack the XSLT stylesheets until the output looks exactly like the DSSSL output? I'm not sure that's terribly useful. It would probably be a lot of work, which we'll just end up removing eventually. I'd rather just fix any formatting issues we find and move forward. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,

>> The next items in our plan were:
> An immediate problem is that the patched stuff no longer works with
> older stylesheets (1.76.1?). I'm glad to leave older stuff behind for a
> 400% speedup, but we need to analyze the exact effect and possibly
> document it or work around it.

Please consider committing the attached patch. The commented-out call-template does nothing, so we can just customize our customized templates further to support 1.76. I've just performed the build with Ubuntu 13.04/docbook 1.76.1 and it works. (In case the call-template generated something, it would appear in bookindex.html only.)

>> - Wait a while to make sure everyone is happy with the performance. Keep
>> tweaking if necessary.
>> - Port all DSSSL customizations to XSLT. Manually evaluate output for
>> quality.
>>
>> Should we now compare DSSSL outputs with XSLT?
>> I had some success with it before. See my letter:
>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com
>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the
>> differences and to decide which customizations to keep.
> It looks like the idea there is to whack the XSLT stylesheets until the
> output looks exactly like the DSSSL output? I'm not sure that's
> terribly useful. It would probably be a lot of work, which we'll just
> end up removing eventually. I'd rather just fix any formatting issues
> we find and move forward.

That work is done already, and its results are countable and observable differences. (See the comments in the xslt.)
For example, with DSSSL we don't get a chapter TOC when the chapter contains only one sect1 (with XSLT we get a TOC with the one item). We also had a subtoc for sect1/refentry and sect1/simplesect, but with XSLT it's absent.
So if all such differences are not important, let's move forward.

Best regards,
Alexander
Attachment
On 8/23/16 10:23 AM, Alexander Law wrote: >> An immediate problem is that the patched stuff no longer works with >> > older stylesheets (1.76.1?). I'm glad to leave older stuff behind for a >> > 400% speedup, but we need to analyze the exact effect and possibly >> > document it or work around it. > Please consider committing attached patch. Commented-out call-template > does nothing so we can just customize our customized templates further > to support 1.76. pushed, thanks -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, Peter. >>> Should we now compare DSSSL outputs with XSLT? >>> I had some success with it before. See my letter: >>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com >>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the >>> differences and to decide which customizations to keep. >> It looks like the idea there is to whack the XSLT stylesheets until the >> output looks exactly like the DSSSL output? I'm not sure that's >> terribly useful. It would probably be a lot of work, which we'll just >> end up removing eventually. I'd rather just fix any formatting issues >> we find and move forward. > That work is done already and it's results are countable and > observable differences. (See comments in the xslt.) > For example, with DSSSL we don't get a chapter TOC when the chapter > contains only one sect1 (with XSLT we get the TOC with the one item). > We also had subtoc for sect1/refentry and sect1/simplesect, but with > XSLT it's absent. > So if all such differences are not important, let's move forward. > Please look at the http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download where I have gathered together all the significant differences, that we have between DSSSL and XSLT outputs. I have marked red the differences that I would consider as negative. Let's decide which ones are acceptable and which we need to eliminate. Best regards, Alexander
On 14.09.2016 13:18, Alexander Law wrote: > Hello, Peter. >>>> Should we now compare DSSSL outputs with XSLT? >>>> I had some success with it before. See my letter: >>>> https://www.postgresql.org/message-id/57712848.7060306%40gmail.com >>>> Those xslt's (see xhtml-like-dsssl.patch) can help us to see all the >>>> differences and to decide which customizations to keep. >>> It looks like the idea there is to whack the XSLT stylesheets until the >>> output looks exactly like the DSSSL output? I'm not sure that's >>> terribly useful. It would probably be a lot of work, which we'll just >>> end up removing eventually. I'd rather just fix any formatting issues >>> we find and move forward. >> That work is done already and it's results are countable and >> observable differences. (See comments in the xslt.) >> For example, with DSSSL we don't get a chapter TOC when the chapter >> contains only one sect1 (with XSLT we get the TOC with the one item). >> We also had subtoc for sect1/refentry and sect1/simplesect, but with >> XSLT it's absent. >> So if all such differences are not important, let's move forward. >> > Please look at the > http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download > where I have gathered together all the significant differences, that > we have between DSSSL and XSLT outputs. > I have marked red the differences that I would consider as negative. > Let's decide which ones are acceptable and which we need to eliminate. > > Best regards, > Alexander > > > Hello Alexander, great job! In my opinion most of the differences are not only acceptable but even better. Here are some addition notes: For me the following topics are ok: 18, 19, 22, 37. If possible we shall invest some more effort in solutions for: 1 (up + home not only in footer but also in header), 8, 11, 20, 30. Topics 9 and 22 seems to be identical. Kind regards, Jürgen
Hello Jürgen, 14.09.2016 16:05, Jürgen Purtz wrote: > For me the following topics are ok: 18, 19, 22, 37. The problem with 37 is that such flat numbering present only in refentry. Other sections (sect1, sect2) have independent numbering. So you can't find "Table 236", but you can find "Table 27.2. Collected Statistics Views" in monitoring-stats.html, for example. So it's inconsistent at least. > If possible we shall invest some more effort in solutions for: 1 (up + > home not only in footer but also in header), 8, 11, 20, 30. What is marked "+" in the "XSLT alignment exists" column is already solved. I mean that I've developed the XSLT templates that eliminate the indicated differences. I presented a patch with all the templates before (for 9.5) and adapted it for the master branch now. > Topics 9 and 22 seems to be identical. Yes, that difference required changes in two places and I duplicated the topic erroneously. Best regards, Alexander
Jürgen Purtz wrote: > Hello Alexander, > > great job! > > In my opinion most of the differences are not only acceptable but even > better. Agreed. But there are a few that merit a fix. There are a few that look a matter of style (XSL output looks odd), another few are not important. But I think missing TOC for example is a problem. 1 is a customization we went great lengths to add, so it's a must fix I think. I think 12, 14, 15 merit more research too. What's up with 30? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro, 14.09.2016 18:30, Alvaro Herrera wrote: > What's up with 30? ecpg.sgml contains "<remark>The scope of the allocated descriptor is WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL. Maybe we can just remove such remarks. I found only two of them, other one in dml.sgml: <chapter id="dml"> <title>Data Manipulation</title> <remark> This chapter is still quite incomplete. </remark> Best regards, Alexander
Alexander Law <exclusion@gmail.com> writes: > Hello Alvaro, > 14.09.2016 18:30, Alvaro Herrera wrote: >> What's up with 30? > ecpg.sgml contains "<remark>The scope of the allocated descriptor is > WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL. > Maybe we can just remove such remarks. I found only two of them, other > one in dml.sgml: > <chapter id="dml"> > <title>Data Manipulation</title> > <remark> > This chapter is still quite incomplete. > </remark> Change them to SGML comments? Although actually the DML one should be removed, I think it's ancient and obsolete. regards, tom lane
Alexander Law wrote: > 14.09.2016 18:30, Alvaro Herrera wrote: > >What's up with 30? > > ecpg.sgml contains "<remark>The scope of the allocated descriptor is > WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL. > > Maybe we can just remove such remarks. I added Michael on CC. Maybe he can clarify what the scope of the descriptor is, so that we can add some proper sentence, and remove the <remark> tag. It's strange that the xslt prints the <remark> text, when the historical behavior was to ignore it. Maybe OASIS changed their mind as to what it meant. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
14.09.2016 19:41, Alvaro Herrera wrote: > Alexander Law wrote: > >> 14.09.2016 18:30, Alvaro Herrera wrote: >>> What's up with 30? >> ecpg.sgml contains "<remark>The scope of the allocated descriptor is >> WHAT?.</remark>", which is printed with XSLT but ignored with DSSSL. >> >> Maybe we can just remove such remarks. > I added Michael on CC. Maybe he can clarify what the scope of the > descriptor is, so that we can add some proper sentence, and remove the > <remark> tag. > > > It's strange that the xslt prints the <remark> text, when the historical > behavior was to ignore it. Maybe OASIS changed their mind as to what it > meant. > In fact it's controlled by show.comments variable in XSLT, which is defined in doc/src/sgml/stylesheet-common.xsl as: <xsl:param name="show.comments"> <xsl:choose> <xsl:when test="contains($pg.version, 'devel')">1</xsl:when> <xsl:otherwise>0</xsl:otherwise> </xsl:choose> </xsl:param> And it's just inconsistent with the DSSSL definition ((define %show-comments% draft-mode)). So we can just ignore the difference. Though question that is still open since 2003 could be answered at the end of the day.
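The parameter quoted above reduces to a substring test on the version string. A minimal Python sketch of that decision follows; it is purely illustrative, the function name `show_comments` is hypothetical, and the real logic is the XSLT in stylesheet-common.xsl shown in the message.

```python
def show_comments(pg_version: str) -> int:
    """Mirror the xsl:choose above: 1 for development builds, 0 otherwise."""
    return 1 if "devel" in pg_version else 0


# "10devel" is the version string mentioned later in this thread;
# "9.6" stands in for any release version.
print(show_comments("10devel"))
print(show_comments("9.6"))
```

With this rule, remarks are rendered only while the version string carries the "devel" suffix, which is why they show up in XSLT builds of the development branch.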
Hello, I've modified the XSL stylesheets to eliminate most of the undesired differences between the `make html` and `make xslthtml` outputs. The patch is attached. See http://oc.postgrespro.ru/index.php/s/Gj2PGZ9IHUbDC5t/download (eliminated differences are marked cyan). What should we do next to finish the "Port all DSSSL customizations to XSLT" item in our plan? Best regards, Alexander 14.09.2016 18:30, Alvaro Herrera wrote: > I think 12, 14, 15 merit more research too. > > What's up with 30? >
Attachment
The Docbook TC is discussing the mid-term plans for Docbook 6, see: https://lists.oasis-open.org/archives/docbook/201610/msg00005.html . As with Docbook 5, the normative standard will be written in RELAX NG plus Schematron. In the past they generated DTD and XSD files from the RELAX NG schema. For Docbook 6 they are considering dropping the XSD version; the DTD will survive as it can be generated automatically. * Does anybody need the XSD version in the long term? * xmllint is able to validate against RELAX NG. Kind regards, Jürgen
On 9/14/16 7:18 AM, Alexander Law wrote: > Please look at the > http://oc.postgrespro.ru/index.php/s/ttJyMDLr8Xr1HTu/download > where I have gathered together all the significant differences, that we > have between DSSSL and XSLT outputs. > I have marked red the differences that I would consider as negative. > Let's decide which ones are acceptable and which we need to eliminate. Thank you for making that list. I have a similar list that is not quite the same, so together we'll probably find all the problems. I have checked both lists and most of the issues are not terribly critical. I have now committed fixes for what I think were the major missing usability issues: header customization and index letter links. I would be comfortable with switching the default build to XSLT now and work out the remaining issues on the fly. What do you think? (I also have a similar list for switching the PDF build from jadetex to fop. We are also in pretty good shape there, but I have not finished the evaluation fully.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I have now committed fixes for what I think were the major missing > usability issues: header customization and index letter links. I would > be comfortable with switching the default build to XSLT now and work out > the remaining issues on the fly. What do you think? What does that imply in terms of changing build dependencies for people who want to build the docs? regards, tom lane
On 11/8/16 10:02 AM, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> I have now committed fixes for what I think were the major missing >> usability issues: header customization and index letter links. I would >> be comfortable with switching the default build to XSLT now and work out >> the remaining issues on the fly. What do you think? > > What does that imply in terms of changing build dependencies for people > who want to build the docs? I will write a full explanation to hackers before making any change. But the short answer is, it's the same tools we use for building the man pages now, so if you can run make man or make world, you're set. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Peter, I think it's time to switch to XSLT. We at Postgres Pro already build (converted to XML) and publish the docs (for version 9.6, in English and Russian) and we've got only one bug report related to the header/navigation differences. (See https://postgrespro.com/docs/postgresql/9.6/tutorial-concepts.html vs https://postgrespro.com/docs/postgresql/9.5/tutorial-concepts.html). There are other customizations that I would like to apply, but it could be done later. And as we should completely move away from DSSSL to migrate to XML, I think the sooner we switch to XSLT, the better. Best regards, Alexander 08.11.2016 16:25, Peter Eisentraut wrote: > I have now committed fixes for what I think were the major missing > usability issues: header customization and index letter links. I would > be comfortable with switching the default build to XSLT now and work out > the remaining issues on the fly. What do you think? >
Hello Peter, I saw that you committed the patch to switch the HTML build to XSLT by default. So it seems we can now continue the move to XML. I'd suggest moving in several steps. Please see the attached scripts. The main script is 7_check_conversion.sh. It performs the whole conversion and checks whether the HTML output is the same. I suggest splitting the conversion into three commits. Commit #0 is for manual corrections: it replaces "<" with "&lt;" and so on in some SGML files, and it doesn't affect the build or the outputs. It is needed only for the next step, the automatic conversion. These changes to the SGML files are countable and observable. Commit #1 converts all SGML files to make them compatible with XML (as much as possible). (Thanks to Jürgen for his sgml2xml.pl script.) These changes are massive, but they are produced automatically, so we only need to check the script and make sure that the output is the same. After that commit we can still use the SGML build. The last commit, commit #2, switches to XML. At that point doc/src/sgml is renamed to doc/src/xml and the build environment is modified and cleaned up, but the changes to the sgml/xml files are minimal, so we can observe and check them. After commit #2 we have all our docs in XML (DocBook 4.2) and can build them just as we did with 'make html/man/...' before. Maybe commit #2 should be applied later, but commits #0 and #1 are not intrusive and can be applied anytime. Best regards, Alexander
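Why the manual escaping step comes first can be checked with any XML parser: a bare `<` or `&` in character data is accepted in the SGML sources but is not well-formed XML. The following is a small sketch using only Python's standard library; the sample text and the helper `is_well_formed` are made up for illustration and are not part of the attached scripts.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


raw = "a < b & c"  # character data as it might appear in an SGML source file

# Unescaped, the parser rejects the paragraph.
print(is_well_formed(f"<para>{raw}</para>"))

# After escaping the markup characters, the same paragraph parses.
print(escape(raw))
print(is_well_formed(f"<para>{escape(raw)}</para>"))
```

Since the escaped form is also valid SGML, a change of this kind should leave the existing build output unchanged, which matches the description of the first commit.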
Attachment
> Hello Peter, > > I saw that you committed the patch to switch the html build to XSLT by > default. > So It seems, now we can continue the move to XML. > I'd suggest to move in several steps. > Please see the attached scripts. > The main script is 7_check_conversion.sh. It performs all the > conversion and checks whether the html output is the same. > I suggest to split conversion in three commits. > Commit#0 is for manual corrections - it replaces "<" with "&lt;" and > so on in some sgml's and it doesn't affect the build or outputs. > It needed just for the next step - automatic conversion. These changes > to sgml's are countable and observable. > > Commit#1 performs conversion of all SGML's to make them compatible > with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl > script.) > These changes to sgml's are massive, but they are produced > automatically, so we need just to check the script and make sure that > the output is the same. > After that commit we still can use SGML build. > > And the last commit, commit#2 is for switching to XML. At that point > doc/src/sgml renamed to doc/src/xml, build environment modified and > cleaned, but changes in sgml/xml are minimal, so we can observe and > check them. > > After the commit#2 we get all our docs in XML (DocBook 4.2) and can > build it just as we did with 'make html/man/...' before. > Maybe the commit#2 should be applied later, but commits #0 and #1 are > not intrusive and can be applied anytime. > > Best regards, > Alexander > Hello, I greatly welcome the next steps toward XML. In addition to the submitted scripts I want to point out that we will get validation error messages if the following three things coincide: a) Docbook 4.2, b) use of XInclude, c) a different directory (e.g. 'ref'). The xml:base attribute (which XInclude processing adds to record the different directory) was introduced into the Docbook DTD with version 4.3.
Older DTDs cannot use it without manual changes to the DTD, please see: http://www.sagehill.net/docbookxsl/ValidXinclude.html. To overcome this shortcoming I suggest the use of Docbook 4.3 (or 4.5), either as a separate commit or implicitly with commit #2. As far as I have seen, all converted documents validate against 4.5. Kind regards Jürgen Purtz
Hello Peter, 16.11.2016 14:30, Alexander Law wrote: > So It seems, now we can continue the move to XML. > I'd suggest to move in several steps. > Please see the attached scripts. > The main script is 7_check_conversion.sh. It performs all the > conversion and checks whether the html output is the same. > I suggest to split conversion in three commits. > Commit#0 is for manual corrections - it replaces "<" with "&lt;" and > so on in some sgml's and it doesn't affect the build or outputs. > It needed just for the next step - automatic conversion. These changes > to sgml's are countable and observable. > > Commit#1 performs conversion of all SGML's to make them compatible > with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl > script.) > These changes to sgml's are massive, but they are produced > automatically, so we need just to check the script and make sure that > the output is the same. > After that commit we still can use SGML build. > > And the last commit, commit#2 is for switching to XML. At that point > doc/src/sgml renamed to doc/src/xml, build environment modified and > cleaned, but changes in sgml/xml are minimal, so we can observe and > check them. > > After the commit#2 we get all our docs in XML (DocBook 4.2) and can > build it just as we did with 'make html/man/...' before. > Maybe the commit#2 should be applied later, but commits #0 and #1 are > not intrusive and can be applied anytime. I've rebased previous patches for the current "10devel" version. Will we continue move to DocBook.XML? Are there any obstacles that may keep us from moving forward? Best regards, Alexander
Attachment
Hello Alexander, On 28.02.2017 09:55, Alexander Law wrote: > Hello Peter, > > 16.11.2016 14:30, Alexander Law wrote: >> So It seems, now we can continue the move to XML. >> I'd suggest to move in several steps. >> Please see the attached scripts. >> The main script is 7_check_conversion.sh. It performs all the >> conversion and checks whether the html output is the same. >> I suggest to split conversion in three commits. >> Commit#0 is for manual corrections - it replaces "<" with "&lt;" and >> so on in some sgml's and it doesn't affect the build or outputs. >> It needed just for the next step - automatic conversion. These >> changes to sgml's are countable and observable. >> >> Commit#1 performs conversion of all SGML's to make them compatible >> with XML (as much as possible). (Thanks to Jurgen for his sgml2xml.pl >> script.) >> These changes to sgml's are massive, but they are produced >> automatically, so we need just to check the script and make sure that >> the output is the same. >> After that commit we still can use SGML build. >> >> And the last commit, commit#2 is for switching to XML. At that point >> doc/src/sgml renamed to doc/src/xml, build environment modified and >> cleaned, but changes in sgml/xml are minimal, so we can observe and >> check them. >> >> After the commit#2 we get all our docs in XML (DocBook 4.2) and can >> build it just as we did with 'make html/man/...' before. >> Maybe the commit#2 should be applied later, but commits #0 and #1 are >> not intrusive and can be applied anytime. > I've rebased previous patches for the current "10devel" version. > Will we continue move to DocBook.XML? > Are there any obstacles that may keep us from moving forward? > > Best regards, > Alexander > The time gap between commit#1 and commit#2 should be small, because in the meantime people may add further empty elements and shorttags, which SGML permits but XML does not.
The attached version of sgml2xml.pl is cleaned up by elimination of unused variables and some modifications in the comments. Kind regards, Jürgen
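To illustrate the kind of markup at stake: SGML lets an element declared EMPTY, such as `<xref linkend="...">`, stand without a closing slash, which XML rejects. The sketch below is not sgml2xml.pl and makes no claim about how that script works; it is a hypothetical helper that self-closes a small, assumed list of empty DocBook elements.

```python
import re

# Assumed subset of DocBook elements declared EMPTY; the real list is longer.
EMPTY_ELEMENTS = ("xref", "colspec", "co")

# Match an opening tag of one of those elements that does not already end in "/>".
_OPEN_EMPTY = re.compile(r"<(%s)\b([^<>]*?)(?<!/)>" % "|".join(EMPTY_ELEMENTS))


def self_close(sgml: str) -> str:
    """Rewrite <xref ...> as <xref .../>; tags that are already closed stay as they are."""
    return _OPEN_EMPTY.sub(r"<\1\2/>", sgml)


print(self_close('see <xref linkend="libpq-ldap"> and <xref linkend="auth-ldap"/>'))
```

Short end tags of the form `</>` are a different matter: resolving them requires tracking the stack of open elements, which a regular expression like this one does not attempt.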
Attachment
On 2/28/17 03:55, Alexander Law wrote: > I've rebased previous patches for the current "10devel" version. > Will we continue move to DocBook.XML? > Are there any obstacles that may keep us from moving forward? We still haven't gotten rid of all the DSSSL use. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2/28/17 03:55, Alexander Law wrote: > I've rebased previous patches for the current "10devel" version. > Will we continue move to DocBook.XML? I'm moving this to the next commit fest. The conversion from SGML to XML will be a theme for the PG11 development cycle. For PG10, we have accomplished the conversion from DSSSL to XSLT. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2/28/17 03:55, Alexander Law wrote: > I've rebased previous patches for the current "10devel" version. > Will we continue move to DocBook.XML? > Are there any obstacles that may keep us from moving forward? I have started working through these patches now. I have committed the escaping of < and & and will work through the rest slowly, to minimize disruptions to other development. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, 06.09.2017 18:54, Peter Eisentraut wrote: > > I have started working through these patches now. I have committed the > escaping of < and & and will work through the rest slowly, to minimize > disruptions to other development. Great! I have rebased all the remaining patches and updated scripts for the current master (see attachment). Best regards, Alexander -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
Attachment
On Sat, Sep 9, 2017 at 12:30 AM, Alexander Lakhin <exclusion@gmail.com> wrote: > Hello, > > 06.09.2017 18:54, Peter Eisentraut wrote: >> >> >> I have started working through these patches now. I have committed the >> escaping of < and & and will work through the rest slowly, to minimize >> disruptions to other development. > > Great! > > I have rebased all the remaining patches and updated scripts for the current > master (see attachment). Hi Alexander, In future versions of this patch set, if there is a dependency between the patches would you mind indicating the order to apply them, perhaps with prefixes like "0001-"? That would be quite useful for humans who aren't yet familiar enough with your patch set to guess the order, and also for stupid patch testing robots: == Fetched patches from message ID f00bf53f-e6b5-a033-69be-0c63878f0d30%40gmail.com == Applying on top of commit 3c435952176ae5d294b37e5963cd72ddb66edead == Applying patches from tarball pg-doc.check.tar.bz2... == Applying patch pg-doc.check/patches/sgml-xml/ecpg.patch... == Applying patch pg-doc.check/patches/sgml-xml/func.patch... == Applying patch pg-doc.check/patches/sgml-xml/generate-errcodes-table.pl.patch... == Applying patch pg-doc.check/patches/sgml-xml/pgtesttiming.patch... == Applying patch pg-doc.check/patches/sgml-xml/release-10.patch... 1 out of 5 hunks FAILED -- saving rejects to file doc/src/sgml/release-10.sgml.rej Thanks! -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
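The request for "0001-" style prefixes can be illustrated in a few lines: with zero-padded numeric prefixes, a plain lexical sort yields the intended application order, so neither a human nor a tool has to guess. This is only a sketch; the patch file names below are invented, and how the patch tester actually orders files is an assumption.

```python
def apply_order(patch_names):
    """Return patch file names in the order a lexical sort would apply them."""
    return sorted(patch_names)


# Hypothetical file names with zero-padded prefixes.
numbered = [
    "0003-switch-to-xml.patch",
    "0001-escape-entities.patch",
    "0002-convert-sgml.patch",
]
for name in apply_order(numbered):
    print(name)

# Without zero padding, lexical order and numeric order disagree.
print(apply_order(["2-second.patch", "10-tenth.patch"]))
```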
On 9/8/17 08:30, Alexander Lakhin wrote: >> I have started working through these patches now. I have committed the >> escaping of < and & and will work through the rest slowly, to minimize >> disruptions to other development. > Great! > > I have rebased all the remaining patches and updated scripts for the > current master (see attachment). So, I've been looking at this profiling stuff, to replace the marked sections in the installation instructions. I found the overhead of that a bit too much for building the full documentation, so I have come up with the attached alternative solution. What do you think? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
Attachment
On 15.09.2017 19:32, Peter Eisentraut wrote: > On 9/8/17 08:30, Alexander Lakhin wrote: >>> I have started working through these patches now. I have committed the escaping of < and & and will work through the rest slowly, to minimize disruptions to other development. >> Great! I have rebased all the remaining patches and updated scripts for the current master (see attachment). > So, I've been looking at this profiling stuff, to replace the marked sections in the installation instructions. I found the overhead of that a bit too much for building the full documentation, so I have come up with the attached alternative solution. What do you think?
I'm not happy with the 'particular conversions' part of 'standalone-profile.xsl'. It applies subsequent modifications, which are not very intuitive to a reader, e.g.:
<xsl:template match="phrase[@id='install-ldap-links']">
<xsl:text>the documentation about client authentication and libpq</xsl:text>
</xsl:template>
This approach spreads the intended text over two very different files (in this example: 'installation.xml' and 'standalone-profile.xsl').
My suggestion is to keep the source code in one file in the same manner as with the SGML standalone-include/standalone-ignore mechanism. A generic xsl file shall create the extended output similar to 'standalone-profile.xsl'.
installation.xml:
support for authentication and connection parameter lookup (see
<phrase condition="standalone">the documentation about client authentication and libpq</phrase>
<phrase condition="default"><xref linkend="libpq-ldap"/> and <xref linkend="auth-ldap"/></phrase>
for more information). On Unix,
...
collectAll.xsl (similar to standalone-profile.xsl):
<!-- parameters and variables -->
<xsl:param name="pg.Standalone" select="'default'"/>
<!-- <xsl:param name="pg.Standalone" select="'standalone'"/> -->
<!-- Process all nodes -->
<xsl:template match="*|@*|text()|processing-instruction()|comment()">
<xsl:choose>
<xsl:when test="(not (@condition) or @condition=$pg.Standalone )">
<!-- copy nodes without a 'condition' attribute and such nodes, where 'condition' meets the given criteria -->
<xsl:copy>
<xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
</xsl:copy>
</xsl:when>
</xsl:choose>
</xsl:template>
I'm sorry that I currently cannot deliver a patch because I'm abroad and have limited resources (but many challenges). But I hope that the idea is clear. The attached collectAll.xsl file contains a more complex solution for the case that we have to deal with more than one include/ignore type, e.g. index generation.
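The filtering idea in collectAll.xsl can also be tried outside XSLT. The following Python sketch applies the same rule (keep nodes without a `condition` attribute, and nodes whose `condition` equals the selected value) to the installation.xml fragment quoted above. It is an illustration only: the function `profile`, the variable names, and the wrapping `<para>` element are assumptions, and unlike the stylesheet it does not test the root element itself.

```python
import xml.etree.ElementTree as ET


def profile(elem, wanted, attr="condition"):
    """Remove descendants whose `attr` is set to something other than `wanted`."""
    prev = None  # last child kept so far; it receives text that followed a removed sibling
    for child in list(elem):
        value = child.get(attr)
        if value is not None and value != wanted:
            # ElementTree stores the text after an element in its .tail,
            # so move that text before dropping the element.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
        else:
            profile(child, wanted, attr)
            prev = child
    return elem


SOURCE = (
    "<para>connection parameter lookup (see "
    '<phrase condition="standalone">the documentation about client '
    "authentication and libpq</phrase>"
    '<phrase condition="default"><xref linkend="libpq-ldap"/> and '
    '<xref linkend="auth-ldap"/></phrase>'
    " for more information).</para>"
)

standalone = profile(ET.fromstring(SOURCE), "standalone")
default = profile(ET.fromstring(SOURCE), "default")

print("".join(standalone.itertext()))    # plain-text variant for the INSTALL file
print(len(default.findall(".//xref")))   # number of cross-references kept in the default variant
```

Both variants of the sentence stay next to each other in one source file, which is the point of the proposal; the price is an extra pass over the document before the actual transformation.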
Attachment
On 9/15/17 14:54, Jürgen Purtz wrote: > My suggestion is to keep the source code in one file in the same manner > as with the SGML standalone-include/standalone-ignore mechanism. A > *generic* xsl file shall create the extended output similar to > 'standalone-profile.xsl'. > > > installation.xml: > > support for authentication and connection parameter lookup (see > <phrase condition="standalone">the documentation about client > authentication and libpq</phrase> > <phrase condition="default"><xref linkend="libpq-ldap"/> and<xref > linkend="auth-ldap"/></phrase> > for more information). On Unix, > ... That is what the standard DocBook profiling system does. But that has the disadvantage of imposing a performance penalty on building the documentation. Unless you have a way around that that I'm not seeing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
15.09.2017 21:54, Jürgen Purtz wrote:
> On 15.09.2017 19:32, Peter Eisentraut wrote: >> On 9/8/17 08:30, Alexander Lakhin wrote: >>>> I have started working through these patches now. I have committed the escaping of < and & and will work through the rest slowly, to minimize disruptions to other development. >>> Great! I have rebased all the remaining patches and updated scripts for the current master (see attachment). >> So, I've been looking at this profiling stuff, to replace the marked sections in the installation instructions. I found the overhead of that a bit too much for building the full documentation, so I have come up with the attached alternative solution. What do you think?
Peter, can you show what performance drop you see with the profiling (e.g. for HTML)?
I get the following numbers:
1. make html with profiling (import profile-chunk.xsl in stylesheet.xsl):
85.98user 0.76system 1:29.85elapsed
2. make html without profiling (import chunk.xsl in stylesheet.xsl):
77.36user 0.62system 1:21.28elapsed
3. Separate profiling (performed before making epub, as dbtoepub doesn't support profiling)
8.52user 0.22system 0:10.31elapsed
So I get ~10% performance drop when making html. Are you concerned about the same overhead?
I would choose some standard way to have separate content in the same file, but if the overhead is not acceptable, and we're not going to extend the profiling usage, then we need to invent something that will complicate XML-related processing (I think about translation but the other issues are possible too).
> I'm not happy with the 'particular conversions' part of 'standalone-profile.xsl'. It applies subsequent modifications, which are not very intuitive to a reader, e.g.:
> <xsl:template match="phrase[@id='install-ldap-links']">
> <xsl:text>the documentation about client authentication and libpq</xsl:text>
> </xsl:template>
> This approach spreads the intended text over two very different files (in this example: 'installation.xml' and 'standalone-profile.xsl').
> My suggestion is to keep the source code in one file in the same manner as with the SGML standalone-include/standalone-ignore mechanism. A generic xsl file shall create the extended output similar to 'standalone-profile.xsl'.
Jürgen, this approach is implemented by applying profiling.xsl in the Makefile (for make postgres.epub). (See the Makefile in https://www.postgresql.org/message-id/attachment/54854/pg-doc.check.tar.bz2)
Best regards,
Alexander
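The "~10%" figure can be reproduced from the timings in this message. A short worked calculation, using only the numbers reported above (elapsed times converted to seconds; the function name `overhead_percent` is just for illustration):

```python
def overhead_percent(with_profiling: float, without_profiling: float) -> float:
    """Relative slowdown of the profiled build, in percent."""
    return (with_profiling - without_profiling) / without_profiling * 100.0


# make html with profile-chunk.xsl vs. chunk.xsl, values from the message above
user = overhead_percent(85.98, 77.36)        # user CPU seconds
elapsed = overhead_percent(89.85, 81.28)     # 1:29.85 and 1:21.28 in seconds

print(f"user: {user:.1f}%  elapsed: {elapsed:.1f}%")
```

Both ratios come out at roughly 10 to 11 percent, and the absolute difference of about 8.6 seconds is close to the 8.52 seconds of user time measured for the separate profiling step.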
On 9/18/17 10:38, Alexander Lakhin wrote: > Peter, can you show what performance drop you see with the profiling > (e.g. for HTML)? > I get the following numbers: > 1. make html with profiling (import profile-chunk.xsl in stylesheet.xsl): > 85.98user 0.76system 1:29.85elapsed > > 2. make html without profiling (import chunk.xsl in stylesheet.xsl): > 77.36user 0.62system 1:21.28elapsed > > 3. Separate profiling (performed before making epub, as dbtoepub doesn't > support profiling) > 8.52user 0.22system 0:10.31elapsed > > So I get ~10% performance drop when making html. Are you concerned about > the same overhead? Yeah, that's about what I see. > I would choose some standard way to have separate content in the same > file, but if the overhead is not acceptable, and we're not going to > extend the profiling usage, then we need to invent something that will > complicate XML-related processing (I think about translation but the > other issues are possible too). It's only for the INSTALL file and won't get used anywhere else, so I think it's OK to have a bit of an ad-hoc system. > Jürgen, this approach implemented by applying profiling.xsl in Makefile > (for make postgres.epub). (See Makefile in > https://www.postgresql.org/message-id/attachment/54854/pg-doc.check.tar.bz2) I hadn't even looked at that yet, but that kind of supports my point. Injecting the profiling layer everywhere is going to be annoying. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
Hello Peter, 19.09.2017 23:05, Peter Eisentraut wrote: >> I would choose some standard way to have separate content in the same >> file, but if the overhead is not acceptable, and we're not going to >> extend the profiling usage, then we need to invent something that will >> complicate XML-related processing (I think about translation but the >> other issues are possible too). > It's only for the INSTALL file and won't get used anywhere else, so I > think it's OK to have a bit of an ad-hoc system. Well, then I would suggest placing all the extra content in the installation-single.xsl file and adding all the alternate text to installation.xml. I would like to place such alternate text in an attribute "standalonetext" or the like, but DocBook 4.x doesn't allow extra attributes, so we need to choose from: http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes So I decided to make alternative use of the xreflabel attribute. (I believe that we could find something more appropriate after migrating from Docbook 4.2 to 5.x.) Please see patches/xml/installation.patch in the attachment (or installation.xml in doc/src/xml after ) for an example. The Makefile (in patches/xml/) is adjusted for the new approach too. (The main output changes slightly after switching from "profile-chunk.xsl" to "chunk.xsl"; I'll fix it later if we choose this way.) (The archive in the attachment is packed twice to avoid the automatic patch checking...) Best regards, Alexander
Attachment
On 9/20/17 09:00, Alexander Lakhin wrote: > I would like to place such alternate text in an attribute > "standalonetext" or alike, but DocBook 4.x doesn't allow for extra > attributes, so we need to choose from: > http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes > So I decided to make alternative use of xreflabel attribute. (I believe > that we could find something more appropriate after migrating from > Docbook 4.2 to 5.x).) Yeah, but that's a bit of a hack isn't it? I also don't see anything in DocBook 5 that indicates to me that we could get rid of that hack then. In the interest of moving things along, I have committed my patch and will continue working on the rest of the patch set. Improvements are welcome and can be submitted separately, but I think it's hardly worth it because this stuff changes so rarely. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-docs
Hello,
27.09.2017 18:38, Peter Eisentraut wrote:
> On 9/20/17 09:00, Alexander Lakhin wrote:
>> I would like to place such alternate text in an attribute
>> "standalonetext" or alike, but DocBook 4.x doesn't allow for extra
>> attributes, so we need to choose from:
>> http://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes
>> So I decided to make alternative use of xreflabel attribute. (I believe
>> that we could find something more appropriate after migrating from
>> Docbook 4.2 to 5.x).)
> Yeah, but that's a bit of a hack isn't it? I also don't see anything in
> DocBook 5 that indicates to me that we could get rid of that hack then.
I found a more appropriate way to use an extra attribute:
http://www.sagehill.net/docbookxsl/AddProfileAtt.htm
I've tested it with a "standalone" attribute and it works (for DocBook 4.2).
Maybe it will reduce the hackiness of the solution?
If it's not too late, I could prepare a new patch set today.
> In the interest of moving things along, I have committed my patch and
> will continue working on the rest of the patch set.
>
> Improvements are welcome and can be submitted separately, but I think
> it's hardly worth it because this stuff changes so rarely.
But with such an approach we would have to translate that .xsl too, which seems rather strange, doesn't it?
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
27.09.2017 19:11, Alexander Lakhin wrote:
>> In the interest of moving things along, I have committed my patch and
>> will continue working on the rest of the patch set.
>>
>> Improvements are welcome and can be submitted separately, but I think
>> it's hardly worth it because this stuff changes so rarely.
> But with such approach we should translate that .xsl too and it seems
> rather strange, isn't it?
Improved patch attached.
Attachment
16.11.2017 00:06, Peter Eisentraut wrote:
> Here is the final patch set for the conversion.
Great!
I have some questions.
1. Can you share the exact scripts you use to generate 0002?
(We have been converting the documentation to XML on the fly since 9.6, so it would be nice to adjust our scripts for previous versions too.)
2. Will you rename *.sgml to *.xml and move src/sgml to src/xml?
3. And I still think that hard-coding references and replacements in the standalone-profile is not good. Are you going to apply a less hackish approach?
(See https://www.postgresql.org/message-id/b4fd6932-8d44-3210-48ab-bbc393bf9a23%40gmail.com , patches/xml/installation.patch and patches/xml/installation-single.xsl)
4. BTW, when making postgres-A4.pdf, I get
[WARN] FOUserAgent - Destination: Unresolved ID reference "ecpg-type-timestamp-date" found.
It can be fixed by moving the id from the title tag to sect4:
- <sect4>
- <title id="ecpg-type-timestamp-date">timestamp, date</title>
+ <sect4 id="ecpg-type-timestamp-date">
+ <title>timestamp, date</title>
I wonder, what were the reasons to set id for the title tag?
Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
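The change proposed in item 4 (moving the id from the title tag to its enclosing section) can be sketched as a small script. This is only an illustration, not the script used for the actual conversion: the sample markup, including the outer sect3 and its id, and the helper name move_title_ids are hypothetical, and it assumes the input is already well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the sect4/title pair comes from the thread.
SAMPLE = """<sect3 id="ecpg-types">
  <title>Types</title>
  <sect4>
    <title id="ecpg-type-timestamp-date">timestamp, date</title>
    <para>Sample text.</para>
  </sect4>
</sect3>"""


def move_title_ids(root):
    """Move the id of a <title> child to its parent if the parent has no id."""
    moved = []
    # Materialize the element list first; we only change attributes,
    # but this keeps the traversal independent of any later edits.
    for parent in list(root.iter()):
        title = parent.find("title")
        if title is None or "id" not in title.attrib:
            continue
        if "id" in parent.attrib:
            continue  # both carry an id: leave this case for manual review
        parent.set("id", title.attrib.pop("id"))
        moved.append(parent.get("id"))
    return moved


root = ET.fromstring(SAMPLE)
moved = move_title_ids(root)
print(moved)
print(ET.tostring(root, encoding="unicode"))
```

Sections that already have their own id are skipped on purpose, since two ids cannot be merged mechanically and any links pointing at the title id would need to be checked by hand.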
On 11/16/17 01:00, Alexander Lakhin wrote:
> 1. Can you share the exact scripts you use to generate 0002?
I used the script you (or Jürgen) provided.
> 2. Will you rename *.sgml to *.xml and move src/sgml to src/xml?
Not right now, but we should have that discussion at some point over the next few months.
> 3. And I still think that hard-coding references and replacements in the
> standalone-profile is not good. Are you going to apply less hackish
> approach?
> (See
> https://www.postgresql.org/message-id/b4fd6932-8d44-3210-48ab-bbc393bf9a23%40gmail.com
> , patches/xml/installation.patch and patches/xml/installation-single.xsl)
The patches you send me are very confusing. If you can send me a separate patch of what you have in mind and why and so on, we can consider it separately, but right now I don't have much interest in revisiting this. What is committed works well, stays out of the way, and causes no problems AFAICT.
> 4. BTW, when making postgres-A4.pdf, I get
> [WARN] FOUserAgent - Destination: Unresolved ID reference
> "ecpg-type-timestamp-date" found.
> It can be fixed by moving the id from the title tag to sect4:
> - <sect4>
> - <title id="ecpg-type-timestamp-date">timestamp, date</title>
> + <sect4 id="ecpg-type-timestamp-date">
> + <title>timestamp, date</title>
> I wonder, what were the reasons to set id for the title tag?
In some cases the ids on the title tags were necessary for the DSSSL stylesheets. They can be removed now with some care. I have some partial patches for that.
These particular warnings seem harmless. I have found the following reference:
<https://lists.oasis-open.org/archives/docbook-apps/201510/msg00052.html>
But I see the same warnings in the PG10 build, so this does not appear to be a regression from the XML conversion.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/15/17 16:06, Peter Eisentraut wrote:
> Here is the final patch set for the conversion.
I have committed this.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
23.11.2017 17:53, Peter Eisentraut wrote:
> On 11/15/17 16:06, Peter Eisentraut wrote:
>> Here is the final patch set for the conversion.
> I have committed this.
Great! Thank you for your work! And in light of the possible need to convert older branches to XML too, maybe we should simplify INSTALL now.
Please consider applying the attached patch. It produces the same INSTALL and is much better in the following respects.
1. All the INSTALL content is placed in two files (installation.sgml and installation-single.xsl) instead of three (installation.sgml, standalone-install.xml, standalone-profile.xsl).
2. There are no unreadable and untranslatable (in context) constructions such as
<xsl:template match="xref[@linkend='plpython-python23']">
<xsl:text>the </xsl:text><application>PL/Python</application><xsl:text> documentation</xsl:text>
</xsl:template>
(Sometimes translators need to replace larger fragments.)
3. It uses only XSLT (which we use already), no xi:include.
4. It doesn't generate the complete postgres.sgml just to process the installation section.
I understand that it will take some time to review, but I think it's justified for portability and maintainability reasons.
Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
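The idea in item 2 above (keeping the replacement text next to the reference in the source file instead of hard-coding it in a stylesheet template) can be sketched as follows. The thread does not show how the attached patch actually uses the attribute, so this is only an assumption-laden illustration: placing standalonetext directly on the xref element, the sample paragraph, and the helper name apply_standalone_text are all hypothetical, and the real patch does this in XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragment: the alternate wording travels with the xref.
SAMPLE = (
    '<para>See <xref linkend="plpython-python23" '
    'standalonetext="the PL/Python documentation"/> for details.</para>'
)


def apply_standalone_text(root, attr="standalonetext"):
    """Replace every element carrying `attr` with the attribute's plain text."""
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            text = child.get(attr)
            if text is None:
                previous = child
                continue
            # Splice the replacement text (plus the element's tail) into
            # the text that precedes the removed element.
            replacement = text + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + replacement
            else:
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)


root = ET.fromstring(SAMPLE)
apply_standalone_text(root)
print(ET.tostring(root, encoding="unicode"))
```

One limitation of this sketch is that an attribute value can hold only plain text, so inline markup such as the application element in the committed template would be lost; for a plain-text INSTALL file that may be acceptable.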
On 11/25/17 00:50, Alexander Lakhin wrote:
> Please, consider applying the attached patch. It produces the same
> INSTALL and is much better in the following aspects.
>
> 1. All the INSTALL content is placed in two files (installation.sgml and
> installation-single.xsl) instead of three (installation.sgml,
> standalone-install.xml, standalone-profile.xsl).
> 2. There are no unreadable and untranslatable (in context) constructions
> such as
> <xsl:template match="xref[@linkend='plpython-python23']">
> <xsl:text>the
> </xsl:text><application>PL/Python</application><xsl:text>
> documentation</xsl:text>
> </xsl:template>
> (Sometimes translators need to replace larger fragments.)
> 3. It uses only XSLT (which we use already), no xi:include.
> 4. It doesn't generate complete postgres.sgml to process only the
> installation section.
What is the standalonetext attribute? I don't see that in the DTD. Wouldn't that fail validation?
Also, the makefile fragments need to be written differently. You can't use pipes; that would not catch any errors.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Peter,
28.11.2017 19:02, Peter Eisentraut wrote:
>
> What is the standalonetext attribute? I don't see that in the DTD.
> Wouldn't that fail validation?
I've added it to postgres.sgml, as allowed by the DocBook standard:
http://www.sagehill.net/docbookxsl/AddProfileAtt.html
+<!ENTITY % local.effectivity.attrib "standalonetext CDATA #IMPLIED">
> Also, the makefile fragments need to be written differently. You can't
> use pipes; that would not catch any errors.
I believe that the last xsltproc will fail if a previous command fails, as it will not receive valid XML.
For example, if sed fails with an error, I get:
-:1: parser error : Document is empty
Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
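The disagreement about pipes can be illustrated with a small experiment. A plain shell pipeline reports only the exit status of its last command, so whether an earlier failure is noticed depends on how the last command reacts to missing input. The sketch below imitates "producer | consumer" with two child processes; the three commands are hypothetical stand-ins written for this example, not the sed and xsltproc calls from the patch.

```python
import subprocess
import sys

# Stand-in commands (assumptions for illustration only).
PRODUCER_FAILS = [sys.executable, "-c", "import sys; sys.exit(3)"]
# A consumer that accepts empty input without complaint.
CONSUMER_TOLERANT = [sys.executable, "-c", "import sys; sys.stdin.read()"]
# A consumer that rejects empty input, like a parser given an empty document.
CONSUMER_STRICT = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if sys.stdin.read() else 1)",
]


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return both exit codes."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout)
    first.stdout.close()  # so the consumer sees EOF once the producer exits
    return first.wait(), second.wait()


tolerant = run_pipeline(PRODUCER_FAILS, CONSUMER_TOLERANT)
strict = run_pipeline(PRODUCER_FAILS, CONSUMER_STRICT)
print(tolerant)  # producer failed, yet the last status is 0
print(strict)    # producer failed and the last command failed too
```

With the strict consumer the failure does surface, which matches the "Document is empty" observation above. With a tolerant consumer, or a producer that fails after emitting partial but well-formed output, the pipeline as a whole would still look successful, which is the case the objection to pipes is about.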