Re: Moving documentation to XML - Mailing list pgsql-docs
From | Alexander Lakhin |
---|---|
Subject | Re: Moving documentation to XML |
Date | |
Msg-id | 56337365.2080104@postgrespro.ru |
In response to | Moving documentation to XML (Luzanov Pavel <PLuzanov@postgrespro.ru>) |
Responses |
Re: Moving documentation to XML
|
List | pgsql-docs |
Hello, Guillaume.

We have plans to use this for the Russian translation, too.
We translate the docs by converting the single XML file to postgres-ru.po (with xml2po), and after translating it we convert it back to XML (which gives us postgres-ru.xml).
(Until now we had to perform one more conversion: postgres-ru.xml -> set of SGML files.)
So now we can get the Russian html/* with:

    python xml2po.py -l ru -k -p postgres-ru.po postgres.xml >postgres-ru.xml
    xsltproc --stringparam pg.version '9.4.1' stylesheet.xsl postgres-ru.xml

But I had some doubts about the differences between DSSSL and XSL output. As I noted previously, there was at least one visible difference.
So I decided to customize the XSL templates to make sure that the HTML files are generated without loss or corruption.
I thought that comparing the two sets of HTML sources directly would not work, as they are too different, but maybe we can compare the text generated from the HTML by lynx, for example.
So I use the following procedure to look for differences:

0. Get the DSSSL-generated HTML files:

    make html

1. Extract the text content from the HTML files:

    mkdir -p html-text
    for f in html/*.html; do
        fn=`basename $f`; echo $fn
        perl -0pe 's/<B\s*>Note:\s*<\/B\s*>/<h3>Note<\/h3>/g' $f |
          perl -0pe 's/><BLOCKQUOTE\s*CLASS="NOTE"/><div/ig' >/tmp/$fn
        lynx -dump /tmp/$fn >html-text/$fn
    done

   * Some differences are not significant, so it's not reasonable to modify the XSL templates to eliminate them. The difference in "Note" placement and spelling is one of them, so I just filter it out.

2. Rename html to html-o and html-text to html-o-text.

3. Generate the HTML files with XSL (using the modified templates):

    rm -r html; xsltproc --stringparam pg.version '9.4.1' stylesheet.xsl postgres.xml

4. Extract the text content from the HTML files as in step 1.

5. Make sure that the two text versions are identical:

    diff -s -u -b -I '^\s*_\+\s*$' html-o-text/xtypes.html html-text/xtypes.html

   * Differences in whitespace and in the length of "____" lines are not significant, either.
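Step 5 checks a single file. A minimal sketch of the same check applied to every dump might look like the following. The function name compare_dumps is hypothetical, the directory names are the ones used in the procedure above, and the regular expression is rewritten in portable form (it is meant to match the same underscore-only lines). It assumes a diff that supports -I, such as GNU diff.

```shell
#!/bin/sh
# compare_dumps OLD_DIR NEW_DIR   (hypothetical helper, not part of the build)
# Prints "differs: NAME" or "missing: NAME" for every text dump in OLD_DIR
# that has no equivalent in NEW_DIR. Whitespace-only changes (-b) and lines
# consisting only of underscores (-I) are ignored, as in step 5.
# Returns 0 if every file matches, 1 otherwise.
compare_dumps() {
    old_dir=$1
    new_dir=$2
    status=0
    for old in "$old_dir"/*.html; do
        [ -e "$old" ] || continue          # the glob matched nothing
        name=$(basename "$old")
        if [ ! -f "$new_dir/$name" ]; then
            echo "missing: $name"
            status=1
        elif ! diff -b -I '^[[:space:]]*__*[[:space:]]*$' \
                "$old" "$new_dir/$name" >/dev/null; then
            echo "differs: $name"
            status=1
        fi
    done
    return $status
}
```

It could be run as `compare_dumps html-o-text html-text` after step 4; the files it lists are the ones worth inspecting with the full `diff -u` command from step 5.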
For now, I've managed to get the same xtypes.html (I tested my XSL customizations with it), but I think we can eliminate the other most noticeable (or maybe all) differences in the same way.
I can describe the XSL customizations in more detail, if needed.

Best regards,
Alexander

P.S. I couldn't post the message as a reply due to an error on the postgresql.org side.
(<pgsql-docs@postgresql.org>: host makus.postgresql.org[174.143.35.229] said: 550 Message headers fail syntax check (in reply to end of DATA command))

28.10.2015 14:46, Guillaume Lelarge wrote:
>
> On 26 Oct 2015 6:40 PM, "Alexander Lakhin" <a.lakhin@postgrespro.ru> wrote:
> >
> > ...
> > To make sure that result of the transformation is the same, I've compared original .html's with .html's generated with modified templates.
> > Unfortunately xslt generates random id's, so it's needed to exclude them before comparing. I do that with:
> > for f in */*.html; do sed -e 's/id=\"\(ftn\.\)\?id[a-z][0-9]\+\"/id=\"id\"/g' -i $f ; sed -e 's/href=\"[^#]*#\(ftn\.\)\?id[a-z][0-9]\+\"/href=\"#\"/g' -i $f; done
> >
> > So if it's acceptable way to speed up generation of HTML (and maybe some other formats), what other steps should we take to move away from SGML?
> > If the performance is still not satisfying, please let me know, I'll continue to optimize xslt.
> > Beside performance issues, I can see some difference in results of 'make html' and 'make xslthtml'. For example, see doc/src/sgml/html/spi.html (xslt-generated version doesn't contain the lists of functions).
> >
>
> What you've done is awesome. I can't wait to test it on the french translation.
>
> Nice work!
>
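The id clean-up quoted above edits the files in place. A minimal sketch of the same substitutions as a filter that leaves the originals untouched could be written as follows. The function name normalize_ids is hypothetical, sed -E is used instead of the escaped GNU syntax, and the character class in the href pattern is tightened to `[^#"]` so a match cannot run past the end of the attribute; that last change is an assumption about what was intended.

```shell
#!/bin/sh
# normalize_ids   (hypothetical helper)
# Reads HTML on stdin and writes it to stdout with xslt-generated ids such as
# id="idp123" or href="file.html#ftn.idp456" replaced by fixed placeholders,
# so that two builds can be compared without noise from the random ids.
normalize_ids() {
    sed -E \
        -e 's/id="(ftn\.)?id[a-z][0-9]+"/id="id"/g' \
        -e 's/href="[^#"]*#(ftn\.)?id[a-z][0-9]+"/href="#"/g'
}
```

For example, `normalize_ids <html/spi.html >/tmp/spi.html` would write a cleaned copy of one page, and ids that were not generated by xslt (for example id="xtypes") are left as they are.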