Thread: Optimizing the documentation
-hackers,
The community has spent a lot of time optimizing features over the years. Excellent examples include parallel query and partitioning, which have been multi-year efforts to improve the quality, performance, and feature set of the original commits. We should consider the documentation in a similar manner. Just like code, documentation can sometimes use a bug fix, an optimization, or new features added to the original implementation.
Technical documentation should only be as verbose as needed to illustrate the concept or task that we are explaining. It should not be redundant, nor should it use 50-cent words when a 10-cent word would suffice. I would like to put effort into optimizing the documentation and am requesting general consensus that this would be a worthwhile effort before I begin to dust off my DocBook skills.
I have provided an example below:
Original text (79 words):
This book is the official documentation of PostgreSQL. It has been written by the PostgreSQL developers and other volunteers in parallel to the development of the PostgreSQL software. It describes all the functionality that the current version of PostgreSQL officially supports.
To make the large amount of information about PostgreSQL manageable, this book has been organized in several parts. Each part is targeted at a different class of users, or at users in different stages of their PostgreSQL experience:
Optimized text (35 words):
This is the official PostgreSQL documentation. It is written by the PostgreSQL community in parallel with the development of the software. We have organized it by the type of user and their stages of experience:
Issues that are resolved with the optimized text:
* Succinct text is more likely to be read than skimmed
* Removal of extraneous mentions of PostgreSQL
* Removal of unneeded justifications
* Joining of two paragraphs into one that provides only the needed information to the user
* Word count decreased by over 50%. As changes such as these are adopted it would make the documentation more consumable.
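The word counts quoted above are easy to check. The sketch below is an illustration only: it assumes that a word is any whitespace-separated token, which is a counting rule chosen here and not one stated in the message, and `count_words` is a hypothetical helper name.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; punctuation stays attached to its word."""
    return len(text.split())


# The two passages as they appear in the message above.
ORIGINAL = (
    "This book is the official documentation of PostgreSQL. It has been "
    "written by the PostgreSQL developers and other volunteers in parallel to "
    "the development of the PostgreSQL software. It describes all the "
    "functionality that the current version of PostgreSQL officially supports. "
    "To make the large amount of information about PostgreSQL manageable, "
    "this book has been organized in several parts. Each part is targeted at "
    "a different class of users, or at users in different stages of their "
    "PostgreSQL experience:"
)
OPTIMIZED = (
    "This is the official PostgreSQL documentation. It is written by the "
    "PostgreSQL community in parallel with the development of the software. "
    "We have organized it by the type of user and their stages of experience:"
)

original_count = count_words(ORIGINAL)
optimized_count = count_words(OPTIMIZED)
# Fraction of words removed by the rewrite.
reduction = 1 - optimized_count / original_count

print(original_count, optimized_count, f"{reduction:.1%}")
```

Under that counting rule the figures come out to 79 and 35 words, a reduction of roughly 56%, which agrees with the "over 50%" claim.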
On 14/12/2020 21:50, Joshua Drake wrote:
> The community has spent a lot of time optimizing features over the
> years. Excellent examples include parallel query and partitioning which
> have been multi-year efforts to increase the quality, performance, and
> extend features of the original commit. We should consider the
> documentation in a similar manner. Just like code, documentation can
> sometimes use a bug fix, optimization, and/or new features added to the
> original implementation.
>
> Technical documentation should only be as verbose as needed to
> illustrate the concept or task that we are explaining. It should not be
> redundant, nor should it use .50 cent words when a .10 cent word would
> suffice. I would like to put effort into optimizing the documentation
> and am requesting general consensus that this would be a worthwhile
> effort before I begin to dust off my Docbook skills.

Hard to argue with "let's make the doc better" :-). I expect that there
will be a lot of bikeshedding over the exact phrases. That's OK. Every
improvement that actually gets committed helps, even if we don't make
progress on other parts.

> I have provided an example below:
>
> Original text (79 words):
>
> This book is the official documentation of PostgreSQL. It has been
> written by the PostgreSQL developers and other volunteers in parallel to
> the development of the PostgreSQL software. It describes all the
> functionality that the current version of PostgreSQL officially supports.
>
> To make the large amount of information about PostgreSQL manageable,
> this book has been organized in several parts. Each part is targeted at
> a different class of users, or at users in different stages of their
> PostgreSQL experience:
>
> Optimized text (35 words):
>
> This is the official PostgreSQL documentation. It is written by the
> PostgreSQL community in parallel with the development of the software.
> We have organized it by the type of user and their stages of experience:

Some thoughts on this example:

- Changing "has been" to "is" changes the tone here. "Is" implies that
it is being written continuously, whereas "has been" implies that it's
finished. We do update the docs continuously, but the point of the
sentence is that the docs were developed together with the features, so
"has been" seems more accurate.

- I like "PostgreSQL developers and other volunteers" better than the
"PostgreSQL community". This is the very first introduction to
PostgreSQL, so we can't expect the reader to know what the "PostgreSQL
community" is. I like the "volunteers" word here a lot.

- I think a little bit of ceremony is actually OK in this particular
paragraph, since it's the very first one in the docs.

- I agree with dropping the "to make the large amount of information
manageable".

So I would largely keep this example unchanged, changing it into:

---
This book is the official documentation of PostgreSQL. It has been
written by the PostgreSQL developers and other volunteers in parallel to
the development of the PostgreSQL software. It describes all the
functionality that the current version of PostgreSQL officially supports.

This book has been organized in several parts. Each part is targeted at
a different class of users, or at users in different stages of their
PostgreSQL experience:
---

> Issues that are resolved with the optimized text:
>
> * Succinct text is more likely to be read than skimmed
>
> * Removal of extraneous mentions of PostgreSQL
>
> * Removal of unneeded justifications
>
> * Joining of two paragraphs into one that provides only the needed
> information to the user
>
> * Word count decreased by over 50%. As changes such as these are
> adopted it would make the documentation more consumable.

I agree with these goals in general. I like to refer to
http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
writing documentation. Or anything else, really.

- Heikki
> Technical documentation should only be as verbose as needed to illustrate the concept or task that we are explaining. It should not be redundant, nor should it use .50 cent words when a .10 cent word would suffice. I would like to put effort into optimizing the documentation and am requesting general consensus that this would be a worthwhile effort before I begin to dust off my Docbook skills.
As a quick observation, it would be more immediately helpful to add to the existing proposal to add more details about architecture and get that committed before embarking on a new documentation project.
> Optimized text (35 words):
> This is the official PostgreSQL documentation. It is written by the PostgreSQL community in parallel with the development of the software. We have organized it by the type of user and their stages of experience:
> Issues that are resolved with the optimized text:
> * Succinct text is more likely to be read than skimmed
> * Removal of extraneous mentions of PostgreSQL
> * Removal of unneeded justifications
> * Joining of two paragraphs into one that provides only the needed information to the user
> * Word count decreased by over 50%. As changes such as these are adopted it would make the documentation more consumable.
That actually exists in our documentation?
I suspect changing it isn't all that worthwhile: the typical user isn't reading the documentation like a book, and with the entry point being the table of contents, most of that material is simply gleaned from observing the presented structure, without words needed to describe it.
While I don't think making readability changes is a bad thing, and maybe my perspective is a bit biased and negative right now, but the attention given to the existing documentation patches in the commitfest isn't that great - so adding another mass of patches fixing up items that haven't provoked complaints seems likely to just make the list longer.
In short, I don't think optimization should be a goal in its own right; but rather changes should mostly be driven by questions asked by our users. I don't think reading random chapters of the documentation to find non-optimal exposition is going to be a good use of time.
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 14/12/2020 21:50, Joshua Drake wrote:
>> Issues that are resolved with the optimized text:
>>
>> * Succinct text is more likely to be read than skimmed
>>
>> * Removal of extraneous mentions of PostgreSQL
>>
>> * Removal of unneeded justifications
>>
>> * Joining of two paragraphs into one that provides only the needed
>> information to the user
>>
>> * Word count decreased by over 50%. As changes such as these are
>> adopted it would make the documentation more consumable.

> I agree with these goals in general. I like to refer to
> http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
> writing documentation. Or anything else, really.

I think this particular chunk of text is an outlier. (Not unreasonably
so; as Heikki notes, it's customary for the very beginning of a book to
be a bit more formal.) Most of the docs contain pretty dense technical
material that's not going to be improved by making it even denser.

Also, to the extent that there's duplication, it's often deliberate.
For example, if a given bit of info appears in the tutorial and the
main docs and the reference pages, that doesn't mean we should rip out
two of the three appearances. There certainly are sections that are
crying out for reorganization, but that's going to be very
topic-specific and not something that just going into it with a
copy-editing mindset will help.

In short, the devil's in the details. Maybe there are lots of
places where this type of approach would help, but I think it's
going to be a case-by-case discussion not something where there's
a clear win overall.

regards, tom lane
For example, what would be exceedingly helpful would be a documentation style guide that is canonical and that we can review documentation against.

> While I don't think making readability changes is a bad thing, and maybe my perspective is a bit biased and negative right now, but the attention given to the existing documentation patches in the commitfest isn't that great - so adding another mass of patches fixing up items that haven't provoked complaints seems likely to just make the list longer.

One of the issues is that editing documentation with patches is a pain. It is simpler, and a lower barrier of effort, to pull up an existing section of DocBook and edit that (just like code) than it is to break out specific text within a patch. Though I would be happy to take a swipe at reviewing a specific documentation patch (as you linked).

> In short, I don't think optimization should be a goal in its own right; but rather changes should mostly be driven by questions asked by our users. I don't think reading random chapters of the documentation to find non-optimal exposition is going to be a good use of time.

I wasn't planning on reading random chapters. I was planning on walking through the documentation as it is written, and hopefully others would join. This is a monumental effort to perform completely. Also consider the overall benefit, not just one specific piece. Would you not consider it a net win if certain questions were answered succinctly enough that users could use the documentation instead of asking the most novice of questions on various channels?
> In short, the devil's in the details. Maybe there are lots of
> places where this type of approach would help, but I think it's
> going to be a case-by-case discussion not something where there's
> a clear win overall.
Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of each row of the weather table with the name column of all rows in the cities table, and select the pairs of rows where these values match.
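The join that this passage describes can be sketched in a few lines. This is a minimal illustration and not the tutorial's own example: it assumes SQLite through Python's standard `sqlite3` module instead of PostgreSQL, and the column layouts, the `obs_date` column name, the `joined_weather` helper, and the sample rows are all made up for the demonstration.

```python
import sqlite3


def joined_weather():
    """Pair each weather row with the cities row whose name matches its city."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table layouts and sample data.
    conn.executescript(
        """
        CREATE TABLE cities (name TEXT, location TEXT);
        CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER,
                              obs_date TEXT);
        INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
        INSERT INTO weather VALUES ('San Francisco', 46, 50, '1994-11-27');
        INSERT INTO weather VALUES ('San Francisco', 43, 57, '1994-11-29');
        INSERT INTO weather VALUES ('Hayward', 37, 54, '1994-11-29');
        """
    )
    # Compare weather.city with cities.name for every pair of rows and keep
    # only the pairs where the two values match.
    rows = conn.execute(
        "SELECT weather.city, weather.obs_date, cities.location "
        "FROM weather, cities "
        "WHERE weather.city = cities.name "
        "ORDER BY weather.obs_date"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in joined_weather():
        print(row)
```

In this sketch the Hayward record does not appear in the result, because no row in `cities` has a matching name; only the two San Francisco records are paired with a location.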
On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Most of the docs contain pretty dense technical
> material that's not going to be improved by making it even denser.

It's always hard to write dense technical prose, for a variety of
reasons. I often struggle with framing. For example I seem to write
sentences that sound indecisive. But is that necessarily a bad thing?
It seems wise to hedge a little bit when talking about (say) some kind
of complex system with many moving parts. Ernest Hemingway never had
to describe how VACUUM works.

I agree with Heikki to some degree; there is value in trying to follow
a style guide. But let's not forget about the other problem with the
docs, which is that there isn't enough low level technical details of
the kind that advanced users value. There is a clear unmet demand for
that IME. If we're going to push in the direction of simplification,
it should not make this other important task harder.

--
Peter Geoghegan
Joshua Drake <jd@commandprompt.com> writes:
> Certainly and I didn't want to just start dumping patches. Part of this is
> just style, for example:

> Thus far, our queries have only accessed one table at a time. Queries can
> access multiple tables at once, or access the same table in such a way that
> multiple rows of the table are being processed at the same time. A query
> that accesses multiple rows of the same or different tables at one time is
> called a join query. As an example, say you wish to list all the weather
> records together with the location of the associated city. To do that, we
> need to compare the city column of each row of the weather table with the
> name column of all rows in the cities table, and select the pairs of rows
> where these values match.

> It isn't "terrible" but can definitely be optimized. In a quick review, I
> would put it something like this:

> Queries can also access multiple tables at once, or access the same table
> in a way that multiple rows are processed. A query that accesses multiple
> rows of the same or different tables at one time is a join. For example, if
> you wish to list all of the weather records with the location of the
> associated city, we would compare the city column of each row of the weather
> table with the name column of all rows in the cities table, and select the
> rows *WHERE* the values match.

TBH, I'm not sure that that is an improvement at all. I'm constantly
reminded that for many of our users, English is not their first language.
A little bit of redundancy in wording is often helpful for them.

The places where I think the docs need help tend to be places where
assorted people have added information over time, such that there's
not a consistent style throughout a section; or maybe the information
could be presented in a better order. We don't need to be taking a
hacksaw to text that's perfectly clear as it stands.

(If I were thinking of rewriting this text, I'd probably think of
removing the references to self-joins and covering that topic
in a separate para. But that's because self-joins aren't basic
usage, not because I think the text is unreadable.)

> The reason I bolded and capitalized WHERE was to provide a visual signal to
> the example that is on the page.

IMO, typographical tricks are not something to lean on heavily.

regards, tom lane
On Mon, Dec 14, 2020 at 01:38:05PM -0800, Peter Geoghegan wrote:
> On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Most of the docs contain pretty dense technical
> > material that's not going to be improved by making it even denser.
>
> It's always hard to write dense technical prose, for a variety of
> reasons. I often struggle with framing. For example I seem to write
> sentences that sound indecisive. But is that necessarily a bad thing?
> It seems wise to hedge a little bit when talking about (say) some kind
> of complex system with many moving parts. Ernest Hemingway never had
> to describe how VACUUM works.
>
> I agree with Heikki to some degree; there is value in trying to follow
> a style guide. But let's not forget about the other problem with the
> docs, which is that there isn't enough low level technical details of
> the kind that advanced users value. There is a clear unmet demand for
> that IME. If we're going to push in the direction of simplification,
> it should not make this other important task harder.

I agree a holistic review of the docs can yield great benefits. No one
usually complains about overly verbose text, but making it clearer is
always a win. Anyway, of course, it is going to be very specific for
each case. As an extreme example, in 2007 when I did a full review of
the docs, I clarified may/can/might in our docs, and it probably helped.
Here is one of several commits:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e81c138e18

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
On Thu, Dec 17, 2020 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote:
> I agree a holistic review of the docs can yield great benefits. No one
> usually complains about overly verbose text, but making it clearer is
> always a win. Anyway, of course, it is going to be very specific for
> each case. As an extreme example, in 2007 when I did a full review of
> the docs, I clarified may/can/might in our docs, and it probably helped.

I think that the "may/can/might" rule is a very good one. It
standardizes something that would otherwise just be left to chance,
and AFAICT has no possible downside. Even still, I think that adding
new rules is subject to sharp diminishing returns. There just aren't
that many things that work like that.

--
Peter Geoghegan