Thread: Optimizing the documentation

Optimizing the documentation

From
Joshua Drake
Date:

-hackers,

The community has spent a lot of time optimizing features over the years. Excellent examples include parallel query and partitioning, which have been multi-year efforts to increase the quality and performance, and to extend the features, of the original commit. We should consider the documentation in a similar manner. Just like code, documentation can sometimes use a bug fix, an optimization, or new features added to the original implementation.

Technical documentation should only be as verbose as needed to illustrate the concept or task that we are explaining. It should not be redundant, nor should it use 50-cent words when a 10-cent word would suffice. I would like to put effort into optimizing the documentation and am requesting general consensus that this would be a worthwhile effort before I begin to dust off my DocBook skills.

I have provided an example below:

Original text (79 words):

This book is the official documentation of PostgreSQL. It has been written by the PostgreSQL developers and other volunteers in parallel to the development of the PostgreSQL software. It describes all the functionality that the current version of PostgreSQL officially supports.

To make the large amount of information about PostgreSQL manageable, this book has been organized in several parts. Each part is targeted at a different class of users, or at users in different stages of their PostgreSQL experience:

Optimized text (35 words):

This is the official PostgreSQL documentation. It is written by the PostgreSQL community in parallel with the development of the software. We have organized it by the type of user and their stages of experience:

Issues that are resolved with the optimized text:

  • Succinct text is more likely to be read than skimmed

  • Removal of extraneous mentions of PostgreSQL

  • Removal of unneeded justifications

  • Joining of two paragraphs into one that provides only the needed information to the user

  • Word count decreased by over 50%. As changes such as these are adopted it would make the documentation more consumable.
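
The word counts and the percentage quoted above can be checked with a short script. This is only a minimal sketch in Python: it assumes a "word" is any whitespace-separated token, and the names `word_count`, `original`, and `optimized` are illustrative, not part of any PostgreSQL tooling.

```python
# Count whitespace-separated words in both versions of the paragraph
# and compute the relative reduction. The texts are copied from the
# example above; the two original paragraphs are joined into one string.

original = (
    "This book is the official documentation of PostgreSQL. It has been "
    "written by the PostgreSQL developers and other volunteers in parallel to "
    "the development of the PostgreSQL software. It describes all the "
    "functionality that the current version of PostgreSQL officially supports. "
    "To make the large amount of information about PostgreSQL manageable, "
    "this book has been organized in several parts. Each part is targeted at "
    "a different class of users, or at users in different stages of their "
    "PostgreSQL experience:"
)

optimized = (
    "This is the official PostgreSQL documentation. It is written by the "
    "PostgreSQL community in parallel with the development of the software. "
    "We have organized it by the type of user and their stages of experience:"
)


def word_count(text: str) -> int:
    """Return the number of whitespace-separated tokens in text."""
    return len(text.split())


orig_n = word_count(original)
opt_n = word_count(optimized)
reduction = (orig_n - opt_n) / orig_n * 100

print(f"original: {orig_n} words, optimized: {opt_n} words")
print(f"reduction: {reduction:.1f}%")
```

Under that definition of a word, the script reports 79 and 35 words, a reduction of about 55.7%, which matches the "over 50%" figure.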

Thanks,
JD

--
Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
Co-Chair - https://postgresconf.org/ - Postgres Education at its finest
People, Postgres, Data

Re: Optimizing the documentation

From
"David G. Johnston"
Date:
On Mon, Dec 14, 2020 at 12:50 PM Joshua Drake <jd@commandprompt.com> wrote:

> -hackers,
>
> The community has spent a lot of time optimizing features over the years. Excellent examples include parallel query and partitioning which have been multi-year efforts to increase the quality, performance, and extend features of the original commit. We should consider the documentation in a similar manner. Just like code, documentation can sometimes use a bug fix, optimization, and/or new features added to the original implementation.
>
> Technical documentation should only be as verbose as needed to illustrate the concept or task that we are explaining. It should not be redundant, nor should it use .50 cent words when a .10 cent word would suffice. I would like to put effort into optimizing the documentation and am requesting general consensus that this would be a worthwhile effort before I begin to dust off my Docbook skills.


As a quick observation, it would be more immediately helpful to add to the existing proposal to add more details about architecture and get that committed before embarking on a new documentation project.


 

> I have provided an example below:
>
> Original text (79 words):
>
> This book is the official documentation of PostgreSQL. It has been written by the PostgreSQL developers and other volunteers in parallel to the development of the PostgreSQL software. It describes all the functionality that the current version of PostgreSQL officially supports.
>
> To make the large amount of information about PostgreSQL manageable, this book has been organized in several parts. Each part is targeted at a different class of users, or at users in different stages of their PostgreSQL experience:
>
> Optimized text (35 words):
>
> This is the official PostgreSQL documentation. It is written by the PostgreSQL community in parallel with the development of the software. We have organized it by the type of user and their stages of experience:
>
> Issues that are resolved with the optimized text:
>
>   • Succinct text is more likely to be read than skimmed
>
>   • Removal of extraneous mentions of PostgreSQL
>
>   • Removal of unneeded justifications
>
>   • Joining of two paragraphs into one that provides only the needed information to the user
>
>   • Word count decreased by over 50%. As changes such as these are adopted it would make the documentation more consumable.

That actually exists in our documentation?  I suspect changing it isn't all that worthwhile: the typical user isn't reading the documentation like a book, and since the entry point is the table of contents, most of that material is simply gleaned from the presented structure, without needing words to describe it.

I don't think making readability changes is a bad thing, and maybe my perspective is a bit biased and negative right now, but the attention given to the existing documentation patches in the commitfest isn't that great. Adding another mass of patches that fix up items that haven't provoked complaints seems likely to just make the list longer.

In short, I don't think optimization should be a goal in its own right; rather, changes should mostly be driven by questions asked by our users.  I don't think reading random chapters of the documentation to find non-optimal exposition is going to be a good use of time.

David J.

Re: Optimizing the documentation

From
Heikki Linnakangas
Date:
On 14/12/2020 21:50, Joshua Drake wrote:
> The community has spent a lot of time optimizing features over the 
> years. Excellent examples include parallel query and partitioning which 
> have been multi-year efforts to increase the quality,  performance, and 
> extend features of the original commit. We should consider the 
> documentation in a similar manner. Just like code, documentation can 
> sometimes use a bug fix, optimization, and/or new features added to the 
> original implementation.
> 
> Technical documentation should only be as verbose as needed to 
> illustrate the concept or task that we are explaining. It should not be 
> redundant, nor should it use .50 cent words when a .10 cent word would 
> suffice. I would like to put effort into optimizing the documentation 
> and am requesting general consensus that this would be a worthwhile 
> effort before I begin to dust off my Docbook skills.

Hard to argue with "let's make the doc better" :-).

I expect that there will be a lot of bikeshedding over the exact 
phrases. That's OK. Every improvement that actually gets committed 
helps, even if we don't make progress on other parts.

> I have provided an example below:
> 
> 
> Original text (79 words):
> 
> 
> This book is the official documentation of PostgreSQL. It has been 
> written by the PostgreSQL developers and other volunteers in parallel to 
> the development of the PostgreSQL software. It describes all the 
> functionality that the current version of PostgreSQL officially supports.
> 
> To make the large amount of information about PostgreSQL manageable, 
> this book has been organized in several parts. Each part is targeted at 
> a different class of users, or at users in different stages of their 
> PostgreSQL experience:
> 
> Optimized text (35 words):
> 
> 
> This is the official PostgreSQL documentation. It is written by the 
> PostgreSQL community in parallel with the development of the software. 
> We have organized it by the type of user and their stages of experience:

Some thoughts on this example:

- Changing "has been" to "is" changes the tone here. "Is" implies that 
it is being written continuously, whereas "has been" implies that it's 
finished. We do update the docs continuously, but the point of the sentence 
is that the docs were developed together with the features, so "has 
been" seems more accurate.

- I like "PostgreSQL developers and other volunteers" better than the 
"PostgreSQL community". This is the very first introduction to 
PostgreSQL, so we can't expect the reader to know what the "PostgreSQL 
community" is. I like the "volunteers" word here a lot.

- I think a little bit of ceremony is actually OK in this particular 
paragraph, since it's the very first one in the docs.

- I agree with dropping the "to make the large amount of information 
manageable".

So I would largely keep this example unchanged, changing it into:

---
This book is the official documentation of PostgreSQL. It has been 
written by the PostgreSQL developers and other volunteers in parallel to 
the development of the PostgreSQL software. It describes all the 
functionality that the current version of PostgreSQL officially supports.

This book has been organized in several parts. Each part is targeted at 
a different class of users, or at users in different stages of their 
PostgreSQL experience:
---

> Issues that are resolved with the optimized text:
> 
>   * Succinct text is more likely to be read than skimmed
> 
>   * Removal of extraneous mentions of PostgreSQL
> 
>   * Removal of unneeded justifications
> 
>   * Joining of two paragraphs into one that provides only the needed
>     information to the user
> 
>   * Word count decreased by over 50%. As changes such as these are
>     adopted it would make the documentation more consumable.
I agree with these goals in general. I like to refer to 
http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when 
writing documentation. Or anything else, really.

- Heikki



Re: Optimizing the documentation

From
Joshua Drake
Date:


>> Technical documentation should only be as verbose as needed to illustrate the concept or task that we are explaining. It should not be redundant, nor should it use .50 cent words when a .10 cent word would suffice. I would like to put effort into optimizing the documentation and am requesting general consensus that this would be a worthwhile effort before I begin to dust off my Docbook skills.

> As a quick observation, it would be more immediately helpful to add to the existing proposal to add more details about architecture and get that committed before embarking on a new documentation project.


I considered just starting to review patches, but even so, if I am going to put a particular thought process into my efforts, doesn't it make sense to have general consensus first? For example, what would be exceedingly helpful is a canonical documentation style guide that we can review documentation against. Currently our documentation is all over the place. It isn't that it is not technically accurate or comprehensive
 

>> Optimized text (35 words):
>>
>> This is the official PostgreSQL documentation. It is written by the PostgreSQL community in parallel with the development of the software. We have organized it by the type of user and their stages of experience:
>>
>> Issues that are resolved with the optimized text:
>>
>>   • Succinct text is more likely to be read than skimmed
>>
>>   • Removal of extraneous mentions of PostgreSQL
>>
>>   • Removal of unneeded justifications
>>
>>   • Joining of two paragraphs into one that provides only the needed information to the user
>>
>>   • Word count decreased by over 50%. As changes such as these are adopted it would make the documentation more consumable.

> That actually exists in our documentation?
>
> I suspect changing it isn't all that worthwhile as the typical user isn't reading the documentation like a book and with the entry point being the table of contents most of that material is simply gleaned from observing the presented structure without words needed to describe it.

It is a matter of consistency. 
 

> While I don't think making readability changes is a bad thing, and maybe my perspective is a bit biased and negative right now, but the attention given to the existing documentation patches in the commitfest isn't that great - so adding another mass of patches fixing up items that haven't provoked complaints seems likely to just make the list longer.

One of the issues is that editing documentation with patches is a pain. It is simpler and a lower barrier of effort to pull up an existing section of Docbook and edit that (just like code) than it is to break out specific text within a patch. Though I would be happy to take a swipe at reviewing a specific documentation patch (as you linked).
 

> In short, I don't think optimization should be a goal in its own right; but rather changes should mostly be driven by questions asked by our users.  I don't think reading random chapters of the documentation to find non-optimal exposition is going to be a good use of time.

I wasn't planning on reading random chapters. I was planning on walking through the documentation as it is written and hopefully others would join. This is a monumental effort to perform completely. Also consider the overall benefit, not just one specific piece. Would you not consider it a net win if certain questions were being answered in a succinct way as to allow users to use the documentation instead of asking the most novice of questions on various channels?

JD

Re: Optimizing the documentation

From
Joshua Drake
Date:


>> This is the official PostgreSQL documentation. It is written by the
>> PostgreSQL community in parallel with the development of the software.
>> We have organized it by the type of user and their stages of experience:
>
> Some thoughts on this example:
>
> - Changing "has been" to "is" changes the tone here. "Is" implies that
> it is being written continuously, whereas "has been" implies that it's
> finished. We do update the docs continuously, but the point of the sentence
> is that the docs were developed together with the features, so "has
> been" seems more accurate.

No argument.
 

> - I like "PostgreSQL developers and other volunteers" better than the
> "PostgreSQL community". This is the very first introduction to
> PostgreSQL, so we can't expect the reader to know what the "PostgreSQL
> community" is. I like the "volunteers" word here a lot.


There is a huge community for PostgreSQL; the developers are only a small (albeit critical) part of it. By using the term "PostgreSQL community" we are providing equity to all those who participate in the success of the project. I could definitely see saying "PostgreSQL volunteers".

 
> - I think a little bit of ceremony is actually OK in this particular
> paragraph, since it's the very first one in the docs.
>
> - I agree with dropping the "to make the large amount of information
> manageable".
>
> So I would largely keep this example unchanged, changing it into:
>
> ---
> This book is the official documentation of PostgreSQL. It has been
> written by the PostgreSQL developers and other volunteers in parallel to
> the development of the PostgreSQL software. It describes all the
> functionality that the current version of PostgreSQL officially supports.
>
> This book has been organized in several parts. Each part is targeted at
> a different class of users, or at users in different stages of their
> PostgreSQL experience:
> ---

 
I appreciate the feedback and before we get too far down the rabbit hole, I would like to note that I am not tied to an exact wording as my post was more about the general goal and results based on that goal. 
 
> I agree with these goals in general. I like to refer to
> http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
> writing documentation. Or anything else, really.

Great resource!

JD
 

> - Heikki

Re: Optimizing the documentation

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 14/12/2020 21:50, Joshua Drake wrote:
>> Issues that are resolved with the optimized text:
>> 
>> * Succinct text is more likely to be read than skimmed
>> 
>> * Removal of extraneous mentions of PostgreSQL
>> 
>> * Removal of unneeded justifications
>> 
>> * Joining of two paragraphs into one that provides only the needed
>> information to the user
>> 
>> * Word count decreased by over 50%. As changes such as these are
>> adopted it would make the documentation more consumable.

> I agree with these goals in general. I like to refer to 
> http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when 
> writing documentation. Or anything else, really.

I think this particular chunk of text is an outlier.  (Not unreasonably
so; as Heikki notes, it's customary for the very beginning of a book to
be a bit more formal.)  Most of the docs contain pretty dense technical
material that's not going to be improved by making it even denser.
Also, to the extent that there's duplication, it's often deliberate.
For example, if a given bit of info appears in the tutorial and the
main docs and the reference pages, that doesn't mean we should rip
out two of the three appearances.

There certainly are sections that are crying out for reorganization,
but that's going to be very topic-specific and not something that
just going into it with a copy-editing mindset will help.

In short, the devil's in the details.  Maybe there are lots of
places where this type of approach would help, but I think it's
going to be a case-by-case discussion not something where there's
a clear win overall.

            regards, tom lane



Re: Optimizing the documentation

From
"David G. Johnston"
Date:
On Mon, Dec 14, 2020 at 1:40 PM Joshua Drake <jd@commandprompt.com> wrote:
> For example, what would be exceedingly helpful would be a documentation style guide that is canonical and we can review documentation against.

I do agree with that premise, with the goal of getting more people to contribute to writing and reviewing documentation and having more than vague ideas about what is or isn't considered minor items to just leave alone or points of interest to debate.  But as much as I would love perfectly written English documentation I try to consciously make an effort to accept things that maybe aren't perfect but are good enough in the interest of having a larger set of contributors with more varied abilities in this area.  "It is clear enough" is a valid trade-off to take.


Thanks, though it was meant to be a bit rhetorical.
 
 

>> While I don't think making readability changes is a bad thing, and maybe my perspective is a bit biased and negative right now, but the attention given to the existing documentation patches in the commitfest isn't that great - so adding another mass of patches fixing up items that haven't provoked complaints seems likely to just make the list longer.
>
> One of the issues is that editing documentation with patches is a pain. It is simpler and a lower barrier of effort to pull up an existing section of Docbook and edit that (just like code) than it is to break out specific text within a patch. Though I would be happy to take a swipe at reviewing a specific documentation patch (as you linked).

I'm not following this line of reasoning.

 

>> In short, I don't think optimization should be a goal in its own right; but rather changes should mostly be driven by questions asked by our users.  I don't think reading random chapters of the documentation to find non-optimal exposition is going to be a good use of time.
>
> I wasn't planning on reading random chapters. I was planning on walking through the documentation as it is written and hopefully others would join. This is a monumental effort to perform completely. Also consider the overall benefit, not just one specific piece. Would you not consider it a net win if certain questions were being answered in a succinct way as to allow users to use the documentation instead of asking the most novice of questions on various channels?

I suspect over half of the questions asked are due to not reading the documentation at all - I tend to get good results when I point someone to the correct terminology and section, and if there are follow-up questions then I know where to look for improvements and have a concrete question or two in hand to ensure that the revised documentation answers.

I'm fairly well plugged into user questions and have recently made an attempt to respond to those with specific patches to improve the documentation involved in those questions.  I have also been working to help other documentation patches get pushed through.  Based upon those experiences I think this monumental community effort is going to stall out pretty quickly, regardless of its merits. That said, if the effort results in a new guidelines document, then I would say it was worth it regardless of how many paragraphs are optimized away.

My $0.02

David J.

Re: Optimizing the documentation

From
Joshua Drake
Date:


> In short, the devil's in the details.  Maybe there are lots of
> places where this type of approach would help, but I think it's
> going to be a case-by-case discussion not something where there's
> a clear win overall.

Certainly, and I didn't want to just start dumping patches. Part of this is just style, for example:

Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of each row of the weather table with the name column of all rows in the cities table, and select the pairs of rows where these values match.

It isn't "terrible" but can definitely be optimized. In a quick review, I would put it something like this:

Queries can also access multiple tables at once, or access the same table in a way that multiple rows are processed. A query that accesses multiple rows of the same or different tables at one time is a join. For example, if you wish to list all of the weather records with the location of the associated city, we would compare the city column of each row of the weather table with the name column of all rows in the cities table, and select the rows *WHERE* the values match.

The reason I bolded and capitalized WHERE was to provide a visual signal to the example that is on the page. I could also argue that we could remove "For example," though I understand its purpose here.
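
To make the join described in that paragraph concrete, here is a minimal sketch. It is an illustration only: it assumes Python's built-in `sqlite3` module with an in-memory database rather than PostgreSQL, and the table layouts and sample rows are simplified stand-ins for the tutorial's `weather` and `cities` tables.

```python
import sqlite3

# In-memory SQLite database used purely for illustration; the
# documentation paragraph itself is about PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cities  (name TEXT, location TEXT);
    CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER);

    INSERT INTO cities  VALUES ('San Francisco', '(-194.0, 53.0)');
    INSERT INTO weather VALUES ('San Francisco', 46, 50);
    INSERT INTO weather VALUES ('San Francisco', 43, 57);
    INSERT INTO weather VALUES ('Hayward', 37, 54);
    """
)

# Compare the city column of each weather row with the name column of
# every cities row, and keep only the pairs WHERE the values match.
rows = conn.execute(
    """
    SELECT weather.city, weather.temp_lo, weather.temp_hi, cities.location
    FROM weather, cities
    WHERE weather.city = cities.name
    ORDER BY weather.temp_lo
    """
).fetchall()

for row in rows:
    print(row)
```

With this sample data only the two San Francisco rows come back. The Hayward row has no matching entry in `cities`, so the WHERE condition filters it out.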

Again, this was just a quick review.

JD
 

Re: Optimizing the documentation

From
Peter Geoghegan
Date:
On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  Most of the docs contain pretty dense technical
> material that's not going to be improved by making it even denser.

It's always hard to write dense technical prose, for a variety of
reasons. I often struggle with framing. For example I seem to write
sentences that sound indecisive. But is that necessarily a bad thing?
It seems wise to hedge a little bit when talking about (say) some kind
of complex system with many moving parts. Ernest Hemingway never had
to describe how VACUUM works.

I agree with Heikki to some degree; there is value in trying to follow
a style guide. But let's not forget about the other problem with the
docs, which is that there aren't enough low-level technical details of
the kind that advanced users value. There is a clear unmet demand for
that IME. If we're going to push in the direction of simplification,
it should not make this other important task harder.

-- 
Peter Geoghegan



Re: Optimizing the documentation

From
Tom Lane
Date:
Joshua Drake <jd@commandprompt.com> writes:
> Certainly and I didn't want to just start dumping patches. Part of this is
> just style, for example:

> Thus far, our queries have only accessed one table at a time. Queries can
> access multiple tables at once, or access the same table in such a way that
> multiple rows of the table are being processed at the same time. A query
> that accesses multiple rows of the same or different tables at one time is
> called a join query. As an example, say you wish to list all the weather
> records together with the location of the associated city. To do that, we
> need to compare the city column of each row of the weather table with the
> name column of all rows in the cities table, and select the pairs of rows
> where these values match.

> It isn't "terrible" but can definitely be optimized. In a quick review, I
> would put it something like this:

> Queries can also access multiple tables at once, or access the same table
> in a way that multiple rows are processed. A query that accesses multiple
> rows of the same or different tables at one time is a join. For example, if
> you wish to list all of the weather records with the location of the
> associated city, we would compare the city column of each row of the weather
> table with the name column of all rows in the cities table, and select the
> rows *WHERE* the values match.

TBH, I'm not sure that that is an improvement at all.  I'm constantly
reminded that for many of our users, English is not their first language.
A little bit of redundancy in wording is often helpful for them.

The places where I think the docs need help tend to be places where
assorted people have added information over time, such that there's
not a consistent style throughout a section; or maybe the information
could be presented in a better order.  We don't need to be taking a
hacksaw to text that's perfectly clear as it stands.

(If I were thinking of rewriting this text, I'd probably think of
removing the references to self-joins and covering that topic
in a separate para.  But that's because self-joins aren't basic
usage, not because I think the text is unreadable.)

> The reason I bolded and capitalized WHERE was to provide a visual signal to
> the example that is on the page.

IMO, typographical tricks are not something to lean on heavily.

            regards, tom lane



Re: Optimizing the documentation

From
Joshua Drake
Date:


>> Queries can also access multiple tables at once, or access the same table
>> in a way that multiple rows are processed. A query that accesses multiple
>> rows of the same or different tables at one time is a join. For example, if
>> you wish to list all of the weather records with the location of the
>> associated city, we would compare the city column of each row of the weather
>> table with the name column of all rows in the cities table, and select the
>> rows *WHERE* the values match.

> TBH, I'm not sure that that is an improvement at all.  I'm constantly
> reminded that for many of our users, English is not their first language.
> A little bit of redundancy in wording is often helpful for them.

Interesting point. It is certainly true that many of our users are ESL folks. I would expect a succinct version to be easier to understand, but I have no idea.
 

> The places where I think the docs need help tend to be places where
> assorted people have added information over time, such that there's
> not a consistent style throughout a section; or maybe the information
> could be presented in a better order.  We don't need to be taking a
> hacksaw to text that's perfectly clear as it stands.

The term "perfectly clear" is part of the problem I am trying to address. I can pick and pull at the documentation all day long and show things that are not perfectly clear. They are clear to you, to me, and I imagine to most of the readers on this list. Generally speaking, we are not the target audience of the documentation, and we may easily settle for "good enough" when in reality it could be so much better. I have gotten so used to our documentation that I literally skip over unneeded words to get to the answer I am looking for. I don't think that is the target we want to hit.

Wouldn't we want the reader to spend as little mental energy as possible to understand the concept? Every extra word that isn't needed, every extra adjective, repeated term, or "very unique" costs extra energy to understand what the writer is trying to say. That mental energy can be exhausted quickly, especially when considering dense technical topics.

 
> (If I were thinking of rewriting this text, I'd probably think of
> removing the references to self-joins and covering that topic
> in a separate para.  But that's because self-joins aren't basic
> usage, not because I think the text is unreadable.)

That makes sense. I was just taking the direct approach of making existing content better as an example. I would agree with your assessment if it were to be submitted as a patch.
 
>> The reason I bolded and capitalized WHERE was to provide a visual signal to
>> the example that is on the page.

> IMO, typographical tricks are not something to lean on heavily.

Fair enough.

JD
 

Re: Optimizing the documentation

From
Bruce Momjian
Date:
On Mon, Dec 14, 2020 at 01:38:05PM -0800, Peter Geoghegan wrote:
> On Mon, Dec 14, 2020 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >  Most of the docs contain pretty dense technical
> > material that's not going to be improved by making it even denser.
> 
> It's always hard to write dense technical prose, for a variety of
> reasons. I often struggle with framing. For example I seem to write
> sentences that sound indecisive. But is that necessarily a bad thing?
> It seems wise to hedge a little bit when talking about (say) some kind
> of complex system with many moving parts. Ernest Hemingway never had
> to describe how VACUUM works.
> 
> I agree with Heikki to some degree; there is value in trying to follow
> a style guide. But let's not forget about the other problem with the
> docs, which is that there isn't enough low level technical details of
> the kind that advanced users value. There is a clear unmet demand for
> that IME. If we're going to push in the direction of simplification,
> it should not make this other important task harder.

I agree a holistic review of the docs can yield great benefits.  No one
usually complains about overly verbose text, but making it clearer is
always a win.  Anyway, of course, it is going to be very specific for
each case.  As an extreme example, in 2007 when I did a full review of
the docs, I clarified may/can/might in our docs, and it probably helped.
Here is one of several commits:

    https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e81c138e18

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




Re: Optimizing the documentation

From
Peter Geoghegan
Date:
On Thu, Dec 17, 2020 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote:
> I agree a holistic review of the docs can yield great benefits.  No one
> usually complains about overly verbose text, but making it clearer is
> always a win.  Anyway, of course, it is going to be very specific for
> each case.  As an extreme example, in 2007 when I did a full review of
> the docs, I clarified may/can/might in our docs, and it probably helped.

I think that the "may/can/might" rule is a very good one. It
standardizes something that would otherwise just be left to chance,
and AFAICT has no possible downside. Even still, I think that adding
new rules is subject to sharp diminishing returns. There just aren't
that many things that work like that.

-- 
Peter Geoghegan