Thread: postgres documentation - proposed improvement/clarification

postgres documentation - proposed improvement/clarification

From
"Graeme B. Bell"
Date:
Hi everyone

The documentation for postgres is generally great, but I noticed a problem today while using the doc webpages to reply to a user on the pgsql-performance mailing list.

The problem relates to how default settings are communicated in the documentation. Keep in mind that not all postgresql admins have English as their first language, so it should not be necessary to guess from the phrasing of a paragraph what is set by default. Also, many people keep around older config files, or don't have a vanilla postgresql.conf file handy to check for reference (and you never know, someone might have modified your vanilla reference .conf...). So the documentation is for many the primary reference and it should be clear exactly what postgres does in the absence of actively chosen configuration settings. It should also be clear what those settings can be, and how they should be entered.

Take a look at this page, as an example:   http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html

Thoughts:

1. Default values are not always specified for each setting, but should be.
Example: documentation for fsync (boolean) doesn't have the default specified.


2. Default values are not specified in a consistent place or style in the text.
Examples: take a look at
wal_level (enum)
full_page_writes (boolean)
wal_buffers (integer)


3. Information about default values is sometimes mixed into longer sentences on another topic. This isn't a big problem, but it makes it harder to spot the default value in the paragraph.
Example:
wal_buffers (integer)


4. Default values are sometimes documented in a slightly different style or format to their actual use in the config file. For example, integers like 5 are given as text 'five'. This isn't a big problem, but it makes it harder to find the default value in the paragraph; you're looking for an integer in the text, but the number is written out as a word. It might be better to break the writing convention of putting some numbers as text in English. This is a document explaining what to type into the config file. Examples and defaults should always be valid if copied directly into the config file.

Example:
commit_delay (integer)
"The default commit_delay is zero (no delay)"        (actual commit_delay default is '0', of course, not 'zero')
vs.
checkpoint_completion_target (floating point)
"The default is 0.5."


5. Where the type is specified as 'boolean', the normal & default values are not 'true/false' or '1/0', as would be expected for a boolean typed parameter. Yes, I know on/off is also boolean, but I bet if you surveyed 100 programmers and asked them about likely default values for a boolean setting, few would say 'on' in reply. It actually makes me wonder if this is better described to users as a 2-value enum type.
Example:
full_page_writes (boolean)
"The default is on."

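For reference, a minimal Python sketch of how a boolean setting can be spelled. It assumes the rule given in the general configuration chapter of the manual (on, off, true, false, yes, no, 1, 0, case-insensitive, or any unambiguous prefix of the words). The name parse_guc_bool is hypothetical, and this is only a model of that rule, not the server's actual parser.

```python
# Illustrative model only; parse_guc_bool is a hypothetical helper,
# not part of PostgreSQL.
_TRUE_WORDS = ("on", "true", "yes")
_FALSE_WORDS = ("off", "false", "no")


def parse_guc_bool(text):
    """Return True/False for an accepted boolean spelling, else raise ValueError."""
    value = text.strip().lower()
    if value == "1":
        return True
    if value == "0":
        return False
    if not value:
        raise ValueError("empty boolean value")
    true_hit = any(word.startswith(value) for word in _TRUE_WORDS)
    false_hit = any(word.startswith(value) for word in _FALSE_WORDS)
    if true_hit == false_hit:
        # Either no word matches, or the prefix is ambiguous (e.g. "o").
        raise ValueError(f"not a valid boolean: {text!r}")
    return true_hit
```

So 'true' and '1' would both be accepted under this rule; the point is that the parameter's own paragraph only ever shows 'on'.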

6. The present method of documenting the datatype alongside the name isn't actually that helpful for most people reading the documentation. How many readers are helped by knowing that wal_sync_method is an (enum) as the first thing they read about it?


7. Default units? And should units be included in the setting value?

Look at this example. Can anyone tell me, using *only* this parameter's documentation, whether the parameter can be set to "8", "8kB", "8KB" or "8MB" in the config file?
Again, using only this documentation, can you tell for certain whether choosing '8' will mean bytes, or kB, or a configuration error?

=====
wal_buffers (integer)
The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. This value can be set manually if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB. This parameter can only be set at server start.

The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.
=====
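
To make the question concrete, here is a rough Python model of how a memory setting might be interpreted. It rests on assumptions that are not stated in the excerpt above: that unit suffixes are case-sensitive (kB, MB, GB, TB), that the multiplier is 1024, and that a bare number for wal_buffers is counted in WAL blocks of 8kB. The helper name to_kb and the block-size constant are illustrative only.

```python
import re

# Assumed unit table: case-sensitive suffixes, multiplier 1024.
_UNITS_IN_KB = {"kB": 1, "MB": 1024, "GB": 1024 ** 2, "TB": 1024 ** 3}
# Assumed base unit for a bare wal_buffers number: one 8kB WAL block.
ASSUMED_BLOCK_KB = 8


def to_kb(setting, base_unit_kb=ASSUMED_BLOCK_KB):
    """Convert a non-negative setting string such as '8' or '8MB' to kilobytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", setting)
    if match is None:
        raise ValueError(f"not a non-negative integer setting: {setting!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit == "":
        # No suffix: fall back to the parameter's (assumed) base unit.
        return number * base_unit_kb
    if unit not in _UNITS_IN_KB:
        raise ValueError(f"unknown unit {unit!r} in {setting!r}")
    return number * _UNITS_IN_KB[unit]


for candidate in ("8", "8kB", "8KB", "8MB"):
    try:
        print(candidate, "->", to_kb(candidate), "kB")
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

Under these assumptions '8' would mean 64kB, '8kB' and '8MB' would be taken as written, and '8KB' would be rejected. None of that can be worked out from the parameter's own paragraph.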



Proposed solutions.

Perhaps it might be worth extending or replacing the type information in the header, by including info about the default, possibly replacing the type info at that part of the document.


e.g. How about this style?

        synchronous_commit (default: on)

        Specifies whether transaction commit will wait for WAL records to be...


or this style?

        synchronous_commit (enum, default: on)

        Specifies whether transaction commit will wait for WAL records to be...


or this?

        synchronous_commit (enum)
        permitted values: on, remote_write, local, off
        default: on


        wal_buffers (integer)
        permitted values (in kB): -1 (auto-tuning)  and 32-65536.
        default: -1
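
If headers like these were generated from structured data rather than written by hand, the default would always appear in the same place and format. A minimal Python sketch, assuming a hypothetical Setting record and render_header function (both names are made up for illustration); the values are copied from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Setting:
    """Hypothetical record describing one configuration parameter."""
    name: str
    type_name: str
    permitted: str
    default: str


def render_header(setting):
    """Format one parameter in the three-line style proposed above."""
    return "\n".join([
        f"{setting.name} ({setting.type_name})",
        f"permitted values: {setting.permitted}",
        f"default: {setting.default}",
    ])


EXAMPLES = [
    Setting("synchronous_commit", "enum", "on, remote_write, local, off", "on"),
    Setting("wal_buffers", "integer", "-1 (auto-tuning) and 32-65536 (in kB)", "-1"),
]

for example in EXAMPLES:
    print(render_header(example))
    print()
```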




In most cases, this information is there in the paragraph somewhere, but presenting the config option in this way would make it easier to refer to, without needing to parse the entire description to find the default and permitted settings.

This would make it easier for people to quickly check how their server is set up if the config file a) is lacking the setting, b) may have been modified in the past, or c) may have been retained from a previous version of postgres with different defaults.

It also means that we don't need duplicate specification of default values in the text description - e.g. take a look at wal_buffers (integer), which specifies it twice.

Thoughts?

Graeme Bell

Re: postgres documentation - proposed improvement/clarification

From
Gavin Flower
Date:
On 02/06/15 23:58, Graeme B. Bell wrote:
>         wal_buffers (integer)
>         permitted values (in kB): -1 (auto-tuning)  and 32-65536.
===> what does '32-65536' mean?  I know what it means, but if someone is
very stressed and looking at it for the first time, it looks like nonsense!
>         default: -1
>
I suggest that boolean values should use either true or false, consistently.


Cheers,
Gavin

Re: postgres documentation - proposed improvement/clarification

From
"Graeme B. Bell"
Date:
>>
>>         permitted values (in kB): -1 (auto-tuning)  and 32-65536.
> ===> what does '32-65536' mean?  I know what it means, but if someone is very stressed and looking at it for the first time, it looks like nonsense!

Indeed, this is an interesting special case that I used to draw out this problem.
The documentation mentions that values below 32 are treated as 32.
But I don't think we should try to squeeze the entire documentation into that line.

In this particular example though, saying 0-65536 might be considered misleading since some of those values change into 32.
32-65536 is confusing in a slightly different sense, because 0-31 are actually valid possibilities, but they definitely don't do what you'd expect.
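
The difference between accepted and effective values can be sketched in a few lines of Python. This only models the behaviour described in the quoted paragraph; the function name is made up, values are taken to be in kB, and the handling of 0 is an assumption.

```python
MIN_EFFECTIVE_KB = 32  # from the quoted text: smaller values are treated as 32kB
AUTO_TUNE = -1         # from the quoted text: -1 selects automatic sizing


def effective_wal_buffers_kb(requested_kb):
    """Return the size in kB that would take effect, or None for auto-tuning."""
    if requested_kb == AUTO_TUNE:
        return None
    if requested_kb < 0:
        raise ValueError("only -1 is accepted below zero in this sketch")
    # Assumption: 0 is raised to the minimum along with 1-31.
    return max(requested_kb, MIN_EFFECTIVE_KB)
```

In this model 8 is accepted but takes effect as 32, which is exactly the gap between listing what is accepted and listing what is sensible.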

The underlying issue is:

a) should you list the complete range of inputs that postgresql will accept as a permitted value?
b) or should you list the complete range of 'sensible' inputs that postgresql will accept as a permitted value?

Perhaps you are right and (a) is an easier choice to maintain.

Graeme.




On 02 Jun 2015, at 21:19, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

> I suggest that boolean values should use either true or false, consistently.
>
>
> Cheers,
> Gavin