Error message style guide - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Error message style guide
Msg-id Pine.LNX.4.44.0303150139530.2382-100000@peter.localdomain
Responses Re: Error message style guide  (Steve Crawford <scrawford@pinpointresearch.com>)
Re: Error message style guide  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Some people were mentioning an error message style guide.  Here's a start
of one that I put together a while ago.  Feel free to consider it.


Size of message
---------------

The main part of a message should be at most 72 characters long.  For
embedded format specifiers (%s, %d, etc.), take into account a
reasonable estimate of the length of the substituted string.  Put any
remaining information into the detail and hint parts.

RATIONALE: 72 characters is typically considered an appropriate line
length on terminal-type displays. Consequently, this length is fair to
psql users and readers of the server log.  Also, longer messages will
tend to get chatty.
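To make the length rule concrete, here is a minimal C sketch that estimates the length of a message template. It counts each format specifier as a caller-chosen width instead of its literal two characters. The helpers estimated_length() and fits_main_message() are hypothetical and not part of PostgreSQL. The sketch assumes simple one-letter conversions such as %s, %d and %m. Flags, field widths and length modifiers (%5d, %lu) are not handled.

```c
#include <stddef.h>

/*
 * Estimate the displayed length of a message template.  Each one-letter
 * format specifier ("%s", "%d", "%m", ...) counts as spec_width
 * characters; "%%" counts as one character.  spec_width is the caller's
 * guess at the typical length of a substituted value.
 */
size_t estimated_length(const char *fmt, size_t spec_width)
{
    size_t len = 0;

    for (const char *p = fmt; *p != '\0'; p++)
    {
        if (*p == '%' && p[1] == '%')
        {
            len += 1;           /* literal percent sign */
            p++;
        }
        else if (*p == '%' && p[1] != '\0')
        {
            len += spec_width;  /* substituted value */
            p++;                /* skip the conversion letter */
        }
        else
            len += 1;           /* ordinary character */
    }
    return len;
}

/* Nonzero if the main message is likely to fit in 72 characters. */
int fits_main_message(const char *fmt, size_t spec_width)
{
    return estimated_length(fmt, spec_width) <= 72;
}
```

For example, estimated_length("could not open file \"%s\"", 20) yields 42, well within the limit. Assuming a 60-character file name pushes the same template to 82.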


Newlines, tabs
--------------

A message may not contain a newline or a tab.

RATIONALE: Messages are not necessarily displayed on terminal-type
displays.  In GUI displays or browsers these formatting characters
are at best ignored.

QUESTION: I think formatting characters should be avoided in detail
and hint messages as well, for the same reasons.
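This rule is easy to check mechanically. The hypothetical C helper below reports whether a message part contains a newline or a tab. Following the question raised above, it could be applied to detail and hint texts as well as to the main message.

```c
#include <string.h>

/* Nonzero if a message part contains a newline or a tab character. */
int contains_layout_chars(const char *msg)
{
    return strpbrk(msg, "\n\t") != NULL;
}
```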


Quotation marks
---------------

English text should use double quotes when quoting is appropriate.
Text in other languages should consistently use one kind of quotation
marks that agrees with publishing customs and with the computer output
of other programs.

RATIONALE: The choice of double quotes over single quotes is somewhat
arbitrary, but tends to be the preferred use.  Do not distinguish the
kind of quotes depending on the type of object in SQL terms (i.e.,
strings single quoted, identifiers double quoted).  This is a
language-internal technical issue that many users aren't even familiar
with, it won't scale to all quoted terms, it doesn't translate to
other languages, and it's pretty pointless, too.


Use of quotes
-------------

Always use quotes to delimit file names, database objects, and other
variables of a character-string nature.  Do not use them to mark up
nonvariable items.

RATIONALE: Objects can have names that create ambiguity when embedded
in a message.  Be consistent about denoting where a plugged-in name
starts and ends.

NOTE: This format encourages embedding data items into the message in
grammatical positions instead of the old style 'invalid value: bar'.
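The following sketch illustrates both points. It embeds a name in a grammatical position and delimits it with double quotes, so a name containing a space, such as my table, remains unambiguous. The function name format_missing_table() and the message wording are made up for this example.

```c
#include <stdio.h>

/*
 * Write a message that embeds a variable name in a grammatical position,
 * delimited by double quotes.  Returns the length snprintf would have
 * written, as snprintf itself does.
 */
int format_missing_table(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "table \"%s\" does not exist", name);
}
```

With the name my table, it produces: table "my table" does not exist.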


Punctuation
-----------

Do not end the message with a period.  Do not even think about ending
a message with an exclamation point.

RATIONALE: Avoiding punctuation makes it easier for client
applications to embed the message into a variety of grammatical
contexts.  Often, messages are not grammatically complete sentences
anyway.  (And if they're long enough to be more than one sentence,
split them up.)
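A hypothetical C check for the first rule is shown below. It reports whether a message ends with a period or an exclamation point. It makes no attempt to detect multiple sentences inside the message.

```c
#include <string.h>

/* Nonzero if the message ends with a period or an exclamation point. */
int has_forbidden_final_punct(const char *msg)
{
    size_t len = strlen(msg);

    if (len == 0)
        return 0;               /* empty message: nothing to complain about */
    return msg[len - 1] == '.' || msg[len - 1] == '!';
}
```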


Upper case vs. lower case
-------------------------

Use lower case for message wording, including the first letter of the
message.  Use upper case for SQL commands and key words if the message
refers to the command string.

RATIONALE: It's easier to make everything look more consistent this
way, since some messages are complete sentences and some not.
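A mechanical check of this rule needs a heuristic for the SQL exception. The hypothetical C helper below makes one such assumption: a message whose first word is written entirely in upper case begins with an SQL key word, such as SELECT, and is therefore acceptable. Any other initial capital letter is flagged.

```c
#include <ctype.h>

/*
 * Nonzero if the message starts acceptably: with a non-upper-case
 * character, or with an all-caps first word (assumed, heuristically,
 * to be an SQL key word).
 */
int initial_case_ok(const char *msg)
{
    const unsigned char *p = (const unsigned char *) msg;

    if (!isupper(p[0]))
        return 1;               /* lower case, digit, quote, or empty */

    /* First letter is upper case: accept only an all-caps first word. */
    for (; *p != '\0' && !isspace(*p); p++)
    {
        if (islower(*p))
            return 0;
    }
    return 1;
}
```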


Grammar
-------

Use the active voice.  Use complete sentences when there is an acting
subject ("A could not do B").  Use telegram style without subject if
the subject would be the program itself; do not use "I" for the
program.

RATIONALE: The program is not human.  Don't pretend otherwise.

Instead of multiple sentences, consider using semicolons or commas.

RATIONALE: This avoids peculiar punctuation if you follow the request
to leave off the final period.


Present vs past tense
---------------------

There is a nontrivial semantic difference between sentences of the
form

| could not open file "%s"

and

| cannot open file "%s"

The first one means that the attempt to open the file failed.  The
message should give a reason, such as "disk full" or "file doesn't
exist".  The past tense is appropriate because next time the disk
might not be full anymore or the file in question may exist.

The second form indicates that the functionality of opening the named
file does not exist at all in the program, or that it's conceptually
impossible.  The present tense is appropriate because the condition
will persist indefinitely.

RATIONALE: Granted, the average user will not be able to draw great
conclusions merely from the tense of the message, but since the
language provides us with a grammar we should use it correctly.


Type of the object
------------------

When citing the name of an object, state what kind of object it is.

RATIONALE: Otherwise no one will know what "foo.bar.baaz" is.


Brackets
--------

Brackets are only to be used in command synopses to denote optional
arguments, or to denote an array subscript.

RATIONALE: Anything else does not correspond to widely-known customary
usage and will confuse people.


Parentheses
-----------

Parentheses can be used to set off parts of a message that are
generated elsewhere.  For example:

| could not open file "%s" (%m)

RATIONALE: It would be difficult to account for all possible error
codes when pasting the reason into a single smooth sentence.
Parentheses also look better and are more flexible than colons or
dashes for separating the parts.
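The following standalone C sketch shows the pattern: the system supplies the parenthesized reason. Here strerror(errno) stands in for %m, and open_or_describe() is a hypothetical helper rather than PostgreSQL code. The reason text varies by platform.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Open a file for reading.  On failure, write a message of the form
 *     could not open file "NAME" (REASON)
 * into buf, with REASON taken from errno.  Returns the stream or NULL.
 */
FILE *open_or_describe(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
    {
        int saved_errno = errno;    /* save before any other library call */

        snprintf(buf, size, "could not open file \"%s\" (%s)",
                 path, strerror(saved_errno));
    }
    return fp;
}
```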


Reasons for errors
------------------

Messages should always state why an error occurred.  For example:

BAD: could not open file "%s"
BETTER: could not open file "%s" (I/O failure)

If the reason is not known, you had better fix the code. ;-)


Tricky words to avoid
---------------------

unable:

"Unable" is nearly the passive voice.  It is better to use "cannot" or
"could not", as appropriate.

bad:

Error messages like "bad result" are really hard to interpret
intelligently.  It's better to write why the result is "bad", e.g.,
"invalid format".

illegal:

"Illegal" denotes a violation of the law; everything else is
"invalid".  Better yet, say why it's invalid.

unknown:

Try to avoid "unknown".  Consider "error: unknown response".  If you
don't know what the response is, how do you know it's erroneous?  If,
however, the error lies in the fact that you don't know the response,
this wording is clearly confusing.
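The hypothetical C helper below flags the words listed above. It matches them as whole words, case-insensitively, so that "Unable" is caught while "badge" is not. A match only prompts a second look at the wording and does not prove that the message is wrong.

```c
#include <ctype.h>
#include <string.h>

/* Words discouraged by the list above. */
static const char *const tricky_words[] = {"unable", "bad", "illegal", "unknown"};

/* Return the first discouraged word found as a whole word, or NULL. */
const char *find_tricky_word(const char *msg)
{
    size_t n = sizeof tricky_words / sizeof tricky_words[0];

    for (const char *p = msg; *p != '\0'; p++)
    {
        /* Only consider positions at the start of a word. */
        if (p != msg && isalpha((unsigned char) p[-1]))
            continue;

        for (size_t i = 0; i < n; i++)
        {
            size_t len = strlen(tricky_words[i]);
            size_t k;

            for (k = 0; k < len; k++)
            {
                if (p[k] == '\0' ||
                    tolower((unsigned char) p[k]) != tricky_words[i][k])
                    break;
            }
            /* Whole-word match: not followed by another letter. */
            if (k == len && !isalpha((unsigned char) p[len]))
                return tricky_words[i];
        }
    }
    return NULL;
}
```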


Function names
--------------

Rather than naming the function or system call that failed, describe
what it was trying to do, e.g., "could not open file".  Admittedly,
this may be difficult with candidates such as "select()".

RATIONALE: Users don't know what all those functions do.


Find vs Exists
--------------

If the program uses a nontrivial algorithm to locate a resource (e.g.,
a path search) and that algorithm fails, it is fair to say that the
program couldn't "find" the resource.  If, on the other hand, the
expected location of the resource is known and the resource is not
there, just say that the resource doesn't "exist".  Using "find" in
this case sounds weak and confuses the issue.


Proper spelling
---------------

Spell out words in full.  For instance, avoid:

spec
stats
parens
auth
xact

RATIONALE: This will improve consistency.

-- 
Peter Eisentraut   peter_e@gmx.net


