Re: Internationalized error messages - Mailing list pgsql-hackers

From ncm@zembu.com (Nathan Myers)
Subject Re: Internationalized error messages
Msg-id 20010308193041.Z624@store.zembu.com
In response to Re: Internationalized error messages  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > Similar approaches have been tried frequently, and even enshrined 
> > in standards (e.g. POSIX catgets), but have almost always proven too
> > cumbersome.  The problem is that keeping programs that interpret the 
> > numeric code in sync with the program they monitor is hard, and trying 
> > to avoid breaking all those secondary programs hinders development on 
> > the primary program.  Furthermore, assigning code numbers is a nuisance,
> > and they add uninformative clutter.  
> 
> There's a difficult tradeoff to make here, but I think we do want to
> distinguish between the "official error code" --- the thing that has
> translations into various languages --- and what the backend is actually
> allowed to print out.  It seems to me that a fairly large fraction of
> the unique messages found in the backend can all be lumped under the
> category of "internal error", and that we need to have only one official
> error code and one user-level translated message for the lot of them.
> But we do want to be able to print out different detail messages for
> each of those internal errors.  There are other categories that might be
> lumped together, but that one alone is sufficiently large to force us
> to recognize it.  This suggests a distinction between a "primary" or
> "user-level" error message, which we catalog and provide translations
> for, and a "secondary", "detail", or "wizard-level" error message that
> exists only in the backend source code, and only in English, and so
> can be made up on the spur of the moment.

I suggest using different named functions/macros for different 
categories of message, rather than arguments to a common function.  
(I.e. "elog(ERROR, ...)" Considered Harmful.)  

You might even have more than one call at a site, one for the official
message and another for unofficial or unstable informative details.

> Another thing that I missed in Peter's proposal is how we are going to
> cope with messages that include parameters.  Surely we do not expect
> gettext to start with 'Attribute "foo" not found' and distinguish fixed
> from variable parts of that string?

The common way to deal with this is to catalog the format string itself,
with its embedded % directives.  The tricky bit, and what the printf 
family has had to be extended to handle, is that the order of the formal 
arguments varies with the target language.  The original string is an 
ordinary printf string, but the translations may have to refer to the 
substitution arguments by numeric position (as well as type).

There is probably Free code to implement that.
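As a rough illustration of positional arguments, here is a minimal sketch that relies on the POSIX "%n$" extension to the printf family (it is not part of ISO C, so this assumes a POSIX-conforming libc). The helper name format_not_found and both format strings are made up for the example; nothing here is existing backend code.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a message from a catalog format string.
 * `fmt` is either the original string, which takes its arguments in
 * call order, or a translation that picks them by position with the
 * POSIX "%1$s" / "%2$s" notation.  A single format string must not
 * mix the two styles.
 */
int format_not_found(char *buf, size_t size, const char *fmt,
                     const char *kind, const char *name)
{
    /* The argument order in the call is fixed; only fmt changes. */
    return snprintf(buf, size, fmt, kind, name);
}
```

With the format "%s \"%s\" not found" and the arguments "Attribute" and "foo", this yields: Attribute "foo" not found. A translation that needs the name first can use "\"%2$s\": no such %1$s" without any change to the call site.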

As much as possible, any compile-time annotations should be extracted 
into the catalog and filtered out of the source, to be reunited only
when you retrieve the catalog entry.  


> So it's clear that we need to devise a way of breaking an "error
> message" into multiple portions, including:
> 
>     Primary error message (localizable)
>     Parameters to insert into error message (user identifiers, etc)
>     Secondary (wizard) error message (optional)
>     Source code location
>     Query text location (optional)
> 
> and perhaps others that I have forgotten about.  One of the key things
> to think about is whether we can, or should try to, transmit all this
> stuff in a backwards-compatible protocol.  That would mean we'd have
> to dump all the info into a single string, which is doable but would
> perhaps look pretty ugly:
> 
>     ERROR: Attribute "foo" not found  -- basic message for dumb frontends
>     ERRORCODE: UNREC_IDENT        -- key for finding localized message
>     PARAM1: foo    -- something to embed in the localized message
>     MESSAGE: Attribute or table name not known within context of query
>     CODELOC: src/backend/parser/parse_clause.c line 345
>     QUERYLOC: 22

Whitespace can be used effectively.  E.g. only primary messages appear
in column 0.  PG might emit this, which is easily filtered:
  Attribute "foo" not found
    severity: cannot proceed
    explain: An attribute or table name was not known within
    explain: the context of the query.
    index: 237 Attribute \"%s\" not found
    location: src/backend/parser/parse_clause.c line 345
    query_position: 22
 

Here the first line is the localized replacement of what appears in the 
code, with arguments substituted in.   The other stuff comes from the
catalog.
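To show what "easily filtered" could mean in practice, here is a minimal sketch of a filter that keeps only the lines starting in column 0, that is, the primary messages. The function name keep_primary_lines is hypothetical, and the rule that annotation lines begin with a space or tab is an assumption taken from the layout described above.

```c
#include <stddef.h>

/*
 * Copy into `out` only those lines of `in` that start in column 0.
 * Lines that begin with a space or tab (the annotations) and empty
 * lines are dropped.  Output is truncated to fit `size` and is always
 * NUL-terminated when size > 0.  Returns the number of bytes written,
 * not counting the terminator.
 */
size_t keep_primary_lines(const char *in, char *out, size_t size)
{
    size_t n = 0;
    int at_line_start = 1;
    int keep = 0;

    if (size == 0)
        return 0;

    for (; *in != '\0'; in++) {
        /* Decide once per line, from its first character. */
        if (at_line_start)
            keep = (*in != ' ' && *in != '\t' && *in != '\n');
        if (keep && n + 1 < size)
            out[n++] = *in;
        at_line_start = (*in == '\n');
    }
    out[n] = '\0';
    return n;
}
```

A dumb frontend could run the backend's output through something like this and show the user only the primary message, while a smarter one parses the indented "key: value" lines.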

The call looks like
  elog_query("Attribute \"%s\" not found", foo);
  elog_explain("An attribute or table name was not known within "
    "the context of the query.");
  elog_severity(ERROR);
 

which might get expanded (squeezed) by the preprocessor to
 _elog(current_query_position, "Attribute \"%s\" not found", foo);

while a separate tool scans the sources and builds the catalog,
annotating it with severity, line number, etc.  Human translators
may edit copies of the resulting catalog.  The call to _elog looks up
the string in the catalog, substitutes arguments into the translation,
and emits it along with the catalog index number and whatever else
has been requested in the config file.  Alternatively, any other program 
can use the number to pull the annotations out of the catalog given
just the index.
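The lookup step might be sketched as follows. Everything in it is an assumption made for illustration: the struct layout, the name sketch_elog, the in-memory table standing in for a catalog file, and the German string, which is only a sample translation. The index 237 is borrowed from the sample output earlier in this message.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catalog entry, keyed by the original format string. */
struct catalog_entry {
    int index;                  /* stable catalog index number */
    const char *source;         /* string as written in the C source */
    const char *translation;    /* translator-edited replacement */
};

/* Stand-in for a catalog built by a source-scanning tool. */
static const struct catalog_entry catalog[] = {
    { 237, "Attribute \"%s\" not found", "Attribut \"%s\" nicht gefunden" },
};

static const struct catalog_entry *catalog_lookup(const char *source)
{
    size_t count = sizeof catalog / sizeof catalog[0];

    for (size_t i = 0; i < count; i++)
        if (strcmp(catalog[i].source, source) == 0)
            return &catalog[i];
    return NULL;
}

/*
 * Look up `source`, substitute the arguments into the translation
 * (or into `source` itself when there is no catalog entry), then
 * append the requested annotations on indented lines.
 */
int sketch_elog(char *buf, size_t size, int query_position,
                const char *source, ...)
{
    const struct catalog_entry *e = catalog_lookup(source);
    const char *fmt = (e != NULL) ? e->translation : source;
    va_list ap;
    int n;

    va_start(ap, source);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t) n >= size)
        return n;               /* error or truncated: stop here */
    if (e != NULL)
        n += snprintf(buf + n, size - (size_t) n,
                      "\n  index: %d", e->index);
    if ((size_t) n < size)
        n += snprintf(buf + n, size - (size_t) n,
                      "\n  query_position: %d\n", query_position);
    return n;
}
```

The call site still contains only the English string, so it stays readable, and a string missing from the catalog simply falls back to the untranslated text.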

> Alternatively we could suppress most of this stuff unless the frontend
> specifically asks for it (and presumably is willing to digest it for
> the user).
> 
> Bottom line for me is that if we are going to go to the trouble of
> examining and changing every single elog() in the system, we should
> try to get all of these issues cleaned up at once.  Let's not have to
> go back and do it again later.

The more complex it is, the more likely it will need to be redone.
The simpler the calls look, the more likely that you can automate
(or implement invisibly) any later improvements.  

Nathan Myers
ncm@zembu.com

