Thread: Internationalized error messages

Internationalized error messages

From
Peter Eisentraut
Date:
I really feel that translated error messages need to happen soon.
Managing translated message catalogs can be done easily with available
APIs.  However, translatable messages really require an error code
mechanism (otherwise it's completely impossible for programs to interpret
error messages reliably).  I've been thinking about this for much too long
now and today I finally settled on the simplest possible solution.

Let the actual method of allocating error codes be irrelevant for now,
although the ones in the SQL standard are certainly to be considered for a
start.  Essentially, instead of writing
   elog(ERROR, "disaster struck");

you'd write
   elog(ERROR, "XYZ01", "disaster struck");

Now you'll notice that this approach doesn't make the error message text
functionally dependent on the error code.  The alternative would have been
to write
   elog(ERROR, "XYZ01");

which makes the code much less clear.  Additionally, most of the elog()
calls use printf style variable argument lists.  So maybe
   elog(ERROR, "XYZ01", (arg + 1), foo);

This is not only totally obscure, but also incredibly cumbersome to
maintain and very error prone.  One earlier idea was to make the "XYZ01"
thing a macro instead that expands to a string with % arguments, that GCC
can check as it does now.  But I don't consider this a lot better, because
the initial coding is still obscured, and additionally the list of those
macros needs to be maintained.  (The actual error codes might still be
provided as readable macro names similar to the errno codes, but I'm not
sure if we should share these between server and client.)

Finally, there might also be legitimate reasons to have different error
message texts for the same error code.  For example, "type errors" (don't
know if this is an official code) can occur in a number of places that
might warrant different explanations.  Indeed, this approach would
preserve "artistic freedom" to some extent while still maintaining some
structure alongside.  And it would be rather straightforward to implement,
too.  Those who are too bored to assign error codes to new code can simply
pick some "zero" code as default.

On the protocol front, this could be pretty easy to do.  Instead of
"message text" we'd send a string "XYZ01: message text".  Worst case, we
pass this unfiltered to the client and provide an extra function that
returns only the first five characters.  Alternatively we could strip off
the prefix when returning the message text only.
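
A minimal sketch of the client-side helpers this paragraph describes, assuming the wire string has the shape "XYZ01: message text" with a fixed five-character code. The names has_code_prefix, error_code_of and error_text_of are hypothetical and are not existing libpq functions.

```c
#include <assert.h>
#include <string.h>

#define ERRCODE_LEN 5   /* assumed fixed width of the code prefix */

/* Returns 1 if msg has the assumed "XXXXX: text" shape, else 0. */
int has_code_prefix(const char *msg)
{
    return msg != NULL &&
           strlen(msg) >= ERRCODE_LEN + 2 &&
           msg[ERRCODE_LEN] == ':' &&
           msg[ERRCODE_LEN + 1] == ' ';
}

/* Copies the five-character code into buf; buf becomes "" if there is none. */
const char *error_code_of(const char *msg, char buf[ERRCODE_LEN + 1])
{
    assert(buf != NULL);
    buf[0] = '\0';
    if (has_code_prefix(msg)) {
        memcpy(buf, msg, ERRCODE_LEN);
        buf[ERRCODE_LEN] = '\0';
    }
    return buf;
}

/* Returns the message text alone, skipping the prefix when present. */
const char *error_text_of(const char *msg)
{
    return has_code_prefix(msg) ? msg + ERRCODE_LEN + 2 : msg;
}
```

An older client that knows nothing about codes would simply display the whole string, which is the "worst case" mentioned above.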

At the end, the i18n part would actually be pretty easy, e.g.,
   elog(ERROR, "XYZ01", gettext("stuff happened"));


Comments?  Better ideas?

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Ian Lance Taylor
Date:
Peter Eisentraut <peter_e@gmx.net> writes:

> Let the actual method of allocating error codes be irrelevant for now,
> although the ones in the SQL standard are certainly to be considered for a
> start.  Essentially, instead of writing
> 
>     elog(ERROR, "disaster struck");
> 
> you'd write
> 
>     elog(ERROR, "XYZ01", "disaster struck");

I like this approach.  One of the nice things about Oracle is that
they have an error manual.  All Oracle errors have an associated
number.  You can look up that number in the error manual to find a
paragraph giving details and workarounds.  Admittedly, sometimes the
further details are not helpful, but sometimes they are.  The basic
idea of being able to look up an error lets programmers balance the
need for a terse error message with the need for a fuller explanation.

Ian



Re: Internationalized error messages

From
ncm@zembu.com (Nathan Myers)
Date:
On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
> I really feel that translated error messages need to happen soon.
> Managing translated message catalogs can be done easily with available
> APIs.  However, translatable messages really require an error code
> mechanism (otherwise it's completely impossible for programs to interpret
> error messages reliably).  I've been thinking about this for much too long
> now and today I finally settled on the simplest possible solution.
> 
> Let the actual method of allocating error codes be irrelevant for now,
> although the ones in the SQL standard are certainly to be considered for a
> start.  Essentially, instead of writing
> 
>     elog(ERROR, "disaster struck");
> 
> you'd write
> 
>     elog(ERROR, "XYZ01", "disaster struck");
> 
> Now you'll notice that this approach doesn't make the error message text
> functionally dependent on the error code.  The alternative would have been
> to write
> 
>     elog(ERROR, "XYZ01");
> 
> which makes the code much less clear.  Additionally, most of the elog()
> calls use printf style variable argument lists.  So maybe
> 
>     elog(ERROR, "XYZ01", (arg + 1), foo);
> 
> This is not only totally obscure, but also incredibly cumbersome to
> maintain and very error prone.  One earlier idea was to make the "XYZ01"
> thing a macro instead that expands to a string with % arguments, that GCC
> can check as it does now.  But I don't consider this a lot better, because
> the initial coding is still obscured, and additionally the list of those
> macros needs to be maintained.  (The actual error codes might still be
> provided as readable macro names similar to the errno codes, but I'm not
> sure if we should share these between server and client.)
> 
> Finally, there might also be legitimate reasons to have different error
> message texts for the same error code.  For example, "type errors" (don't
> know if this is an official code) can occur in a number of places that
> might warrant different explanations.  Indeed, this approach would
> preserve "artistic freedom" to some extent while still maintaining some
> structure alongside.  And it would be rather straightforward to implement,
> too.  Those who are too bored to assign error codes to new code can simply
> pick some "zero" code as default.
> 
> On the protocol front, this could be pretty easy to do.  Instead of
> "message text" we'd send a string "XYZ01: message text".  Worst case, we
> pass this unfiltered to the client and provide an extra function that
> returns only the first five characters.  Alternatively we could strip off
> the prefix when returning the message text only.
> 
> At the end, the i18n part would actually be pretty easy, e.g.,
> 
>     elog(ERROR, "XYZ01", gettext("stuff happened"));

Similar approaches have been tried frequently, and even enshrined 
in standards (e.g. POSIX catgets), but have almost always proven too
cumbersome.  The problem is that keeping programs that interpret the 
numeric code in sync with the program they monitor is hard, and trying 
to avoid breaking all those secondary programs hinders development on 
the primary program.  Furthermore, assigning code numbers is a nuisance,
and they add uninformative clutter.  

It's better to scan the program for elog() arguments, and generate
a catalog by using the string itself as the index code.  Those 
maintaining the secondary programs can compare catalogs to see what 
has been broken by changes and what new messages to expect.  elog()
itself can (optionally) invent tokens (e.g. catalog indices) to help 
out those programs.

Nathan Myers
ncm@zembu.com


Re: Internationalized error messages

From
Tom Lane
Date:
> On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
>> I really feel that translated error messages need to happen soon.

Agreed.

ncm@zembu.com (Nathan Myers) writes:
> Similar approaches have been tried frequently, and even enshrined 
> in standards (e.g. POSIX catgets), but have almost always proven too
> cumbersome.  The problem is that keeping programs that interpret the 
> numeric code in sync with the program they monitor is hard, and trying 
> to avoid breaking all those secondary programs hinders development on 
> the primary program.  Furthermore, assigning code numbers is a nuisance,
> and they add uninformative clutter.  

There's a difficult tradeoff to make here, but I think we do want to
distinguish between the "official error code" --- the thing that has
translations into various languages --- and what the backend is actually
allowed to print out.  It seems to me that a fairly large fraction of
the unique messages found in the backend can all be lumped under the
category of "internal error", and that we need to have only one official
error code and one user-level translated message for the lot of them.
But we do want to be able to print out different detail messages for
each of those internal errors.  There are other categories that might be
lumped together, but that one alone is sufficiently large to force us
to recognize it.  This suggests a distinction between a "primary" or
"user-level" error message, which we catalog and provide translations
for, and a "secondary", "detail", or "wizard-level" error message that
exists only in the backend source code, and only in English, and so
can be made up on the spur of the moment.

Another thing that's bothered me for a long time is our inconsistent
approach to determining where in the code a message comes from.  A lot
of the messages currently embed the name of the generating routine right
into the error text.  Again, we ought to separate the functionality:
the source-code location is valuable but ought not form part of the
primary error message.  I would like to see elog() become a macro that
invokes __FILE__ and __LINE__ to automatically make the *exact* source
code location become part of the secondary error information, and then
drop the convention of using the routine name in the message text.
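
A rough sketch of the macro idea in this paragraph, not the actual backend code. It assumes a compiler with C99 variadic macros; elog_at, last_report and the value of ERROR are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define ERROR 1   /* placeholder severity value for this sketch */

/* Most recent report, kept in a buffer so the result is easy to inspect. */
char last_report[512];

/* Hypothetical worker: formats the message, then appends the call site. */
void elog_at(const char *file, int line, int level, const char *fmt, ...)
{
    va_list ap;
    size_t  len;

    assert(fmt != NULL);
    (void) level;               /* severity handling is out of scope here */

    last_report[0] = '\0';
    va_start(ap, fmt);
    vsnprintf(last_report, sizeof(last_report), fmt, ap);
    va_end(ap);

    /* The location is kept apart from the message text and added last. */
    len = strlen(last_report);
    snprintf(last_report + len, sizeof(last_report) - len,
             " (at %s:%d)", file, line);
}

/* The macro captures __FILE__ and __LINE__ at each call site. */
#define elog(level, ...) elog_at(__FILE__, __LINE__, (level), __VA_ARGS__)
```

With this shape, a call such as elog(ERROR, "Attribute \"%s\" not found", name) no longer needs the routine name in its text, because the location is supplied by the macro.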

Something else we have talked about off-and-on is providing locator
information for errors that can be associated with a particular point in
the query string (lexical and syntactic errors).  This would probably be
best returned as a character index.

Another thing that I missed in Peter's proposal is how we are going to
cope with messages that include parameters.  Surely we do not expect
gettext to start with 'Attribute "foo" not found' and distinguish fixed
from variable parts of that string?

So it's clear that we need to devise a way of breaking an "error
message" into multiple portions, including:

    Primary error message (localizable)
    Parameters to insert into error message (user identifiers, etc)
    Secondary (wizard) error message (optional)
    Source code location
    Query text location (optional)

and perhaps others that I have forgotten about.  One of the key things
to think about is whether we can, or should try to, transmit all this
stuff in a backwards-compatible protocol.  That would mean we'd have
to dump all the info into a single string, which is doable but would
perhaps look pretty ugly:

    ERROR: Attribute "foo" not found  -- basic message for dumb frontends
    ERRORCODE: UNREC_IDENT        -- key for finding localized message
    PARAM1: foo    -- something to embed in the localized message
    MESSAGE: Attribute or table name not known within context of query
    CODELOC: src/backend/parser/parse_clause.c line 345
    QUERYLOC: 22

Alternatively we could suppress most of this stuff unless the frontend
specifically asks for it (and presumably is willing to digest it for
the user).
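
To make the two alternatives concrete, here is a small sketch that renders one report either tersely or with all the labeled fields from the example above. The ErrorReport struct, format_report and the verbose flag are assumptions made for illustration; nothing like them exists in the backend as discussed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct ErrorReport {
    const char *message;   /* basic message for dumb frontends */
    const char *errcode;   /* key such as UNREC_IDENT */
    const char *param1;    /* value to embed in the localized message */
    const char *detail;    /* secondary ("wizard") message */
    const char *file;      /* source file that raised the error */
    int         line;      /* source line that raised the error */
    int         queryloc;  /* character index into the query, -1 if none */
} ErrorReport;

/* Appends "key: value\n" to out, never writing past size bytes. */
static void append_field(char *out, size_t size,
                         const char *key, const char *value)
{
    size_t len = strlen(out);

    if (len < size)
        snprintf(out + len, size - len, "%s: %s\n", key, value);
}

/* Renders r into out; verbose == 0 keeps only the basic message line. */
void format_report(char *out, size_t size, const ErrorReport *r, int verbose)
{
    char tmp[256];

    assert(out != NULL && size > 0 && r != NULL);
    out[0] = '\0';
    append_field(out, size, "ERROR", r->message);
    if (!verbose)
        return;

    append_field(out, size, "ERRORCODE", r->errcode);
    append_field(out, size, "PARAM1", r->param1);
    append_field(out, size, "MESSAGE", r->detail);
    snprintf(tmp, sizeof(tmp), "%s line %d", r->file, r->line);
    append_field(out, size, "CODELOC", tmp);
    if (r->queryloc >= 0) {
        /* Not every error relates to a position in the query. */
        snprintf(tmp, sizeof(tmp), "%d", r->queryloc);
        append_field(out, size, "QUERYLOC", tmp);
    }
}
```

A frontend that never asks for detail would see only the first line, which keeps the output close to what existing clients expect.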

Bottom line for me is that if we are going to go to the trouble of
examining and changing every single elog() in the system, we should
try to get all of these issues cleaned up at once.  Let's not have to
go back and do it again later.
        regards, tom lane


Re: Internationalized error messages

From
ncm@zembu.com (Nathan Myers)
Date:
On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > Similar approaches have been tried frequently, and even enshrined 
> > in standards (e.g. POSIX catgets), but have almost always proven too
> > cumbersome.  The problem is that keeping programs that interpret the 
> > numeric code in sync with the program they monitor is hard, and trying 
> > to avoid breaking all those secondary programs hinders development on 
> > the primary program.  Furthermore, assigning code numbers is a nuisance,
> > and they add uninformative clutter.  
> 
> There's a difficult tradeoff to make here, but I think we do want to
> distinguish between the "official error code" --- the thing that has
> translations into various languages --- and what the backend is actually
> allowed to print out.  It seems to me that a fairly large fraction of
> the unique messages found in the backend can all be lumped under the
> category of "internal error", and that we need to have only one official
> error code and one user-level translated message for the lot of them.
> But we do want to be able to print out different detail messages for
> each of those internal errors.  There are other categories that might be
> lumped together, but that one alone is sufficiently large to force us
> to recognize it.  This suggests a distinction between a "primary" or
> "user-level" error message, which we catalog and provide translations
> for, and a "secondary", "detail", or "wizard-level" error message that
> exists only in the backend source code, and only in English, and so
> can be made up on the spur of the moment.

I suggest using different named functions/macros for different 
categories of message, rather than arguments to a common function.  
(I.e. "elog(ERROR, ...)" Considered Harmful.)  

You might even have more than one call at a site, one for the official
message and another for unofficial or unstable informative details.

> Another thing that I missed in Peter's proposal is how we are going to
> cope with messages that include parameters.  Surely we do not expect
> gettext to start with 'Attribute "foo" not found' and distinguish fixed
> from variable parts of that string?

The common way to deal with this is to catalog the format string itself,
with its embedded % directives.  The tricky bit, and what the printf 
family has had to be extended to handle, is that the order of the formal 
arguments varies with the target language.  The original string is an 
ordinary printf string, but the translations may have to refer to the 
substitution arguments by numeric position (as well as type).

There is probably Free code to implement that.
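
As an illustration of the argument-order problem, the sketch below formats the same two arguments through an ordinary format string and through a reordered one. It assumes a C library that supports the POSIX numbered-argument extension ("%1$s"), which is not part of ISO C; format_pair and both message macros are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format string as it would appear in the source code. */
#define MSG_SOURCE    "type mismatch in CASE expression (%s vs %s)"

/* Made-up "translation" that needs the arguments in the opposite order. */
#define MSG_REORDERED "CASE expression: %2$s does not match %1$s"

/* Formats left/right through fmt; returns 0 on success, -1 if it did not fit. */
int format_pair(char *out, size_t size, const char *fmt,
                const char *left, const char *right)
{
    int n;

    assert(out != NULL && fmt != NULL);
    n = snprintf(out, size, fmt, left, right);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

The call site passes the arguments in one fixed order; only the catalog entry decides where each one lands in the output.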

As much as possible, any compile-time annotations should be extracted 
into the catalog and filtered out of the source, to be reunited only
when you retrieve the catalog entry.  


> So it's clear that we need to devise a way of breaking an "error
> message" into multiple portions, including:
> 
>     Primary error message (localizable)
>     Parameters to insert into error message (user identifiers, etc)
>     Secondary (wizard) error message (optional)
>     Source code location
>     Query text location (optional)
> 
> and perhaps others that I have forgotten about.  One of the key things
> to think about is whether we can, or should try to, transmit all this
> stuff in a backwards-compatible protocol.  That would mean we'd have
> to dump all the info into a single string, which is doable but would
> perhaps look pretty ugly:
> 
>     ERROR: Attribute "foo" not found  -- basic message for dumb frontends
>     ERRORCODE: UNREC_IDENT        -- key for finding localized message
>     PARAM1: foo    -- something to embed in the localized message
>     MESSAGE: Attribute or table name not known within context of query
>     CODELOC: src/backend/parser/parse_clause.c line 345
>     QUERYLOC: 22

Whitespace can be used effectively.  E.g. only primary messages appear
in column 0.  PG might emit this, which is easily filtered:
Attribute "foo" not found
   severity: cannot proceed
   explain: An attribute or table name was not known within
   explain: the context of the query.
   index: 237 Attribute \"%s\" not found
   location: src/backend/parser/parse_clause.c line 345
   query_position: 22

Here the first line is the localized replacement of what appears in the 
code, with arguments substituted in.   The other stuff comes from the
catalog.

The call looks like
 elog_query("Attribute \"%s\" not found", foo);
 elog_explain("An attribute or table name was not known within "
    "the context of the query.");
 elog_severity(ERROR);

which might get expanded (squeezed) by the preprocessor to
 _elog(current_query_position, "Attribute \"%s\" not found", foo);

while a separate tool scans the sources and builds the catalog,
annotating it with severity, line number, etc.  Human translators
may edit copies of the resulting catalog.  The call to _elog looks up
the string in the catalog, substitutes arguments into the translation,
and emits it along with the catalog index number and whatever else
has been requested in the config file.  Alternatively, any other program 
can use the number to pull the annotations out of the catalog given
just the index.
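
A toy version of the lookup step described above, assuming the catalog is keyed by the source format string and that each entry carries an index number. The CatalogEntry layout, catalog_lookup, catalog_emit and the single German entry are all invented for illustration; a real catalog would be generated by a separate tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct CatalogEntry {
    int         index;        /* token other programs can key on */
    const char *source;       /* format string exactly as written in the code */
    const char *translation;  /* translator's version, same % directives */
} CatalogEntry;

/* Tiny in-memory stand-in for a generated catalog; contents are made up. */
static const CatalogEntry catalog[] = {
    { 237, "Attribute \"%s\" not found", "Attribut \"%s\" nicht gefunden" },
};

/* Finds the entry whose source string matches exactly, or returns NULL. */
const CatalogEntry *catalog_lookup(const char *source)
{
    size_t i;

    assert(source != NULL);
    for (i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (strcmp(catalog[i].source, source) == 0)
            return &catalog[i];
    return NULL;
}

/* Writes "<index> <message>"; uses index 0 and the source text if unknown. */
int catalog_emit(char *out, size_t size, const char *source, const char *arg)
{
    const CatalogEntry *e = catalog_lookup(source);
    int n = snprintf(out, size, "%d ", e ? e->index : 0);

    if (n < 0 || (size_t) n >= size)
        return -1;
    return snprintf(out + n, size - n, e ? e->translation : source, arg);
}
```

Because the key is the string itself, editing a message in the source silently changes its catalog identity, which is the trade-off the comparison of catalogs is meant to expose.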

> Alternatively we could suppress most of this stuff unless the frontend
> specifically asks for it (and presumably is willing to digest it for
> the user).
> 
> Bottom line for me is that if we are going to go to the trouble of
> examining and changing every single elog() in the system, we should
> try to get all of these issues cleaned up at once.  Let's not have to
> go back and do it again later.

The more complex it is, the more likely that it will need to be redone.
The simpler the calls look, the more likely that you can automate
(or implement invisibly) any later improvements.  

Nathan Myers
ncm@zembu.com


Re: Internationalized error messages

From
Denis Perchine
Date:
> I like this approach.  One of the nice things about Oracle is that
> they have an error manual.  All Oracle errors have an associated
> number.  You can look up that number in the error manual to find a
> paragraph giving details and workarounds.  Admittedly, sometimes the
> further details are not helpful, but sometimes they are.  The basic
> idea of being able to look up an error lets programmers balance the
> need for a terse error message with the need for a fuller explanation.

One example of when you need an exact error code is when you want to
distinguish unique index violations from other errors. This is often needed
when you want to just do the insert and leave all constraint checking to the
database...

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


Re: Internationalized error messages

From
Karel Zak
Date:
On Thu, Mar 08, 2001 at 09:00:09PM -0500, Tom Lane wrote:
> > On Thu, Mar 08, 2001 at 11:49:50PM +0100, Peter Eisentraut wrote:
> >> I really feel that translated error messages need to happen soon.
> 
> Agreed.
Yes, error codes are a *very* wanted feature.

> 
>     ERROR: Attribute "foo" not found  -- basic message for dumb frontends
>     ERRORCODE: UNREC_IDENT        -- key for finding localized message
>     PARAM1: foo    -- something to embed in the localized message
>     MESSAGE: Attribute or table name not known within context of query
>     CODELOC: src/backend/parser/parse_clause.c line 345
>     QUERYLOC: 22
Great idea! I agree that we need a more powerful error protocol instead of
the current string-based messages.

For translation to other languages, I am not sure about using gettext() in
the backend. IMHO a better (faster) solution would be a postgres system
catalog holding the messages.

Maybe add a new command too: SET MESSAGE_LANGUAGE TO <xxx>, because the
wanted language need not always be the same as the locale setting.

Something like elog(ERROR, gettext(...)); is usable, but it does not sound
good to me.
        Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


Re: Internationalized error messages

From
Tatsuo Ishii
Date:
>  For translation to other languages I not sure with gettext() stuff on
> backend -- IMHO better (faster) solution will postgres system catalog
> with it.
> 
>  May be add new command too: SET MESSAGE_LANGUAGE TO <xxx>, because
> wanted language not must be always same as locale setting.

In a multibyte-enabled environment, that kind of command would not
be necessary except for UNICODE and MULE_INTERNAL, since they are
multilingual encodings. For them, we might need something like:

SET LANGUAGE_PREFERENCE TO 'Japanese';

For a long-term solution, this kind of problem should be solved in
the implementation of the SQL-92/99 i18n features.
--
Tatsuo Ishii


Re: Internationalized error messages

From
Peter Eisentraut
Date:
Nathan Myers writes:

> >     elog(ERROR, "XYZ01", gettext("stuff happened"));
>
> Similar approaches have been tried frequently, and even enshrined
> in standards (e.g. POSIX catgets), but have almost always proven too
> cumbersome.  The problem is that keeping programs that interpret the
> numeric code in sync with the program they monitor is hard, and trying
> to avoid breaking all those secondary programs hinders development on
> the primary program.

That's why no one uses catgets and everyone uses gettext.

> Furthermore, assigning code numbers is a nuisance, and they add
> uninformative clutter.

The error codes are exactly what we want, to allow client programs (as
opposed to humans) to identify the errors.  The code in my example has
nothing to do with the message id in the catgets interface.

> It's better to scan the program for elog() arguments, and generate
> a catalog by using the string itself as the index code.  Those
> maintaining the secondary programs can compare catalogs to see what
> has been broken by changes and what new messages to expect.  elog()
> itself can (optionally) invent tokens (e.g. catalog indices) to help
> out those programs.

That's what gettext does for you.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Peter Eisentraut
Date:
Tom Lane writes:

> There's a difficult tradeoff to make here, but I think we do want to
> distinguish between the "official error code" --- the thing that has
> translations into various languages --- and what the backend is actually
> allowed to print out.  It seems to me that a fairly large fraction of
> the unique messages found in the backend can all be lumped under the
> category of "internal error", and that we need to have only one official
> error code and one user-level translated message for the lot of them.

That's exactly what I was trying to avoid.  You'd still be allowed to
choose the error message text freely, but client programs will be able to
make sense of them by looking at the code only, as opposed to parsing the
message text.  I'm trying to avoid making the message text to be computed
from the error code, because that obscures the source code.

> Another thing that's bothered me for a long time is our inconsistent
> approach to determining where in the code a message comes from.  A lot
> of the messages currently embed the name of the generating routine right
> into the error text.  Again, we ought to separate the functionality:
> the source-code location is valuable but ought not form part of the
> primary error message.  I would like to see elog() become a macro that
> invokes __FILE__ and __LINE__ to automatically make the *exact* source
> code location become part of the secondary error information, and then
> drop the convention of using the routine name in the message text.

These sorts of things have been on my mind as well, but they're really
independent of my issue.  We can easily have runtime options to append or
not additional things to the error string.  I don't see this as part of my
proposal.

> Another thing that I missed in Peter's proposal is how we are going to
> cope with messages that include parameters.  Surely we do not expect
> gettext to start with 'Attribute "foo" not found' and distinguish fixed
> from variable parts of that string?

Sure we do.

> That would mean we'd have to dump all the info into a single string,
> which is doable but would perhaps look pretty ugly:
>
>     ERROR: Attribute "foo" not found  -- basic message for dumb frontends
>     ERRORCODE: UNREC_IDENT        -- key for finding localized message

There should not be a "key" to look up localized messages.  Remember that
the localization will also have to be done in all the front-end programs.
Surely we do not wish to make a list of messages that pg_dump or psql
print out.  Gettext takes care of this stuff.  The only reason why we need
error codes is for the sake of ease of interpreting by programs.

>     PARAM1: foo    -- something to embed in the localized message

Not necessary.

>     MESSAGE: Attribute or table name not known within context of query

How's that different from ERROR:?

>     CODELOC: src/backend/parser/parse_clause.c line 345

Can be appended to ERROR (or MESSAGE) depending on configuration setting.

>     QUERYLOC: 22

Not all errors are related to a query.

The general problem here is also that this would introduce a client
incompatibility.  Older clients that do not expect this amount of detail
will print all this garbage to the screen?

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> That's exactly what I was trying to avoid.  You'd still be allowed to
> choose the error message text freely, but client programs will be able to
> make sense of them by looking at the code only, as opposed to parsing the
> message text.  I'm trying to avoid making the message text to be computed
> from the error code, because that obscures the source code.

I guess I don't understand what you have in mind, because this seems
self-contradictory.  If "client programs can look at the code only",
then how can the error message text be chosen independently of the code?

>> Surely we do not expect gettext to start with 'Attribute "foo" not
>> found' and distinguish fixed from variable parts of that string?

> Sure we do.

How does that work exactly?  You're assuming an extremely intelligent
localization mechanism, I guess, which I was not.  I think it makes more
sense to work a little harder in the backend to avoid requiring AI
software in every frontend.

>> MESSAGE: Attribute or table name not known within context of query

> How's that different from ERROR:?

Sorry, I meant that as an example of the "secondary message string", but
it's a pretty lame example...

> The general problem here is also that this would introduce a client
> incompatibility.  Older clients that do not expect this amount of detail
> will print all this garbage to the screen?

Yes, if we send it to them.  It would make sense to control the amount
of detail presented via some option (a GUC variable, probably).  For
backwards compatibility reasons we'd want the default to correspond to
roughly the existing amount of detail.
        regards, tom lane


Re: Internationalized error messages

From
Peter Eisentraut
Date:
Tom Lane writes:

> I guess I don't understand what you have in mind, because this seems
> self-contradictory.  If "client programs can look at the code only",
> then how can the error message text be chosen independently of the code?

Let's say "type mismatch error", code 2200G acc. to SQL.  At one place in
the source you write

elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...);

Elsewhere you'd write

elog(ERROR, "2200G", "type mismatch in argument %d of function %s, "
     "expected %s, got %s", ...);

Humans can look at this and have a fairly good idea what they'd need to
fix.  However, a client program currently only has the option of failing
or not failing.  In this example case it would probably be better for it to
fail, but someone else already put forth the example of constraint
violation.  In this case the program might want to do something else.

> >> Surely we do not expect gettext to start with 'Attribute "foo" not
> >> found' and distinguish fixed from variable parts of that string?
>
> > Sure we do.
>
> How does that work exactly?  You're assuming an extremely intelligent
> localization mechanism, I guess, which I was not.  I think it makes more
> sense to work a little harder in the backend to avoid requiring AI
> software in every frontend.

Gettext takes care of this.  In the source you'd write

elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"),
     string, string);

When you run the xgettext utility program it scans the source for cases of
gettext(...) and creates message catalogs for the translators.  When it
finds printf arguments it automatically includes marks in the message,
such as

"type mismatch in CASE expression (%1$s vs %2$s)"

which the translator better keep in his version.  This also handles the
case where the arguments might have to appear in a different order in a
different language.

> Sorry, I meant that as an example of the "secondary message string", but
> it's a pretty lame example...

I guess I'm not sold on the concept of primary and secondary message
strings.  If the primary message isn't good enough you better fix that.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Peter Eisentraut
Date:
Karel Zak writes:

>  For translation to other languages I not sure with gettext() stuff on
> backend -- IMHO better (faster) solution will postgres system catalog
> with it.

elog(ERROR, "cannot open message catalog table");

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Let's say "type mismatch error", code 2200G acc. to SQL.  At one place in
> the source you write
> elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...);
> Elsewhere you'd write
> elog(ERROR, "2200G", "type mismatch in argument %d of function %s,
>      expected %s, got %s", ...);

Okay, so your notion of an error code is not a localizable entity at
all, it's something for client programs to look at.  Now I get it.

I object to writing "2200G" however, because that has no mnemonic value
whatever, and is much too easy to get wrong.  How about

elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s, "
     "expected %s, got %s", ...);

where ERR_TYPE_MISMATCH is #defined as "2200G" someplace?  Or for that
matter #defined as "TYPE_MISMATCH"?  Content-free numeric codes are no
fun to use on the client side either...
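
One way the mnemonic idea could look, as a sketch only. ERR_TYPE_MISMATCH and the code "2200G" come from the messages above; the errcode_names table and errcode_name() are hypothetical helpers added to show how a client could turn a code back into a readable name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mnemonic for the code quoted in the thread; the value travels on the wire. */
#define ERR_TYPE_MISMATCH "2200G"

typedef struct ErrCodeName {
    const char *code;   /* five-character code as sent to the client */
    const char *name;   /* readable name for logs and client code */
} ErrCodeName;

/* Hypothetical table; only the one code discussed in the thread is listed. */
static const ErrCodeName errcode_names[] = {
    { ERR_TYPE_MISMATCH, "TYPE_MISMATCH" },
};

/* Maps a code to its readable name, or "UNKNOWN" if it is not in the table. */
const char *errcode_name(const char *code)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i < sizeof(errcode_names) / sizeof(errcode_names[0]); i++)
        if (strcmp(errcode_names[i].code, code) == 0)
            return errcode_names[i].name;
    return "UNKNOWN";
}
```

Keeping the macro and the table in one header would let a typo in a code show up as a compile error instead of as a wrong string at run time.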

> Gettext takes care of this.  In the source you'd write

> elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"),
>             string, string);

Duh.  For some reason I was envisioning the localization substitution as
occurring on the client side, but of course we'd want to do it on the
server side, and before parameters are substituted into the message.
Sorry for the noise.

I am not sure we can/should use gettext (possible license problems?),
but certainly something like this could be cooked up.

>> Sorry, I meant that as an example of the "secondary message string", but
>> it's a pretty lame example...

> I guess I'm not sold on the concept of primary and secondary message
> strings.  If the primary message isn't good enough you better fix that.

The motivation isn't so much to improve on the primary message as to
reduce the number of distinct strings that really need to be translated.
Remember all those internal "can't happen" errors.  If we have only one
message component then the translator is faced with a huge pile of
internal messages and not a lot of gain from translating them.  If
there's a primary and secondary component then all the internal messages
can share the same primary component ("Internal error, please file a bug
report").  Now the translator translates that one message, and can
ignore the many secondary-component messages with a clear conscience.
(Of course, he can translate those too if he really wants to, but the
point is that he doesn't *have* to do it to attain reasonably friendly
behavior.)

Perhaps another way to look at it is that we have a bunch of errors that
are user-oriented (ie, relate pretty directly to something the user did
wrong) and another bunch that are system-oriented (relate to internal
problems, such as consistency check failures or violations of internal
APIs).  We want to provide localized translations of the first set, for
sure.  I don't think we need localized translations of the second set,
so long as we have some sort of "covering message" that can be localized
for them.  Maybe instead of "primary" and "secondary" strings for a
single error, we ought to distinguish these two categories of error and
plan different localization strategies for them.
        regards, tom lane


Re: Internationalized error messages

From
Andrew Evans
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Let's say "type mismatch error", code 2200G acc. to SQL.  At one place in
> > the source you write
> > elog(ERROR, "2200G", "type mismatch in CASE expression (%s vs %s)", ...);

Tom Lane <tgl@sss.pgh.pa.us> spake:
> I object to writing "2200G" however, because that has no mnemonic value
> whatever, and is much too easy to get wrong.  How about
> 
> elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s,
>      expected %s, got %s", ...);
> 
> where ERR_TYPE_MISMATCH is #defined as "2200G" someplace?  Or for that
> matter #defined as "TYPE_MISMATCH"?  Content-free numeric codes are no
> fun to use on the client side either...

This is one thing I think VMS does well.  All error messages are a
composite of the subsystem where they originated, the severity of the
error, and the actual error itself.  Internally this is stored in a
32-bit word.  It's been a long time, so I don't recall how many bits
they allocated for each component.  The human-readable representation
looks like "<subsystem>-<severity>-<error>".
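
A rough sketch of how such a composite could be packed into a 32-bit word follows. The field widths and the function names are assumptions chosen for illustration; as said, I don't recall the real allocation, so the actual VMS widths may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths, for illustration only. */
#define SEV_BITS 3   /* severity:  bits 0..2   */
#define ERR_BITS 13  /* error:     bits 3..15  */
#define SUB_BITS 12  /* subsystem: bits 16..27 */

/* Pack the three components into one 32-bit word. */
uint32_t
pack_code(uint32_t subsystem, uint32_t severity, uint32_t error)
{
    assert(subsystem < (1u << SUB_BITS));
    assert(severity < (1u << SEV_BITS));
    assert(error < (1u << ERR_BITS));

    return (subsystem << (SEV_BITS + ERR_BITS)) | (error << SEV_BITS) | severity;
}

/* Extract the individual components again. */
uint32_t
code_severity(uint32_t code)
{
    return code & ((1u << SEV_BITS) - 1);
}

uint32_t
code_error(uint32_t code)
{
    return (code >> SEV_BITS) & ((1u << ERR_BITS) - 1);
}

uint32_t
code_subsystem(uint32_t code)
{
    return (code >> (SEV_BITS + ERR_BITS)) & ((1u << SUB_BITS) - 1);
}
```

The point of the packing is that a client can test the severity or the subsystem with a mask, without having to know every individual error.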

--
Andrew Evans


Re: Internationalized error messages

From
ncm@zembu.com (Nathan Myers)
Date:
On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote:
> > Gettext takes care of this.  In the source you'd write
> 
> > elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"),
> >             string, string);
> 
> Duh.  For some reason I was envisioning the localization substitution as
> occurring on the client side, but of course we'd want to do it on the
> server side, and before parameters are substituted into the message.
> Sorry for the noise.
> 
> I am not sure we can/should use gettext (possible license problems?),
> but certainly something like this could be cooked up.

I've been assuming that PG's needs are specialized enough that the
project wouldn't use gettext directly, but instead something inspired 
by it.  

If you look at my last posting on the subject, by the way, you will see 
that it could work without a catalog underneath; integrating a catalog 
would just require changes in a header file (and the programs to generate 
the catalog, of course).  That quality seems to me essential to allow the 
changeover to be phased in gradually, and to allow different underlying 
catalog implementations to be tried out.

Nathan
ncm


Re: Internationalized error messages

From
Peter Eisentraut
Date:
Tom Lane writes:

> I object to writing "2200G" however, because that has no mnemonic value
> whatever, and is much too easy to get wrong.  How about
>
> elog(ERROR, ERR_TYPE_MISMATCH, "type mismatch in argument %d of function %s,
>      expected %s, got %s", ...);
>
> where ERR_TYPE_MISMATCH is #defined as "2200G" someplace?  Or for that
> matter #defined as "TYPE_MISMATCH"?  Content-free numeric codes are no
> fun to use on the client side either...

Well, SQL defines these.  Do we want to make our own list?  However,
numeric codes also have the advantage that some hierarchy is possible.
E.g., the "22" in "2200G" is actually the category code "data exception".
Personally, I would stick to the SQL codes but make some readable macro
name for backend internal use.
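
As a sketch of what the hierarchy buys a client, assuming five-character codes whose first two characters are the category: the macro and function names below are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical readable name for the code used in the example above. */
#define ERR_TYPE_MISMATCH "2200G"

/* Hypothetical name for the "data exception" category prefix. */
#define ERRCLASS_DATA_EXCEPTION "22"

/*
 * Returns nonzero if the five-character code belongs to the
 * two-character category, i.e. the code starts with that prefix.
 */
int
code_in_class(const char *code, const char *cls)
{
    assert(code != NULL && cls != NULL);
    assert(strlen(code) == 5 && strlen(cls) == 2);

    return strncmp(code, cls, 2) == 0;
}
```

A client that only cares whether something was a data exception can check code_in_class(code, ERRCLASS_DATA_EXCEPTION) and ignore the last three characters.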

> I am not sure we can/should use gettext (possible license problems?),

Gettext is an open standard, invented at Sun IIRC.  There is also an
independent implementation for BSDs in the works.  On GNU/Linux systems
it's in the C library.  I don't see any license problems that way.  It has
been used widely for free software and so far I haven't seen any real
alternative.

> but certainly something like this could be cooked up.

Well, I'm trying to avoid having to do the cooking. ;-)

> Perhaps another way to look at it is that we have a bunch of errors that
> are user-oriented (ie, relate pretty directly to something the user did
> wrong) and another bunch that are system-oriented (relate to internal
> problems, such as consistency check failures or violations of internal
> APIs).  We want to provide localized translations of the first set, for
> sure.  I don't think we need localized translations of the second set,
> so long as we have some sort of "covering message" that can be localized
> for them.

I'm sure this can be covered in some macro way.  A random idea:

elog(ERROR, INTERNAL_ERROR("text"), ...)

expands to

elog(ERROR, gettext("Internal error: %s"), ...)

OTOH, we should not yet make presumptions about what dedicated translators
can be capable of.  :-)
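
Spelled out a little further, the macro would have to expand to two arguments, the translated covering format and the untranslated detail text. This is only a sketch of the random idea above: translate() is a stub standing in for gettext() with one hard-coded German entry, and report() is a hypothetical stand-in for elog().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stub standing in for gettext(); a real one would consult a catalog. */
const char *
translate(const char *msgid)
{
    assert(msgid != NULL);
    if (strcmp(msgid, "Internal error: %s") == 0)
        return "Interner Fehler: %s";
    return msgid;               /* untranslated messages pass through */
}

/*
 * Expands to TWO arguments: the translated covering message as the
 * format, and the internal detail text, left untranslated.
 */
#define INTERNAL_ERROR(text) translate("Internal error: %s"), (text)

/* Hypothetical stand-in for elog(): formats the message into buf. */
int
report(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int     n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

With this, report(buf, sizeof buf, INTERNAL_ERROR("cache lookup failed")) translates only the covering message, so the catalog needs one entry for all internal errors. Note that this form leaves no room for further printf arguments in the detail text.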

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Well, SQL defines these.  Do we want to make our own list?  However,
> numeric codes also have the advantage that some hierarchy is possible.
> E.g., the "22" in "2200G" is actually the category code "data exception".
> Personally, I would stick to the SQL codes but make some readable macro
> name for backend internal use.

We will probably find cases where we need codes not defined by SQL
(since we have non-SQL features).  If there is room to invent our
own codes then I have no objection to this.

>> I am not sure we can/should use gettext (possible license problems?),

> Gettext is an open standard, invented at Sun IIRC.  There is also an
> independent implementation for BSDs in the works.  On GNU/Linux system
> it's in the C library.  I don't see any license problems that way.

Unless that BSD implementation is ready to go, I think we'd be talking
about relying on GPL'd (not LGPL'd) code for an essential component of
the system functionality.  Given RMS' recent antics I am much less
comfortable with that than I might once have been.
        regards, tom lane


Re: Internationalized error messages

From
Karel Zak
Date:
On Fri, Mar 09, 2001 at 05:57:13PM +0100, Peter Eisentraut wrote:
> Karel Zak writes:
> 
> >  For translation to other languages I am not sure about the gettext() stuff
> > in the backend -- IMHO a better (faster) solution would be a postgres
> > system catalog for it.
> 
> elog(ERROR, "cannot open message catalog table");

Sure, and what about:

elog(ERROR, gettext("can't set LC_MESSAGES"));

We can generate our system catalog for this in a similar way to gettext, which
means all messages can stay in the sources in English too.

But this is only a thought; performance tests will show more.

        Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


Re: Internationalized error messages

From
Peter Mount
Date:
At 23:49 08/03/01 +0100, Peter Eisentraut wrote:
>I really feel that translated error messages need to happen soon.
>Managing translated message catalogs can be done easily with available
>APIs.  However, translatable messages really require an error code
>mechanism (otherwise it's completely impossible for programs to interpret
>error messages reliably).  I've been thinking about this for much too long
>now and today I finally settled to the simplest possible solution.
>
>Let the actual method of allocating error codes be irrelevant for now,
>although the ones in the SQL standard are certainly to be considered for a
>start.  Essentially, instead of writing

snip

>On the protocol front, this could be pretty easy to do.  Instead of
>"message text" we'd send a string "XYZ01: message text".  Worst case, we
>pass this unfiltered to the client and provide an extra function that
>returns only the first five characters.  Alternatively we could strip off
>the prefix when returning the message text only.

Most other DBs (I'm thinking of Oracle here) pass the code unfiltered to 
the client anyhow. That said, it would not be hard to get psql and other 
interactive clients to strip the error code.
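
For illustration, stripping the prefix on the client side could look roughly like this. It assumes the "XYZ01: message text" layout proposed above (five characters, a colon, a space); split_error() is a made-up name, not an existing libpq function.

```c
#include <assert.h>
#include <string.h>

#define CODE_LEN 5

/*
 * Splits "XYZ01: message text" into its code and its message.
 * code_out must have room for CODE_LEN + 1 bytes.  Returns a pointer
 * to the message text inside wire, or NULL if wire has no code prefix.
 */
const char *
split_error(const char *wire, char *code_out)
{
    assert(wire != NULL && code_out != NULL);

    if (strlen(wire) < CODE_LEN + 2 ||
        wire[CODE_LEN] != ':' || wire[CODE_LEN + 1] != ' ')
        return NULL;

    memcpy(code_out, wire, CODE_LEN);
    code_out[CODE_LEN] = '\0';
    return wire + CODE_LEN + 2;
}
```

An interactive client would print only the returned message text, while a program would look at code_out.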


>At the end, the i18n part would actually be pretty easy, e.g.,
>
>     elog(ERROR, "XYZ01", gettext("stuff happened"));
>
>
>Comments?  Better ideas?

A couple of ideas. First, if we have a master list of error codes, it needs 
to be kept in an independent format (i.e., not a .h file). The other idea is 
to build on JDBC's errors.properties files. Since they are plain 
ASCII/Unicode text, the format would work here with just some extra code to 
read them from C.

Brief description:
------------------------

ResourceBundles handle one language per file. From a base filename, 
each language has a file named:
        filename_la_ct.properties

where la is the two-character ISO language code, and ct is the two-character 
ISO country code.

For example:

messages_en_GB.properties
messages_en_US.properties
messages_en.properties
messages_fr.properties
messages.properties

Now, for the English locale for England (en_GB), it checks the files in this 
order: messages_en_GB.properties, messages_en.properties, messages.properties.

In each file, a message is of the format:

key=message, where each parameter passed into the message is written like 
{0}, {1}, etc. For example:

fathom=Unable to fathom update count {0}

Apart from the base file (messages.properties in this case), the other 
files are optional, and an entry only needs to appear in one of them if it 
has been translated into that language.

So, in French, fathom may be translated, but then again it may not (in JDBC 
it isn't), in which case it is simply left out of the file. New messages can 
be added to the base file and included in the others as and when they are 
translated.
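
The lookup order described above is simple to reproduce in C. A minimal sketch, with a made-up function name and a fixed name length chosen just for the example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64

/*
 * Fills names[] with the candidate files in lookup order, most
 * specific first.  lang and country may be NULL.  Returns the number
 * of candidates written (1 to 3).
 */
int
candidate_files(const char *base, const char *lang, const char *country,
                char names[3][MAX_NAME])
{
    int n = 0;

    assert(base != NULL && names != NULL);

    if (lang != NULL && country != NULL)
        snprintf(names[n++], MAX_NAME, "%s_%s_%s.properties",
                 base, lang, country);
    if (lang != NULL)
        snprintf(names[n++], MAX_NAME, "%s_%s.properties", base, lang);
    snprintf(names[n++], MAX_NAME, "%s.properties", base);

    return n;
}
```

A message lookup would then try each file in turn and use the first one that contains the key, so an untranslated message falls through to the base file.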

Peter



Re: Internationalized error messages

From
Peter Eisentraut
Date:
Karel Zak writes:

> > >  For translation to other languages I am not sure about the gettext() stuff
> > > in the backend -- IMHO a better (faster) solution would be a postgres
> > > system catalog for it.
> >
> > elog(ERROR, "cannot open message catalog table");
>
>  Sure, and what about:
>
> elog(ERROR, gettext("can't set LC_MESSAGES"));
>
>  We can generate our system catalog for this in a similar way to gettext, which
> means all messages can stay in the sources in English too.

When there is an error condition in the backend, the last thing you want
to do (and are allowed to do) is access tables.  Also keep in mind that
we want to internationalize other parts of the system as well, such as
pg_dump and psql.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Internationalized error messages

From
Karel Zak
Date:
On Mon, Mar 12, 2001 at 08:15:02PM +0100, Peter Eisentraut wrote:
> Karel Zak writes:
> 
> > > >  For translation to other languages I am not sure about the gettext() stuff
> > > > in the backend -- IMHO a better (faster) solution would be a postgres
> > > > system catalog for it.
> > >
> > > elog(ERROR, "cannot open message catalog table");
> >
> >  Sure, and what about:
> >
> > elog(ERROR, gettext("can't set LC_MESSAGES"));
> >
> >  We can generate our system catalog for this in a similar way to gettext, which
> > means all messages can stay in the sources in English too.
> 
> When there is an error condition in the backend, the last thing you want
> to do (and are allowed to do) is access tables.  Also keep in mind that
> we want to internationalize other parts of the system as well, such as
> pg_dump and psql.

Agreed, the pg_xxxx applications are good candidates for POSIX locales. All my
previous notes were about backend error/notice messages, but forget it --
after the implementation we will be wiser.

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


Re: Internationalized error messages

From
Giles Lean
Date:
Tom Lane wrote:

> I am not sure we can/should use gettext (possible license problems?),
> but certainly something like this could be cooked up.

http://citrus.bsdclub.org/index-en.html

I'm not sure of the current status of the code.

Regards,

Giles



Re: Internationalized error messages

From
Patrick Welche
Date:
On Fri, Mar 09, 2001 at 03:48:33PM -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Well, SQL defines these.  Do we want to make our own list?  However,
> > numeric codes also have the advantage that some hierarchy is possible.
> > E.g., the "22" in "2200G" is actually the category code "data exception".
> > Personally, I would stick to the SQL codes but make some readable macro
> > name for backend internal use.
> 
> We will probably find cases where we need codes not defined by SQL
> (since we have non-SQL features).  If there is room to invent our
> own codes then I have no objection to this.
> 
> >> I am not sure we can/should use gettext (possible license problems?),
> 
> > Gettext is an open standard, invented at Sun IIRC.  There is also an
> > independent implementation for BSDs in the works.  On GNU/Linux system
> > it's in the C library.  I don't see any license problems that way.
> 
> Unless that BSD implementation is ready to go, I think we'd be talking
> about relying on GPL'd (not LGPL'd) code for an essential component of
> the system functionality.  Given RMS' recent antics I am much less
> comfortable with that than I might once have been.

cf. http://citrus.bsdclub.org/

and the libintl in NetBSD, at least NetBSD-current, works. The hard part
was, e.g., convincing gmake's configure to use it, as there are bits like

#if __USE_GNU_GETTEXT

rather than just checking for the existence of the functions (as well as
the internal symbol _nl_msg_cat_cntr).

So yes it's ready to go, but please don't use the same m4 in configure.in as
for GNU gettext.

Cheers,

Patrick