Thread: More on elog and error codes

More on elog and error codes

From
Peter Eisentraut
Date:
I've looked at the elog calls in the source, about 1700 in total (only
elog(ERROR)).  If we mapped these to the SQL error codes then we'd have
about two dozen calls with an assigned code and the rest being "other".
The way I estimate it (I didn't really look at *each* call, of course) is
that about 2/3 of the calls are internal panic calls ("cache lookup of %s
failed"), 1/6 are SQL-level problems, and the rest are operating system,
storage problems, "not implemented", misconfigurations, etc.

A problem that makes this quite hard to manage is that many errors can be
reported from several places, e.g., the parser, the executor, the access
method.  Some of these messages are probably not readily reproducible
because they are caught elsewhere.

Consequently, the most pragmatic approach to assigning error codes
might be to just pick some numbers and give them out gradually.  A
hierarchical subsystem+code might be useful, beyond that it really depends
on what we expect from error codes in the first place.  Does anyone have
good experiences from other products?

Essentially, I envision making up a new function, say "elogc", which has
   elogc(<level>, [<subsys>,?] <code>, message...)

where the code is some macro, the expansion of which is to be determined.
A call to "elogc" would also require a formalized message wording, adding
the error code to the documentation, which also requires having a fairly
good idea how the error can happen and how to handle it.  This could
perhaps even be automated to some extent.
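
A minimal sketch of what such a call could look like in C, assuming a
numeric code macro and a printf-style message. The names elogc_format,
PGLVL_ERROR and PGERR_INTERNAL, and the "level code: text" layout, are
illustrative assumptions and not something settled in this proposal. A real
elogc would report to the client instead of filling a buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels; the real ones would come from elog.h. */
enum { PGLVL_DEBUG = 1, PGLVL_NOTICE = 2, PGLVL_ERROR = 3 };

/* Assumed code for the generic "internal error" class. */
#define PGERR_INTERNAL 0

/*
 * Format "<level> <code>: <message>" into buf.
 * Returns the length the full report needs (as snprintf does),
 * or a negative value on a formatting error.
 */
int
elogc_format(char *buf, size_t buflen, int level, int code,
             const char *fmt, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%d %04d: ", level, code);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return prefix;          /* error, or no room left for the message */

    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, buflen - (size_t) prefix, fmt, ap);
    va_end(ap);

    return (body < 0) ? body : prefix + body;
}

/* Call sites that carry no code yet fall into the generic class. */
#define elog_format(buf, buflen, level, ...) \
    elogc_format((buf), (buflen), (level), PGERR_INTERNAL, __VA_ARGS__)
```

With these definitions, elogc_format(buf, sizeof buf, PGLVL_ERROR, 1854,
"type %s already exists", "foo") leaves "3 1854: type foo already exists"
in buf.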

All the calls that are not converted yet will be assigned to the generic
"internal error" class; most of them will stay this way.


As for translations, I don't think we have to worry about this right now.
Assuming that we would use gettext or something similar, we can tell it
that all calls to elog (or "elogc" or whatever) contain translatable
strings, so we don't have to uglify it with gettext(...) or _(...)  calls
or what else.


So we need some good error numbering scheme.  Any ideas?

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: More on elog and error codes

From
Philip Warner
Date:
At 23:56 19/03/01 +0100, Peter Eisentraut wrote:
>
>Essentially, I envision making up a new function, say "elogc", which has
>
>    elogc(<level>, [<subsys>,?] <code>, message...)
>
>where the code is some macro, the expansion of which is to be determined.
>A call to "elogc" would also require a formalized message wording, adding
>the error code to the documentation, which also requires having a fairly
>good idea how the error can happen and how to handle it.  This could
>perhaps even be automated to some extent.
>
>All the calls that are not converted yet will be assigned to the generic
>"internal error" class; most of them will stay this way.
>
...
>
>So we need some good error numbering scheme.  Any ideas?
>

FWIW, the VMS scheme has error numbers broken down to include system,
subsystem, error number & severity. These are maintained in an error
message source file. eg. the file system's 'file not found' error message
is something like:

FACILITY RMS (the file system)
...
SEVERITY WARNING
...
FILNFND "File %AS not found"
...

It's a while since I used VMS message files regularly, but this is at least
representative. It has the drawback that severity is often tied to the
message, not the circumstance, but this is a problem only rarely.

In code, the messages are used as external symbols (probably in our case
representing pointers to C format strings). In making extensive use of such
mnemonics, I never really needed to have full text messages. Once a set
of standards is in place for message abbreviations, most people can
read the message codes. This would mean that:
   elogc(<level>, [<subsys>,?] <code>, message...)

becomes:
   elogc(<code> [, parameter...])

eg.
   "cache lookup of %s failed"

might be replaced by:
   elog(CACHELOOKUPFAIL, cacheItemThatFailed);

and    "internal error: %s"

becomes
   elog(INTERNAL, "could not find the VeryImportantThing");

Unlike VMS, it's probably a good idea to separate the severity from the
error code, since a  CACHELOOKUPFAIL in one place may be less significant
than another (eg. severity=debug).
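
A rough sketch of that arrangement in C, assuming each message symbol is a
small struct holding a mnemonic and a format string, and that severity is
passed in by the caller. The struct PgMessage and the function elog_message
are made-up names for illustration; only the two mnemonics and their texts
come from the examples above.

```c
#include <stdarg.h>
#include <stdio.h>

/* One entry per message: a mnemonic and its printf-style text. */
typedef struct PgMessage
{
    const char *ident;
    const char *format;
} PgMessage;

/* In a VMS-like scheme these would be generated from a message file. */
const PgMessage CACHELOOKUPFAIL = {"CACHELOOKUPFAIL", "cache lookup of %s failed"};
const PgMessage INTERNAL = {"INTERNAL", "internal error: %s"};

/*
 * Severity is supplied by the caller, so the same message can be a
 * debug note in one place and an error in another.
 */
int
elog_message(char *buf, size_t buflen, const char *severity,
             const PgMessage *msg, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%s %s: ", severity, msg->ident);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return prefix;          /* error, or no room left for the text */

    va_start(ap, msg);
    body = vsnprintf(buf + prefix, buflen - (size_t) prefix, msg->format, ap);
    va_end(ap);

    return (body < 0) ? body : prefix + body;
}
```

Calling elog_message(buf, sizeof buf, "DEBUG", &CACHELOOKUPFAIL, "pg_type")
yields "DEBUG CACHELOOKUPFAIL: cache lookup of pg_type failed".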

I also think it's important that we get the source file and line number
somewhere in the message, and if we have these, we may not need the subsystem.
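
One way to get the file and line without asking developers to type them is
a wrapper macro around the standard __FILE__ and __LINE__ macros. The names
elog_at and ELOG_HERE below are hypothetical, and the "file:line: " prefix
is just one possible layout.

```c
#include <stdarg.h>
#include <stdio.h>

/* Prefix a report with "file:line: " and then format the message. */
int
elog_at(char *buf, size_t buflen, const char *file, int line,
        const char *fmt, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%s:%d: ", file, line);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return prefix;          /* error, or no room left for the message */

    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, buflen - (size_t) prefix, fmt, ap);
    va_end(ap);

    return (body < 0) ? body : prefix + body;
}

/* The macro supplies the call site, so callers never type it. */
#define ELOG_HERE(buf, buflen, ...) \
    elog_at((buf), (buflen), __FILE__, __LINE__, __VA_ARGS__)
```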



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: More on elog and error codes

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> I also think it's important that we get the source file and line number
> somewhere in the message, and if we have these, we may not need the
> subsystem.

I agree that the subsystem concept is not necessary, except possibly as
a means of avoiding collisions in the error-symbol namespace, and for
that it would only be a naming convention (PGERR_subsys_IDENTIFIER).
We probably do not need it considering that we have much less than 1000
distinct error identifiers to assign, judging from Peter's survey.

We do need severity to be distinct from the error code ("internal
errors" are surely not all the same severity, even if we don't bother
to assign formal error codes to each one).

BTW, the symbols used in the source code do need to have a common prefix
(PGERR_CACHELOOKUPFAIL not CACHELOOKUPFAIL) to avoid namespace pollution
problems.  We blew this before with "DEBUG" and friends, let's learn
from that mistake.
        regards, tom lane


Re: More on elog and error codes

From
Thomas Lockhart
Date:
> So we need some good error numbering scheme.  Any ideas?

SQL9x specifies some error codes, with no particular numbering scheme
other than negative numbers indicate a problem afaicr.

Shouldn't we map to those where possible?
                       - Thomas


Re: More on elog and error codes

From
Gunnar Rønning
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

> > So we need some good error numbering scheme.  Any ideas?
> 
> SQL9x specifies some error codes, with no particular numbering scheme
> other than negative numbers indicate a problem afaicr.
> 
> Shouldn't we map to those where possible?
> 

Good point, but I guess most of the errors produced are pgsql
specific. If I remember right Sybase had several different SQL types of error
mapped to one of the standard error codes. 

Also the JDBC API provides methods to look at the database dependent error
code and standard error code. I've found both useful when working with
Sybase. 
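
In JDBC the two values are available as SQLException.getSQLState() and
SQLException.getErrorCode(). A C-side report carrying both could look
roughly like the sketch below. The struct and function names are
assumptions, the native code 1854 is borrowed from elsewhere in this thread
purely as a sample value, and the five-character standard code follows the
SQLSTATE convention in which the first two characters name the class (class
23 is "integrity constraint violation" in the standard).

```c
#include <string.h>

/* A report carrying both codes, the way JDBC exposes them. */
typedef struct PgErrorReport
{
    char        sqlstate[6];    /* standard five-character code plus NUL */
    int         native_code;    /* product-specific number (hypothetical) */
    const char *message;
} PgErrorReport;

/* True if the report's standard code falls in the given two-character class. */
int
pg_error_in_class(const PgErrorReport *err, const char *sqlstate_class)
{
    return strncmp(err->sqlstate, sqlstate_class, 2) == 0;
}
```

Several product-specific codes can share one standard class, so a portable
client tests the class while a client written for one product can still
look at native_code.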

cheers, 
Gunnar


Re: More on elog and error codes

From
Peter Eisentraut
Date:
Philip Warner writes:

>     elog(CACHELOOKUPFAIL, cacheItemThatFailed);

The disadvantage of this approach, which I tried to explain in a previous
message, is that we might want to have different wordings for different
occurrences of the same class of error.

Additionally, the whole idea behind having error *codes* is that the
client program can easily distinguish errors that it can handle specially.
Thus the codes should be numeric or some other short, fixed scheme.  In
the backend they could be replaced by macros.

Example:

#define PGERR_TYPE 1854

/* somewhere... */

elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

/* elsewhere... */

elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)


In fact, this is my proposal.  The "1854" can be argued, but I like the
rest.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: More on elog and error codes

From
Christopher Sawtell
Date:
On Tue, 20 Mar 2001 10:56, you wrote:
> I've looked at the elog calls in the source, about 1700 in total (only

[ ... ]

> So we need some good error numbering scheme.  Any ideas?

Just that it might be a good idea to incorporate the  version / release 
details in some way so that when somebody on the list is squeaking about 
an error message it is obvious to the helper that the advice needed is to 
upgrade from the Cretatious Period version to a modern release, and have 
another go.

-- 
Sincerely etc.,
NAME       Christopher Sawtell
CELL PHONE 021 257 4451
ICQ UIN    45863470
EMAIL      csawtell @ xtra . co . nz
CNOTES     ftp://ftp.funet.fi/pub/languages/C/tutorials/sawtell_C.tar.gz
-->> Please refrain from using HTML or WORD attachments in e-mails to me <<--



Re: More on elog and error codes

From
"Ross J. Reedstrom"
Date:
On Wed, Mar 21, 2001 at 09:41:44AM +1200, Christopher Sawtell wrote:
> On Tue, 20 Mar 2001 10:56, you wrote:
> 
> Just that it might be a good idea to incorporate the  version / release 
> details in some way so that when somebody on the list is squeaking about 
> an error message it is obvious to the helper that the advice needed is to 
> upgrade from the Cretatious Period version to a modern release, and have 

ROFL - parsed this as Cretinous period on the first pass.

Ross


Re: More on elog and error codes

From
Philip Warner
Date:
At 17:35 20/03/01 +0100, Peter Eisentraut wrote:
>Philip Warner writes:
>
>>     elog(CACHELOOKUPFAIL, cacheItemThatFailed);
>
>The disadvantage of this approach, which I tried to explain in a previous
>message, is that we might want to have different wordings for different
>occurrences of the same class of error.
>
>Additionally, the whole idea behind having error *codes* is that the
>client program can easily distinguish errors that it can handle specially.
>Thus the codes should be numeric or some other short, fixed scheme.  In
>the backend they could be replaced by macros.

This seems to be just an argument for constructing the value of
PGERR_CACHELOOKUPFAIL carefully (which is what the VMS message source files
did). The point is that when they are used by a developer, they are simple.



>#define PGERR_TYPE 1854
>
>/* somewhere... */
>
>elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)
>
>/* elsewhere... */
>
>elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)
>

I can appreciate that there may be cases where the same message is reused,
but that is where parameter substitution comes in. 

In the specific example above, returning the same error code is not going
to help the client. What if they want to handle "type %s used as argument
%d of function %s doesn't exist" by creating the type, and silently ignore
"type %s cannot be created because it already exists"?

How do you handle "type %s can not be used as a function return type"? Is
this PGERR_FUNC or PGERR_TYPE?
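
To make the client's side of this concrete, here is a small sketch that
assumes the two conditions carry distinct codes. The enum values and the
function client_action_for are placeholders chosen for illustration. With
a single shared PGERR_TYPE code the switch below could not tell the two
cases apart.

```c
/* Hypothetical distinct codes for the two conditions. */
enum { PGERR_TYPALREXI = 1, PGERR_FUNCNOTYPE = 2 };

typedef enum ClientAction
{
    CLIENT_IGNORE,          /* error is harmless for this client */
    CLIENT_CREATE_TYPE,     /* create the missing type and retry */
    CLIENT_FAIL             /* anything the client does not understand */
} ClientAction;

/* Decide what to do purely from the code, never from the message text. */
ClientAction
client_action_for(int code)
{
    switch (code)
    {
        case PGERR_TYPALREXI:
            return CLIENT_IGNORE;
        case PGERR_FUNCNOTYPE:
            return CLIENT_CREATE_TYPE;
        default:
            return CLIENT_FAIL;
    }
}
```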

If the motivation behind this is to allow easy translation to SQL error
codes, then I suggest we have an error definition file with explicit
translation:

Code             SQL   Text
PGERR_TYPALREXI  02xxx "type %s cannot be created because it already exists"
PGERR_FUNCNOTYPE 02xxx "type %s used as argument %d of function %s doesn't exist"

and if we want a generic 'type does not exist', then:

PGERR_NOSUCHTYPE 02xxx "type %s does not exist - %s"

where the %s might contain 'it can't be used as a function argument'.

then we just have

elogc(ERROR, PGERR_TYPALREXI, ...)

/* elsewhere... */

elogc(ERROR, PGERR_FUNCNOTYPE, ...)
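
As a sketch of how such a definition file might end up in C, the table
below holds the three rows from this message and a lookup by symbol name.
PgMessageDef and pg_message_lookup are invented names, and "02xxx" is kept
as the placeholder it is above, not as a real SQL code.

```c
#include <stddef.h>
#include <string.h>

typedef struct PgMessageDef
{
    const char *ident;      /* symbol used in the source */
    const char *sqlcode;    /* "02xxx" is a placeholder, not a real code */
    const char *format;     /* printf-style text */
} PgMessageDef;

static const PgMessageDef pg_message_defs[] = {
    {"PGERR_TYPALREXI", "02xxx",
     "type %s cannot be created because it already exists"},
    {"PGERR_FUNCNOTYPE", "02xxx",
     "type %s used as argument %d of function %s doesn't exist"},
    {"PGERR_NOSUCHTYPE", "02xxx",
     "type %s does not exist - %s"},
};

/* Return the definition for ident, or NULL if it is not registered. */
const PgMessageDef *
pg_message_lookup(const char *ident)
{
    size_t i;

    for (i = 0; i < sizeof pg_message_defs / sizeof pg_message_defs[0]; i++)
    {
        if (strcmp(pg_message_defs[i].ident, ident) == 0)
            return &pg_message_defs[i];
    }
    return NULL;
}
```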


Creating central message files/objects has the added advantage of a much
simpler locale support - they're just resource files, and they're NOT
embedded throughout the code.

Finally, if you do want to have some kind of error classification beyond
the SQL code, it could be encoded in the error message file.




Re: More on elog and error codes

From
Philip Warner
Date:
At 09:41 21/03/01 +1200, Christopher Sawtell wrote:
>Just that it might be a good idea to incorporate the  version / release 
>details in some way so that when somebody on the list is squeaking about 
>an error message it is obvious to the helper that the advice needed is to 
>upgrade from the Cretatious Period version to a modern release, and have 
>another go.

This is better handled by the bug *reporting* system; the users can easily
get the current version number from PG and send it with their reports. We
don't really want all the error codes changing between releases.




Re: More on elog and error codes

From
Philip Warner
Date:
At 09:43 21/03/01 +1100, Philip Warner wrote:
>
>Code             SQL   Text
>PGERR_TYPALREXI  02xxx "type %s cannot be created because it already exists"
>PGERR_FUNCNOTYPE 02xxx "type %s used as argument %d of function %s doesn't
>exist"
>

Peter,

Just to clarify, because in a previous email you seemed to believe that I
wanted 'PGERR_TYPALREXI' to resolve to a string. I have no such desire; a
meaningful number is fine, but we should never have to type it. One
possibility is that it is the address of an error-info function (built by
'compiling' the message file). Another possibility is that it could be a
prefix to several external symbols, PGERR_TYPALREXI_msg,
PGERR_TYPALREXI_code, PGERR_TYPALREXI_num, PGERR_TYPALREXI_sqlcode etc,
which are again built by compiling the message file. We can then encode
whatever we like into the message, have flexible text, and ease of use for
developers.
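
The "prefix to several external symbols" idea can be approximated in plain
C with an X-macro list standing in for the compiled message file. This is
only a sketch: the numbers 1 and 2 are invented, "02xxx" remains a
placeholder, and the _code symbol is left out because its meaning is not
spelled out here.

```c
/* Stand-in for the message file: one X(...) row per message. */
#define PG_MESSAGE_LIST(X) \
    X(TYPALREXI, 1, "02xxx", \
      "type %s cannot be created because it already exists") \
    X(FUNCNOTYPE, 2, "02xxx", \
      "type %s used as argument %d of function %s doesn't exist")

/* "Compiling" a row produces the PGERR_<name>_* symbols. */
#define PG_DEFINE_MESSAGE(name, num, sqlcode, text) \
    const int         PGERR_##name##_num = (num); \
    const char *const PGERR_##name##_sqlcode = (sqlcode); \
    const char *const PGERR_##name##_msg = (text);

PG_MESSAGE_LIST(PG_DEFINE_MESSAGE)
```

A developer then writes PGERR_TYPALREXI_num or PGERR_TYPALREXI_msg and
never types the number itself.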

Hope this clarifies things...






Re: More on elog and error codes

From
Thomas Lockhart
Date:
> Creating central message files/objects has the added advantage of a much
> simpler locale support - they're just resource files, and they're NOT
> embedded throughout the code.
> Finally, if you do want to have some kind of error classification beyond
> the SQL code, it could be encoded in the error message file.

We could also (automatically) build a DBMS reference table *from* this
message file (or files), which would allow lookup of messages from codes
for applications which are not "message-aware".
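
A possible shape for that generation step, sketched in C: turn one message
definition into an INSERT statement, doubling single quotes so that texts
such as "doesn't exist" stay valid SQL literals. The table name
pg_error_messages and its columns are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst as an SQL string literal,
 * doubling embedded single quotes. Returns 0 on success, or -1 if dst is
 * too small (dst is then restored to its previous contents).
 */
static int
append_sql_literal(char *dst, size_t dstlen, const char *src)
{
    size_t start = strlen(dst);
    size_t n = start;

    if (n + 2 >= dstlen)            /* need two quotes and the NUL */
        return -1;
    dst[n++] = '\'';
    for (; *src != '\0'; src++)
    {
        size_t need = (*src == '\'') ? 2 : 1;

        if (n + need + 1 >= dstlen) /* keep room for closing quote and NUL */
        {
            dst[start] = '\0';
            return -1;
        }
        if (*src == '\'')
            dst[n++] = '\'';
        dst[n++] = *src;
    }
    dst[n++] = '\'';
    dst[n] = '\0';
    return 0;
}

/* Build one INSERT for the (hypothetical) reference table. */
int
pg_message_insert_sql(char *buf, size_t buflen, int num, const char *text)
{
    int n = snprintf(buf, buflen,
                     "INSERT INTO pg_error_messages (code, message) VALUES (%d, ",
                     num);

    if (n < 0 || (size_t) n >= buflen)
        return -1;
    if (append_sql_literal(buf, buflen, text) != 0)
        return -1;
    if (strlen(buf) + 3 > buflen)   /* ");" plus the NUL */
        return -1;
    strcat(buf, ");");
    return 0;
}
```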

Not a requirement, and it does not meet all needs (e.g. you would have
to be connected to get the messages in that case) but it would be
helpful for some use cases...
                     - Thomas


Re: More on elog and error codes

From
Philip Warner
Date:
At 03:28 21/03/01 +0000, Thomas Lockhart wrote:
>> Creating central message files/objects has the added advantage of a much
>> simpler locale support - they're just resource files, and they're NOT
>> embedded throughout the code.
>> Finally, if you do want to have some kind of error classification beyond
>> the SQL code, it could be encoded in the error message file.
>
>We could also (automatically) build a DBMS reference table *from* this
>message file (or files), which would allow lookup of messages from codes
>for applications which are not "message-aware".
>
>Not a requirement, and it does not meet all needs (e.g. you would have
>to be connected to get the messages in that case) but it would be
>helpful for some use cases...

If we extended the message definitions to have (optional) description &
user-resolution sections, then we have the possibility of asking psql to
explain the last error, and (broadly) how to fix it. Of course, in the
first pass, these would all be empty.
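
A sketch of such an extended definition, with the optional sections allowed
to be empty. The struct, the numbers and the sample description and
resolution texts are all made up for illustration; nothing like this exists
in psql today.

```c
#include <stddef.h>

typedef struct PgMessageHelp
{
    int         num;
    const char *text;
    const char *description;    /* optional, NULL until someone writes it */
    const char *resolution;     /* optional, NULL until someone writes it */
} PgMessageHelp;

static const PgMessageHelp pg_message_help[] = {
    {1, "type %s cannot be created because it already exists",
     "A type with the requested name is already defined.",
     "Pick a different name, or drop the existing type first."},
    {2, "type %s used as argument %d of function %s doesn't exist",
     NULL, NULL},
};

/* What a client could print when asked to explain error number num. */
const char *
pg_explain_resolution(int num)
{
    size_t i;

    for (i = 0; i < sizeof pg_message_help / sizeof pg_message_help[0]; i++)
    {
        if (pg_message_help[i].num == num)
            return pg_message_help[i].resolution != NULL
                ? pg_message_help[i].resolution
                : "no resolution recorded";
    }
    return "unknown error code";
}
```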






Re: More on elog and error codes

From
Peter Eisentraut
Date:
Philip Warner writes:

> If the motivation behind this is to allow easy translation to SQL error
> codes, then I suggest we have an error definition file with explicit
> translation:
>
> Code             SQL   Text
> PGERR_TYPALREXI  02xxx "type %s cannot be created because it already exists"
> PGERR_FUNCNOTYPE 02xxx "type %s used as argument %d of function %s doesn't
> exist"
>
> and if we want a generic 'type does not exist', then:
>
> PGERR_NOSUCHTYPE 02xxx "type %s does not exist - %s"
>
> where the %s might contain 'it can't be used as a function argument'.
>
> then we just have
>
> elogc(ERROR, PGERR_TYPALREXI, ...)
>
> /* elsewhere... */
>
> elogc(ERROR, PGERR_FUNCNOTYPE, ...)

This is going to be a disaster for the coder.  Every time you look at an
elog you don't know what it does? Is the first arg a %s or a %d?  What's
the first %s, what the second?  How can this be checked against bugs?  (I
know GCC can be pretty helpful here, but does it catch all problems?)

Conversely, when you look at the error message you don't know from what
contexts it's called.  The error messages will degrade rapidly in quality
because changing one will become a major project.

> Creating central message files/objects has the added advantage of a much
> simpler locale support - they're just resource files, and they're NOT
> embedded throughout the code.

Actually, the fact that the messages are in the code, where they're used,
and not in a catalog file is a reason why gettext is so popular and
catgets gets laughed at.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: More on elog and error codes

From
Philip Warner
Date:
At 22:03 21/03/01 +0100, Peter Eisentraut wrote:
>Philip Warner writes:
>
>> If the motivation behind this is to allow easy translation to SQL error
>> codes, then I suggest we have an error definition file with explicit
>> translation:
>>
>> Code             SQL   Text
>> PGERR_TYPALREXI  02xxx "type %s cannot be created because it already exists"
>> PGERR_FUNCNOTYPE 02xxx "type %s used as argument %d of function %s doesn't
>> exist"
>>
>> and if we want a generic 'type does not exist', then:
>>
>> PGERR_NOSUCHTYPE 02xxx "type %s does not exist - %s"
>>
>> where the %s might contain 'it can't be used as a function argument'.
>>
>> then we just have
>>
>> elogc(ERROR, PGERR_TYPALREXI, ...)
>>
>> /* elsewhere... */
>>
>> elogc(ERROR, PGERR_FUNCNOTYPE, ...)
>
>This is going to be a disaster for the coder.  Every time you look at an
>elog you don't know what it does? Is the first arg a %s or a %d?  What's
>the first %s, what the second?

From experience using this sort of system, probably 80% of errors in new
code are new; if you don't know the format of your own errors, then you
have a larger problem. Secondly, most errors have obvious parameters, and
it only ever gets confusing when they have more than one parameter, and
even then it's pretty obvious. This concern was often raised by people new
to the system, but generally turned out to be more FUD than fact.


>How can this be checked against bugs? 
>Conversely, when you look at the error message you don't know from what
>contexts it's called.

Am I missing something here? The user gets a message like: 
   TYPALREXI: Specified type 'fred' already exists.

then we do 
   glimpse TYPALREXI

It is actually a lot easier than the plain text search we already have to
do, when we have to guess at the words that have been substituted into the
message. Besides, in *both* proposed systems, if we have done things
properly, then the postgres log also contains the module name & line #.


>The error messages will degrade rapidly in quality
>because changing one will become a major project.

Changing one will be a major project only if it is used everywhere. Most
will be relatively localized. And, with glimpse 'XYZ', it's not really that
big a task. Finally, you would need to ask why it was being changed - would
a new message work better? Tell me where the degradation in quality is in
comparison with text-in-the-source versions, with umpteen dozen slightly
different versions of essentially the same error messages?


>> Creating central message files/objects has the added advantage of a much
>> simpler locale support - they're just resource files, and they're NOT
>> embedded throughout the code.
>
>Actually, the fact that the messages are in the code, where they're used,
>and not in a catalog file is a reason why gettext is so popular and
>catgets gets laughed at.

Is there a URL for a catgets vs. gettext debate that would help me understand
the reason for the laughter? I can understand laughing at code that looks
like:
   elog(ERROR, 123456, typename);

but
   elog(ERROR, TYPALREXI, typename);

is a whole lot more readable.


Also, you failed to address the two points below:

>#define PGERR_TYPE 1854
>
>/* somewhere... */
>
>elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)
>
>/* elsewhere... */
>
>elogc(ERROR, PGERR_TYPE, "type %s used as argument %d of function %s doesn't exist", ...)
>

In the specific example above, returning the same error code is not going
to help the client. What if they want to handle "type %s used as argument
%d of function %s doesn't exist" by creating the type, and silently ignore
"type %s cannot be created because it already exists"?

How do you handle "type %s can not be used as a function return type"? Is
this PGERR_FUNC or PGERR_TYPE?





Re: More on elog and error codes

From
Tom Lane
Date:
I've pretty much got to agree with Peter on both of these points.

Philip Warner <pjw@rhyme.com.au> writes:
> At 22:03 21/03/01 +0100, Peter Eisentraut wrote:
>>>> elogc(ERROR, PGERR_FUNCNOTYPE, ...)
>> 
>> This is going to be a disaster for the coder.  Every time you look at an
>> elog you don't know what it does? Is the first arg a %s or a %d?  What's
>> the first %s, what the second?

> From experience using this sort of system, probably 80% of errors in new
> code are new; if you don't know the format of your own errors, then you
> have a larger problem. Secondly, most errors have obvious parameters, and
> it only ever gets confusing when they have more than one parameter, and
> even then it's pretty obvious.

The general set of parameters might be pretty obvious, but the exact
type that the format string expects them to be is not so obvious.  We
have enough ints, longs, unsigned longs, etc etc running around the
system that care is required.  If you look at the existing elog calls
you'll find quite a lot of explicit casts to make certain that the right
thing will happen.  If the format strings are not directly visible to
the guy writing an elog call, then errors of that kind will creep in
more easily.
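
For illustration, GCC's format function attribute lets the compiler check
arguments against a literal format string, which is the kind of help
mentioned earlier in the thread. The sketch below assumes a hypothetical
elogc_checked function; the check works when the literal is visible at the
call site, and when the format is an external symbol whose text lives in
another file the compiler generally has nothing to compare against.

```c
#include <stdarg.h>
#include <stdio.h>

/* Enable printf-style argument checking on compilers that support it. */
#if defined(__GNUC__)
#define PG_PRINTF_ATTR(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PG_PRINTF_ATTR(fmt_idx, args_idx)
#endif

int elogc_checked(char *buf, size_t buflen, int code,
                  const char *fmt, ...) PG_PRINTF_ATTR(4, 5);

/* Format "<code>: <message>" into buf; returns the length needed. */
int
elogc_checked(char *buf, size_t buflen, int code, const char *fmt, ...)
{
    va_list ap;
    int     prefix;
    int     body;

    prefix = snprintf(buf, buflen, "%d: ", code);
    if (prefix < 0 || (size_t) prefix >= buflen)
        return prefix;          /* error, or no room left for the message */

    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, buflen - (size_t) prefix, fmt, ap);
    va_end(ap);

    return (body < 0) ? body : prefix + body;
}

/*
 * With the literal in view, the explicit cast is easy to get right:
 *
 *     elogc_checked(buf, sizeof buf, 1854,
 *                   "cache lookup of %lu failed", (unsigned long) relid);
 *
 * Dropping the cast while relid is an unsigned int would draw a -Wformat
 * warning here, but not if the format were hidden behind a symbol.
 */
```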

>> The error messages will degrade rapidly in quality
>> because changing one will become a major project.

> Changing one will be a major project only if it is used everywhere.

I agree with Peter on this one too.  Even having to edit a separate
file will create enough friction that people will tend to use an
existing string if it's even marginally appropriate.  What I fear even
more is that people will simply not code error checks, especially for
"can't happen" cases, because it's too much of a pain in the neck to
register the appropriate message.

We must not raise the cost of adding error checks significantly, or we
will lose the marginal checks that sometimes save our bacon by revealing
bugs.
        regards, tom lane


Re: More on elog and error codes

From
Philip Warner
Date:
At 22:03 21/03/01 +0100, Peter Eisentraut wrote:
>
>This is going to be a disaster for the coder.  Every time you look at an
>elog you don't know what it does? Is the first arg a %s or a %d?  What's
>the first %s, what the second?

FWIW, I did a quick scan for elog in PG and found:

- 6856 calls (may include commented-out calls) 
- 2528 unique messages
- 1248 have no parameters
- 859 have exactly one argument
- 285 have exactly 2 args
- 136 have 3 or more args

so 83% have one or no arguments, which is probably not going to be very
confusing.

Looking at the actual messages, there is also a great deal of opportunity
to standardize and simplify since many of the messages only differ by their
prefixed function name.
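
The 83% figure is (1248 + 859) / 2528, which is about 0.83. A count like
this can be reproduced with a small helper that tallies conversion
specifications in each format string. The function below is a simplified
sketch under that assumption: it treats "%%" as a literal percent sign and
does not handle '*' widths, which consume an extra argument.

```c
/*
 * Count conversion specifications in a printf-style format string.
 * "%%" is a literal percent sign; '*' widths are not handled here.
 */
int
count_format_args(const char *fmt)
{
    int count = 0;

    while (*fmt != '\0')
    {
        if (fmt[0] == '%')
        {
            if (fmt[1] == '%')
            {
                fmt += 2;       /* escaped percent, not an argument */
                continue;
            }
            if (fmt[1] != '\0')
                count++;        /* start of a real conversion */
        }
        fmt++;
    }
    return count;
}
```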





Re: More on elog and error codes

From
Philip Warner
Date:
At 23:24 21/03/01 -0500, Tom Lane wrote:
>I've pretty much got to agree with Peter on both of these points.

Damn.


>Philip Warner <pjw@rhyme.com.au> writes:
>> At 22:03 21/03/01 +0100, Peter Eisentraut wrote:
>>>>> elogc(ERROR, PGERR_FUNCNOTYPE, ...)
>>> 
>>> This is going to be a disaster for the coder.  Every time you look at an
>>> elog you don't know what it does? Is the first arg a %s or a %d?  What's
>>> the first %s, what the second?
>
>> From experience using this sort of system, probably 80% of errors in new
>> code are new; if you don't know the format of your own errors, then you
>> have a larger problem. Secondly, most errors have obvious parameters, and
>> it only ever gets confusing when they have more than one parameter, and
>> even then it's pretty obvious.
>
>The general set of parameters might be pretty obvious, but the exact
>type that the format string expects them to be is not so obvious.  We
>have enough ints, longs, unsigned longs, etc etc running around the
>system that care is required.  If you look at the existing elog calls
>you'll find quite a lot of explicit casts to make certain that the right
>thing will happen.  If the format strings are not directly visible to
>the guy writing an elog call, then errors of that kind will creep in
>more easily.

I agree it's more likely, but most (all?) cases can be caught by the
compiler. It's not ideal, but neither is having eight different versions of
the same message.


>>> The error messages will degrade rapidly in quality
>>> because changing one will become a major project.
>
>> Changing one will be a major project only if it is used everywhere.
>
>I agree with Peter on this one too.  Even having to edit a separate
>file will create enough friction that people will tend to use an
>existing string if it's even marginally appropriate.  What I fear even
>more is that people will simply not code error checks, especially for
>"can't happen" cases, because it's too much of a pain in the neck to
>register the appropriate message.
>
>We must not raise the cost of adding error checks significantly, or we
>will lose the marginal checks that sometimes save our bacon by revealing
>bugs.

This is a problem, I agree - but a procedural one. We need to make
registering messages easy. To do this, rather than having a central message
file, perhaps do the following:

- allow multiple message files (which can be processed to produce .h
files). eg. pg_dump would have its own pg_dump_messages.xxx file.

- define a message that will assume its first arg is really a format
string for use in the "can't happen" classes, and which has the SQLCODE for
'internal error'.

We do need some central control, but by creating module-based message files
we can allocate number ranges easily, and we at least take a step down the
path towards both easy locale handling and a 'big book of error codes'.
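
Allocating number ranges per module could be as simple as a small table.
The ranges and module names below are invented for illustration; no actual
allocation is proposed here.

```c
#include <stddef.h>

typedef struct PgMessageRange
{
    const char *module;     /* owner of the message file */
    int         first;      /* first code in the range, inclusive */
    int         last;       /* last code in the range, inclusive */
} PgMessageRange;

/* Illustrative allocation only. */
static const PgMessageRange pg_message_ranges[] = {
    {"backend", 1000, 1999},
    {"pg_dump", 2000, 2999},
};

/* Return the module that owns code, or NULL if it is unallocated. */
const char *
pg_module_for_code(int code)
{
    size_t i;

    for (i = 0; i < sizeof pg_message_ranges / sizeof pg_message_ranges[0]; i++)
    {
        if (code >= pg_message_ranges[i].first &&
            code <= pg_message_ranges[i].last)
            return pg_message_ranges[i].module;
    }
    return NULL;
}
```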




Re: More on elog and error codes

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> This is a problem, I agree - but a procedural one. We need to make
> registering messages easy. To do this, rather than having a central message
> file, perhaps do the following:

> - allow multiple message files (which can be processed to produce .h
> files). eg. pg_dump would have its own pg_dump_messages.xxx file.

I guess I fail to see why that's better than processing the .c files
to extract the message strings from them.

I agree that the sort of system Peter proposes doesn't have any direct
forcing function to discourage gratuitous variations of what's basically
the same message.  The forcing function would have to come from the
translators, who will look at the extracted list of messages and
complain that there are near-duplicates.  Then we fix the
near-duplicates.  Seems like no big deal.
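
A toy version of that extraction step, to show how little machinery the
basic idea needs: pull the first double-quoted literal out of a source line
containing an elog call. The function name is hypothetical, and a real
extractor (xgettext does this job for gettext) also has to cope with
comments, adjacent-literal concatenation and calls spread over several
lines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the first double-quoted literal found in line into out, without
 * the surrounding quotes and with escape sequences left as written.
 * Returns 1 on success, 0 if there is no complete literal or out is
 * too small.
 */
int
extract_first_literal(const char *line, char *out, size_t outlen)
{
    const char *p = strchr(line, '"');
    size_t      n = 0;

    if (p == NULL || outlen == 0)
        return 0;

    for (p++; *p != '\0' && *p != '"'; p++)
    {
        if (*p == '\\' && p[1] != '\0')
        {
            if (n + 2 >= outlen)
                return 0;
            out[n++] = *p++;    /* keep the backslash ... */
            out[n++] = *p;      /* ... and the escaped character */
            continue;
        }
        if (n + 1 >= outlen)
            return 0;
        out[n++] = *p;
    }
    if (*p != '"')
        return 0;               /* unterminated literal */

    out[n] = '\0';
    return 1;
}
```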

However, a system that uses multiple message files is also not going to
discourage near-duplicates very effectively.  I don't think you can have
it both ways: if you are discouraging near-duplicates, then you are
making it harder for people to create new messages, whether
duplicates or not.
        regards, tom lane


Re: More on elog and error codes

From
Philip Warner
Date:
At 00:35 22/03/01 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> This is a problem, I agree - but a procedural one. We need to make
>> registering messages easy. To do this, rather than having a central message
>> file, perhaps do the following:
>
>> - allow multiple message files (which can be processed to produce .h
>> files). eg. pg_dump would have its own pg_dump_messages.xxx file.
>
>However, a system that uses multiple message files is also not going to
>discourage near-duplicates very effectively.  I don't think you can have
>it both ways: if you are discouraging near-duplicates, then you are
>making it harder for people to create new messages, whether
>duplicates or not.

Many of the near duplicates are in the same, or related, code so with local
message files there should be a good chance of reduced duplicates.

Other advantages of a separate definition include:

- Extra fields (eg. description, resolution) which could be used by client
programs.
- Message IDs which can be checked by clients to detect specific errors,
independent of locale.
- SQLCODE set in one place, rather than developers having to code it in
multiple places.

The original proposal also included a 'class' field:
   elogc(ERROR, PGERR_TYPE, "type %s cannot be created because it already exists", ...)

ISTM that we will have a similar allocation problem with these. But more
recent examples have excluded them, so I am not sure about their status in
Peter's plans.



