Thread: Improved fix for sys_nerr

Improved fix for sys_nerr

From
Pete Forman
Date:
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name        :    Pete Forman
Your email address    :       pete.forman@westgeo.com


System Configuration
---------------------
  Architecture (example: Intel Pentium)      : Intel Pentium II

  Operating System (example: Linux 2.0.26 ELF)     : Cygwin B20.1 and others

  PostgreSQL version (example: PostgreSQL-7.1):   PostgreSQL-7.1

  Compiler used (example:  gcc 2.8.0)        : egcs-2.91.57


Please enter a FULL description of your problem:
------------------------------------------------

Files backend/utils/error/elog.c and exc.c had errors building on
Cygwin B20.1 (and BeOS so I'm told).  The current source (elog.c,v
1.64, exc.c,v 1.31) is fixed for Cygwin 1.1 and BeOS but not very
robustly.  This patch works on the systems and versions I've tested.
Others were tested by subscribers to PORTS.

The changes made are:

   1) Remove declarations of "extern int errno;".
      - errno does not have to be an int, it may be a macro.  In
        threaded code it almost certainly will be the latter.
        <errno.h> will already have the correct declaration.

   2) Remove sys_nerr and _sys_nerr.
      - These are not portable, e.g. older Cygwin and BeOS.
        In any case the test errno < sys_nerr assumes that valid
        errnos are contiguous.  strerror() may go wrong on holes.

   3) Test both errno and the return of strerror() to see if it fails.
      - This works an all(?) OSs except HPUX and QNX.  These then have
        a minor limitation that if the elog() is called with an out of
        range errno then its number will not be printed.  I would
        expect that it would be called with a valid errno.
        In exc.c both the errno and its message, if available, are now
        printed.
        I leave it to the user to call elog(..., "%d: %m", errno, ...)
        if they want to guarantee a number.
        Anything more complicated will need to involve configure.  I
        am loathe to do so because this sort of error code is called
        infrequently and will only be tested lightly.  Keep it
        simple.

    4) Remove any OS specific clauses: ifdef __CYGWIN__ or __BEOS__.



I have posted an equivalent patch to PORTS for 7.0.2.


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------





If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

*** src/backend/utils/error/elog.c.orig    Sun Oct  8 08:00:18 2000
--- src/backend/utils/error/elog.c    Wed Oct 25 16:55:50 2000
***************
*** 37,49 ****
  #include "tcop/tcopprot.h"
  #include "commands/copy.h"

- extern int    errno;
-
- #ifdef __CYGWIN__
- # define sys_nerr _sys_nerr
- #endif
- extern int    sys_nerr;
-
  extern CommandDest whereToSendOutput;

  #ifdef ENABLE_SYSLOG
--- 37,42 ----
***************
*** 130,152 ****
      int            len;
      /* size of the prefix needed for timestamp and pid, if enabled */
      size_t      timestamp_size;

      if (lev <= DEBUG && Debugfile < 0)
          return;                    /* ignore debug msgs if noplace to send */

! /* BeOS doesn't have sys_nerr and should be able to use strerror()... */
! #ifndef __BEOS__
!     /* save errno string for %m */
!     if (errno < sys_nerr && errno >= 0)
!         errorstr = strerror(errno);
!     else
      {
!         sprintf(errorstr_buf, "error %d", errno);
          errorstr = errorstr_buf;
      }
- #else
-     errorstr = strerror(errno);
- #endif /* __BEOS__ */

      if (lev == ERROR || lev == FATAL)
      {
--- 123,151 ----
      int            len;
      /* size of the prefix needed for timestamp and pid, if enabled */
      size_t      timestamp_size;
+     int            errno_copy = errno;

      if (lev <= DEBUG && Debugfile < 0)
          return;                    /* ignore debug msgs if noplace to send */

!     /*
!      * save errno string for %m
!      * Standard UNIX (XPG4v2/UNIX95 and later) says that errno will be set
!      * (to EINVAL) if the argument to strerror() is out of range.
!      * IRIX and Solaris actually return NULL without setting errno.
!      * Others such as AIX, Cygwin and Linux return a string for all values.
!      *   This string contains a number for out of range values.
!      * HPUX and QNX return the same string for all out of range values.
!      *   Those will not be well served by this code.  However it is highly
!      *   unlikely that this code will be called with an out of range errno.
!      */
!     errno = 0;
!     errorstr = strerror(errno_copy);
!     if (errno != 0 || errorstr == NULL)
      {
!         sprintf(errorstr_buf, "error %d", errno_copy);
          errorstr = errorstr_buf;
      }

      if (lev == ERROR || lev == FATAL)
      {
*** src/backend/utils/error/exc.c.orig    Tue Oct  3 08:00:18 2000
--- src/backend/utils/error/exc.c    Wed Oct 25 16:37:39 2000
***************
*** 94,106 ****
      ExceptionHandlingEnabled = on;
  }

-
- extern int    errno;
- #ifdef __CYGWIN__
- # define sys_nerr _sys_nerr
- #endif
- extern int    sys_nerr;
-
  static void
  ExcPrint(Exception *excP,
           ExcDetail detail,
--- 94,99 ----
***************
*** 107,112 ****
--- 100,108 ----
           ExcData data,
           ExcMessage message)
  {
+     int errno_copy = errno;
+     const char *errorstr;
+
  #ifdef    lint
      data = data;
  #endif
***************
*** 131,144 ****

      fprintf(stderr, " (%ld)", detail);

! #ifndef __BEOS__
!     if (errno > 0 && errno < sys_nerr)
! #else
!     if (errno > 0)
! #endif
!         fprintf(stderr, " [%s]", strerror(errno));
!     else if (errno != 0)
!         fprintf(stderr, " [Error %d]", errno);

      fprintf(stderr, "\n");

--- 127,140 ----

      fprintf(stderr, " (%ld)", detail);

!     if (errno_copy != 0) {
!         errno = 0;
!         errorstr = strerror(errno_copy);
!         if (errno == 0 && errorstr != NULL)
!             fprintf(stderr, " [Error %d: %s]", errno_copy, errorstr);
!         else if (errno_copy != 0)
!             fprintf(stderr, " [Error %d]", errno_copy);
!     }

      fprintf(stderr, "\n");



--
Pete Forman                 -./\.- Disclaimer: This post is originated
Western Geophysical           -./\.-  by myself and does not represent
pete.forman@westgeo.com         -./\.-  the opinion of Baker Hughes or
http://www.crosswinds.net/~petef  -./\.-  its divisions.

Re: Improved fix for sys_nerr

From
Tom Lane
Date:
Pete Forman <gsez020@kryten.bedford.waii.com> writes:
>    2) Remove sys_nerr and _sys_nerr.

The reason that there was a test against sys_nerr in the first place
is that I believe some systems define strerror(n) as a simple macro
that expands into sys_errlist[n] --- compare for example
src/interfaces/libpq/libpq-int.h's definition of strerror().
(I think SunOS was like this, but no longer have access to a SunOS
machine to check.)  On such a system, elog() is quite likely to dump
core when presented an out-of-range errno value, if it hasn't got a
range comparison against sys_nerr to defend itself.

For this reason, I object to stripping the test against sys_nerr
completely, especially since it only fails on arguably nonstandard
versions of Unix.  (Cygwin and BEos versus everyone else ... and
please do not quote UNIX95 at me --- that's a wannabee standard,
not a standard.  Your own results show how few platforms follow it.)

I recommend adding a configure test to see whether sys_nerr exists,
and trusting strerror() to be bulletproof only where it does not.

As you say, there could be holes in the error number assignment,
so an additional test for NULL result from strerror() seems wise.

            regards, tom lane

Re: Improved fix for sys_nerr

From
Peter Eisentraut
Date:
Tom Lane writes:

> I recommend adding a configure test to see whether sys_nerr exists,
> and trusting strerror() to be bulletproof only where it does not.
>
> As you say, there could be holes in the error number assignment,
> so an additional test for NULL result from strerror() seems wise.

Definitely a good plan.  If someone wants to be really dilligent he could
write up a "pgstrerror", which

1. does the str_nerr check if it exists
2. runs strerror(errno)
3. checks for NULL or EINVAL
4. returns something useful

You put that into src/utils and link it into every program or library that
uses strerror.

Any reasonable subset of this solution would be fine as well. :)

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/


Re: Improved fix for sys_nerr

From
Pete Forman
Date:
Tom Lane writes:
 > Pete Forman <gsez020@kryten.bedford.waii.com> writes:
 > >    2) Remove sys_nerr and _sys_nerr.
 >
 > The reason that there was a test against sys_nerr in the first
 > place is that I believe some systems define strerror(n) as a simple
 > macro that expands into sys_errlist[n] --- compare for example
 > src/interfaces/libpq/libpq-int.h's definition of strerror().  (I
 > think SunOS was like this, but no longer have access to a SunOS
 > machine to check.)  On such a system, elog() is quite likely to
 > dump core when presented an out-of-range errno value, if it hasn't
 > got a range comparison against sys_nerr to defend itself.

The SunOS 5.5 is the oldest that I've access to.  That does behave as
per Sun's docs: strerror returns NULL if errnum is out-of-range.

But I do agree that strerror(2147483647) or strerror(-2147483647)
might core dump on some systems.  Solaris 2.5 and later seem okay.

That macro seems weak.  It should be something like this, or a
function.

#define strerror(A) \
    (((A) < 0 || (A) >= sys_nerr) ? "unknown error" : sys_errlist[(A)])

Are there any systems that have sys_errlist but not sys_nerr?  My
SunOS 5 documentation mentions them together in references to previous
versions.

 > For this reason, I object to stripping the test against sys_nerr
 > completely, especially since it only fails on arguably nonstandard
 > versions of Unix.  (Cygwin and BEos versus everyone else ... and
 > please do not quote UNIX95 at me --- that's a wannabee standard,
 > not a standard.  Your own results show how few platforms follow
 > it.)

So what standards can be used?  I try to use ANSI/ISO C first, then
POSIX, then Open Group.  The Open Group, formerly X/Open, have been
merging BSD and SysV into XPGn.  UNIX95 is XPG4v2.  These standards
are independent of any particular implementation and so it is unlikely
that any implementation will be 100% compliant.  I would be grateful
if you can point me to BSD or SysV specifications rather than examples
of implementations.  The Solaris standards man page, for example, only
cites POSIX and X/Open.

In the particular case of strerror, ISO C90 says that the content of
the string whose pointer is returned by strerror is implementation
defined.  C99 strengthens the spec to say that strerror shall map any
value of type int to a message.

Having said all that, standards only guide you as to how to write
portable code.  Try to use them to work on the greatest number of
current and future systems.  Use autoconf to identify exceptions.

 > I recommend adding a configure test to see whether sys_nerr exists,
 > and trusting strerror() to be bulletproof only where it does not.

I agree with that.  My main gripe is that OS specific ifdefs are too
blunt an instrument.  For example the ifdef for Cygwin does not grok
that B20.1 is missing the feature but that 1.1 has it.  Even adding
some version test to that would break on modified systems.  Autoconf
is the way to go.

 > As you say, there could be holes in the error number assignment,
 > so an additional test for NULL result from strerror() seems wise.

Agreed.
--
Pete Forman                 -./\.- Disclaimer: This post is originated
Western Geophysical           -./\.-  by myself and does not represent
pete.forman@westgeo.com         -./\.-  the opinion of Baker Hughes or
http://www.crosswinds.net/~petef  -./\.-  its divisions.