Thread: sys_siglist[] is causing us trouble again

sys_siglist[] is causing us trouble again

From
Tom Lane
Date:
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation-Wno-stringop-truncation -g -O2 -DFRONTEND -I../../src/include -D_GNU_SOURCE
-I/usr/include/libxml2  -c -o wait_error.o wait_error.c 
wait_error.c: In function \342\200\230wait_result_to_str\342\200\231:
wait_error.c:71:6: error: \342\200\230sys_siglist\342\200\231 undeclared (first use in this function)
   71 |      sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
      |      ^~~~~~~~~~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [<builtin>: wait_error.o] Error 1

We haven't changed anything, ergo something changed at the OS level.

Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.

Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz a73d08319
and cc92cca43.  Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.

            regards, tom lane



Re: sys_siglist[] is causing us trouble again

From
Filipe Rosset
Date:
On Wed, Jul 15, 2020 at 7:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
As of a couple days ago, buildfarm member caiman (Fedora rawhide)
is failing like this in all the pre-v12 branches:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2 -DFRONTEND -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o wait_error.o wait_error.c
wait_error.c: In function \342\200\230wait_result_to_str\342\200\231:
wait_error.c:71:6: error: \342\200\230sys_siglist\342\200\231 undeclared (first use in this function)
   71 |      sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
      |      ^~~~~~~~~~~
wait_error.c:71:6: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [<builtin>: wait_error.o] Error 1

We haven't changed anything, ergo something changed at the OS level.

Oddly, we'd not get to this code unless configure set
HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
issue here is some rearrangement of system header files combined with
wait_error.c (and maybe other places?) not including exactly the same
headers that configure tested.

Anyway, rather than installing rawhide and trying to debug this,
I'd like to make a modest proposal: let's back-patch the v12
patches that made us stop relying on sys_siglist[], viz a73d08319
and cc92cca43.  Per the discussions that led to those patches,
it's been decades since any platform didn't have POSIX-compliant
strsignal(), so we'd be much better off relying on that.

                        regards, tom lane

 I believe it's related with these recent glibc changes at rawhide.
  - signal: Move sys_errlist to a compat symbol
  - signal: Move sys_siglist to a compat symbol
SHA512 (glibc-2.31.9000-683-gffb17e7ba3.tar.xz) = 103ff3c04de5dc149df93e5399de1630f6fff1b8d7f127881d6e530492b8b953a8064205ceecb311a77c0a10de3a5ab2056121fd1fa833a30327c6b1f08beacc            

Re: sys_siglist[] is causing us trouble again

From
Thomas Munro
Date:
On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We haven't changed anything, ergo something changed at the OS level.
>
> Oddly, we'd not get to this code unless configure set
> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.  I suspect the root
> issue here is some rearrangement of system header files combined with
> wait_error.c (and maybe other places?) not including exactly the same
> headers that configure tested.

It looks like glibc very recently decided[1] to hide the declaration,
but we're using a cached configure test result.  I guess rawhide is
the RH thing that tracks the bleeding edge?

> Anyway, rather than installing rawhide and trying to debug this,
> I'd like to make a modest proposal: let's back-patch the v12
> patches that made us stop relying on sys_siglist[], viz a73d08319
> and cc92cca43.  Per the discussions that led to those patches,
> it's been decades since any platform didn't have POSIX-compliant
> strsignal(), so we'd be much better off relying on that.

Seems sensible.  Despite the claims of the glibc manual[2], it's not
really a GNU extension, and the BSDs have it (for decades).

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b1ccfc061feee9ce616444ded8e1cd5acf9fa97f
[2] https://www.gnu.org/software/libc/manual/html_node/Signal-Messages.html



Re: sys_siglist[] is causing us trouble again

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Oddly, we'd not get to this code unless configure set
>> HAVE_DECL_SYS_SIGLIST, so it's defined *somewhere*.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Ah, of course.  I was thinking that Peter had just changed configure
in the last day or so, but that did not affect the back branches.
So it's unsurprising for buildfarm animals to be using cached configure
results.

> I guess rawhide is the RH thing that tracks the bleeding edge?

Yup.  Possibly we should recommend that buildfarm owners running on
non-stable platforms disable autoconf result caching --- I believe
that's "use_accache => undef" in the configuration file.

Alternatively, maybe it'd be bright for the buildfarm script to
discard that cache after any failure (or at least configure or
build failures).

            regards, tom lane



Re: sys_siglist[] is causing us trouble again

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Jul 16, 2020 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We haven't changed anything, ergo something changed at the OS level.

> It looks like glibc very recently decided[1] to hide the declaration,
> but we're using a cached configure test result.

Right.  So, modulo the mis-cached result, what would happen if we do
nothing is that the back branches would lose the ability to translate
signal numbers to strings on bleeding-edge glibc.  I don't think we
want that, so we need to back-patch.  Attached is a lightly tested
patch for v11.  (This includes 7570df0f3 as well, so that
pgstrsignal.c will be the same in all branches.)

            regards, tom lane

diff --git a/configure b/configure
index 4e1b4be7fb..7382a34d60 100755
--- a/configure
+++ b/configure
@@ -15004,7 +15004,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

-for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll
posix_fallocatepstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range uselocale
utimeutimes wcstombs_l 
+for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll
posix_fallocatepstat pthread_is_threaded_np readlink setproctitle setsid shm_open strsignal symlink sync_file_range
uselocaleutime utimes wcstombs_l 
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
@@ -15893,24 +15893,6 @@ esac

 fi

-ac_fn_c_check_decl "$LINENO" "sys_siglist" "ac_cv_have_decl_sys_siglist" "#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-
-"
-if test "x$ac_cv_have_decl_sys_siglist" = xyes; then :
-  ac_have_decl=1
-else
-  ac_have_decl=0
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_DECL_SYS_SIGLIST $ac_have_decl
-_ACEOF
-
-
 ac_fn_c_check_func "$LINENO" "syslog" "ac_cv_func_syslog"
 if test "x$ac_cv_func_syslog" = xyes; then :
   ac_fn_c_check_header_mongrel "$LINENO" "syslog.h" "ac_cv_header_syslog_h" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index fab1658bca..defcb8ff99 100644
--- a/configure.in
+++ b/configure.in
@@ -1622,6 +1622,7 @@ AC_CHECK_FUNCS(m4_normalize([
     setproctitle
     setsid
     shm_open
+    strsignal
     symlink
     sync_file_range
     uselocale
@@ -1821,14 +1822,6 @@ if test "$PORTNAME" = "cygwin"; then
   AC_LIBOBJ(dirmod)
 fi

-AC_CHECK_DECLS([sys_siglist], [], [],
-[#include <signal.h>
-/* NetBSD declares sys_siglist in unistd.h.  */
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-])
-
 AC_CHECK_FUNC(syslog,
               [AC_CHECK_HEADER(syslog.h,
                                [AC_DEFINE(HAVE_SYSLOG, 1, [Define to 1 if you have the syslog interface.])])])
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 252220c770..08577f5e5f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -596,17 +596,10 @@ pgarch_archiveXlog(char *xlog)
                      errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
                      errdetail("The failed archive command was: %s",
                                xlogarchcmd)));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-            ereport(lev,
-                    (errmsg("archive command was terminated by signal %d: %s",
-                            WTERMSIG(rc),
-                            WTERMSIG(rc) < NSIG ? sys_siglist[WTERMSIG(rc)] : "(unknown)"),
-                     errdetail("The failed archive command was: %s",
-                               xlogarchcmd)));
 #else
             ereport(lev,
-                    (errmsg("archive command was terminated by signal %d",
-                            WTERMSIG(rc)),
+                    (errmsg("archive command was terminated by signal %d: %s",
+                            WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
                      errdetail("The failed archive command was: %s",
                                xlogarchcmd)));
 #endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3bfc299be1..75a9e07041 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3563,6 +3563,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
                         procname, pid, WEXITSTATUS(exitstatus)),
                  activity ? errdetail("Failed process was running: %s", activity) : 0));
     else if (WIFSIGNALED(exitstatus))
+    {
 #if defined(WIN32)
         ereport(lev,

@@ -3573,7 +3574,7 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
                         procname, pid, WTERMSIG(exitstatus)),
                  errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
                  activity ? errdetail("Failed process was running: %s", activity) : 0));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
+#else
         ereport(lev,

         /*------
@@ -3581,19 +3582,10 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
           "server process" */
                 (errmsg("%s (PID %d) was terminated by signal %d: %s",
                         procname, pid, WTERMSIG(exitstatus),
-                        WTERMSIG(exitstatus) < NSIG ?
-                        sys_siglist[WTERMSIG(exitstatus)] : "(unknown)"),
-                 activity ? errdetail("Failed process was running: %s", activity) : 0));
-#else
-        ereport(lev,
-
-        /*------
-          translator: %s is a noun phrase describing a child process, such as
-          "server process" */
-                (errmsg("%s (PID %d) was terminated by signal %d",
-                        procname, pid, WTERMSIG(exitstatus)),
+                        pg_strsignal(WTERMSIG(exitstatus))),
                  activity ? errdetail("Failed process was running: %s", activity) : 0));
 #endif
+    }
     else
         ereport(lev,

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 305dba20cb..662ba2bc3e 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -2014,7 +2014,7 @@ BaseBackup(void)
     {
 #ifndef WIN32
         int            status;
-        int            r;
+        pid_t        r;
 #else
         DWORD        status;

@@ -2042,7 +2042,7 @@ BaseBackup(void)

         /* Just wait for the background process to exit */
         r = waitpid(bgchild, &status, 0);
-        if (r == -1)
+        if (r == (pid_t) -1)
         {
             fprintf(stderr, _("%s: could not wait for child process: %s\n"),
                     progname, strerror(errno));
@@ -2051,19 +2051,13 @@ BaseBackup(void)
         if (r != bgchild)
         {
             fprintf(stderr, _("%s: child %d died, expected %d\n"),
-                    progname, r, (int) bgchild);
+                    progname, (int) r, (int) bgchild);
             disconnect_and_exit(1);
         }
-        if (!WIFEXITED(status))
-        {
-            fprintf(stderr, _("%s: child process did not exit normally\n"),
-                    progname);
-            disconnect_and_exit(1);
-        }
-        if (WEXITSTATUS(status) != 0)
+        if (status != 0)
         {
-            fprintf(stderr, _("%s: child process exited with error %d\n"),
-                    progname, WEXITSTATUS(status));
+            fprintf(stderr, "%s: %s\n",
+                    progname, wait_result_to_str(status));
             disconnect_and_exit(1);
         }
         /* Exited normally, we're happy! */
diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 27f5284998..cef36ec01a 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -56,25 +56,17 @@ wait_result_to_str(int exitstatus)
         }
     }
     else if (WIFSIGNALED(exitstatus))
+    {
 #if defined(WIN32)
         snprintf(str, sizeof(str),
                  _("child process was terminated by exception 0x%X"),
                  WTERMSIG(exitstatus));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-    {
-        char        str2[256];
-
-        snprintf(str2, sizeof(str2), "%d: %s", WTERMSIG(exitstatus),
-                 WTERMSIG(exitstatus) < NSIG ?
-                 sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
-        snprintf(str, sizeof(str),
-                 _("child process was terminated by signal %s"), str2);
-    }
 #else
         snprintf(str, sizeof(str),
-                 _("child process was terminated by signal %d"),
-                 WTERMSIG(exitstatus));
+                 _("child process was terminated by signal %d: %s"),
+                 WTERMSIG(exitstatus), pg_strsignal(WTERMSIG(exitstatus)));
 #endif
+    }
     else
         snprintf(str, sizeof(str),
                  _("child process exited with unrecognized status %d"),
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index a19f6981d7..a33acbf2d8 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -182,10 +182,6 @@
    don't. */
 #undef HAVE_DECL_STRTOULL

-/* Define to 1 if you have the declaration of `sys_siglist', and to 0 if you
-   don't. */
-#undef HAVE_DECL_SYS_SIGLIST
-
 /* Define to 1 if you have the declaration of `vsnprintf', and to 0 if you
    don't. */
 #undef HAVE_DECL_VSNPRINTF
@@ -547,6 +543,9 @@
 /* Define to use have a strong random number source */
 #undef HAVE_STRONG_RANDOM

+/* Define to 1 if you have the `strsignal' function. */
+#undef HAVE_STRSIGNAL
+
 /* Define to 1 if you have the `strtoll' function. */
 #undef HAVE_STRTOLL

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 9bc4b3fc0f..e48ded8973 100644
--- a/src/include/pg_config.h.win32
+++ b/src/include/pg_config.h.win32
@@ -414,6 +414,9 @@
 /* Define to use have a strong random number source */
 #define HAVE_STRONG_RANDOM 1

+/* Define to 1 if you have the `strsignal' function. */
+/* #undef HAVE_STRSIGNAL */
+
 /* Define to 1 if you have the `strtoll' function. */
 #ifdef HAVE_LONG_LONG_INT_64
 #define HAVE_STRTOLL 1
diff --git a/src/include/port.h b/src/include/port.h
index a9d557b55c..c97d9be250 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -189,6 +189,9 @@ extern int    pg_printf(const char *fmt,...) pg_attribute_printf(1, 2);
 #endif
 #endif                            /* USE_REPL_SNPRINTF */

+/* Wrap strsignal(), or provide our own version if necessary */
+extern const char *pg_strsignal(int signum);
+
 /* Portable prompt handling */
 extern void simple_prompt(const char *prompt, char *destination, size_t destlen,
               bool echo);
diff --git a/src/port/Makefile b/src/port/Makefile
index d7467fb078..ae37898bee 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -32,7 +32,7 @@ LIBS += $(PTHREAD_LIBS)

 OBJS = $(LIBOBJS) $(PG_CRC32C_OBJS) chklocale.o erand48.o inet_net_ntop.o \
     noblock.o path.o pgcheckdir.o pgmkdirp.o pgsleep.o \
-    pgstrcasecmp.o pqsignal.o \
+    pgstrcasecmp.o pgstrsignal.o pqsignal.o \
     qsort.o qsort_arg.o quotes.o sprompt.o tar.o thread.o

 ifeq ($(enable_strong_random), yes)
diff --git a/src/port/pgstrsignal.c b/src/port/pgstrsignal.c
new file mode 100644
index 0000000000..7724edf56e
--- /dev/null
+++ b/src/port/pgstrsignal.c
@@ -0,0 +1,64 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgstrsignal.c
+ *      Identify a Unix signal number
+ *
+ * On platforms compliant with modern POSIX, this just wraps strsignal(3).
+ * Elsewhere, we do the best we can.
+ *
+ * This file is not currently built in MSVC builds, since it's useless
+ * on non-Unix platforms.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *      src/port/pgstrsignal.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "c.h"
+
+
+/*
+ * pg_strsignal
+ *
+ * Return a string identifying the given Unix signal number.
+ *
+ * The result is declared "const char *" because callers should not
+ * modify the string.  Note, however, that POSIX does not promise that
+ * the string will remain valid across later calls to strsignal().
+ *
+ * This version guarantees to return a non-NULL pointer, although
+ * some platforms' versions of strsignal() reputedly do not.
+ *
+ * Note that the fallback cases just return constant strings such as
+ * "unrecognized signal".  Project style is for callers to print the
+ * numeric signal value along with the result of this function, so
+ * there's no need to work harder than that.
+ */
+const char *
+pg_strsignal(int signum)
+{
+    const char *result;
+
+    /*
+     * If we have strsignal(3), use that --- but check its result for NULL.
+     */
+#ifdef HAVE_STRSIGNAL
+    result = strsignal(signum);
+    if (result == NULL)
+        result = "unrecognized signal";
+#else
+
+    /*
+     * We used to have code here to try to use sys_siglist[] if available.
+     * However, it seems that all platforms with sys_siglist[] have also had
+     * strsignal() for many years now, so that was just a waste of code.
+     */
+    result = "(signal names not available on this platform)";
+#endif
+
+    return result;
+}
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 9c6d2efb56..2c469413a3 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1560,14 +1560,9 @@ log_child_failure(int exitstatus)
 #if defined(WIN32)
         status(_(" (test process was terminated by exception 0x%X)"),
                WTERMSIG(exitstatus));
-#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
-        status(_(" (test process was terminated by signal %d: %s)"),
-               WTERMSIG(exitstatus),
-               WTERMSIG(exitstatus) < NSIG ?
-               sys_siglist[WTERMSIG(exitstatus)] : "(unknown))");
 #else
-        status(_(" (test process was terminated by signal %d)"),
-               WTERMSIG(exitstatus));
+        status(_(" (test process was terminated by signal %d: %s)"),
+               WTERMSIG(exitstatus), pg_strsignal(WTERMSIG(exitstatus)));
 #endif
     }
     else

Re: sys_siglist[] is causing us trouble again

From
Andrew Dunstan
Date:
On 7/15/20 7:36 PM, Tom Lane wrote:
> I guess rawhide is the RH thing that tracks the bleeding edge?
> Yup.  Possibly we should recommend that buildfarm owners running on
> non-stable platforms disable autoconf result caching --- I believe
> that's "use_accache => undef" in the configuration file.
>
> Alternatively, maybe it'd be bright for the buildfarm script to
> discard that cache after any failure (or at least configure or
> build failures).



Yeah, these lines will be added to the upcoming client code release in
run_build.pl Search for 'obsolete' and you'll find where to put it if
you want to be ahead of the curve.


my $last_stage = get_last_stage() || "";
$obsolete ||=
    $last_stage =~ /^(Make|Configure|Contrib|.*-build)$/;


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services