Thread: Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Andrew Dunstan
Date:
Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>> I'm actually wondering if unlimiting core might not be a useful switch
>>> to provide on pg_ctl, as long as the platform has setrlimit().
>>>
>> Not a bad thought; that's actually one of the reasons that I still
>> usually use a handmade script rather than pg_ctl for launching
>> postmasters ...
>>
>
> this sounds like a good idea for me too - it seems like a cleaner and
> more useful thing on a general base then just doing it in the buildfarm
> code ...
>
>
>


Draft patch attached. However, there will be some more work to do. For
one thing, pg_regress does not use pg_ctl to start its temp install
postmaster, so either we'll need to train it the same way or get it to
use pg_ctl. And then we'd need to change the regression makefile to use
the option, based on an environment variable a bit like MAX_CONNEXCTIONS
maybe.

cheers

andrew

Index: src/bin/pg_ctl/pg_ctl.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v
retrieving revision 1.74
diff -c -r1.74 pg_ctl.c
*** src/bin/pg_ctl/pg_ctl.c    12 Oct 2006 05:14:49 -0000    1.74
--- src/bin/pg_ctl/pg_ctl.c    29 Dec 2006 21:08:39 -0000
***************
*** 26,31 ****
--- 26,36 ----
  #include <sys/stat.h>
  #include <unistd.h>

+ #ifdef HAVE_SYS_RESOURCE_H
+ #include <sys/time.h>
+ #include <sys/resource.h>
+ #endif
+
  #include "libpq/pqsignal.h"
  #include "getopt_long.h"

***************
*** 90,95 ****
--- 95,103 ----
  static char *register_username = NULL;
  static char *register_password = NULL;
  static char *argv0 = NULL;
+ #if HAVE_GETRLIMIT
+ static bool allow_core_files = false;
+ #endif

  static void
  write_stderr(const char *fmt,...)
***************
*** 132,137 ****
--- 140,149 ----
  static char pid_file[MAXPGPATH];
  static char conf_file[MAXPGPATH];

+ #if HAVE_GETRLIMIT
+ static void unlimit_core_size(void);
+ #endif
+

  #if defined(WIN32) || defined(__CYGWIN__)
  static void
***************
*** 478,483 ****
--- 490,516 ----
  }


+ #if HAVE_GETRLIMIT
+ static void
+ unlimit_core_size(void)
+ {
+     struct rlimit lim;
+     getrlimit(RLIMIT_CORE,&lim);
+     if (lim.rlim_max == 0)
+     {
+             write_stderr(_("%s: cannot set core size,: disallowed by hard limit.\n"),
+                          progname);
+             return;
+     }
+     else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max)
+     {
+         lim.rlim_cur = lim.rlim_max;
+         setrlimit(RLIMIT_CORE,&lim);
+     }
+ }
+ #endif
+
+

  static void
  do_start(void)
***************
*** 581,586 ****
--- 614,624 ----
          postgres_path = postmaster_path;
      }

+ #if HAVE_GETRLIMIT
+     if (allow_core_files)
+         unlimit_core_size();
+ #endif
+
      exitcode = start_postmaster();
      if (exitcode != 0)
      {
***************
*** 1401,1406 ****
--- 1439,1447 ----
      printf(_("  -o OPTIONS             command line options to pass to postgres\n"
               "                         (PostgreSQL server executable)\n"));
      printf(_("  -p PATH-TO-POSTGRES    normally not necessary\n"));
+ #if HAVE_GETRLIMIT
+     printf(_("  -c, --corefiles        allow postgres to produce core files\n"));
+ #endif

      printf(_("\nOptions for stop or restart:\n"));
      printf(_("  -m SHUTDOWN-MODE   may be \"smart\", \"fast\", or \"immediate\"\n"));
***************
*** 1497,1502 ****
--- 1538,1546 ----
          {"mode", required_argument, NULL, 'm'},
          {"pgdata", required_argument, NULL, 'D'},
          {"silent", no_argument, NULL, 's'},
+ #if HAVE_GETRLIMIT
+         {"corefiles", no_argument, NULL, 'c'},
+ #endif
          {NULL, 0, NULL, 0}
      };

***************
*** 1561,1567 ****
      /* process command-line options */
      while (optind < argc)
      {
!         while ((c = getopt_long(argc, argv, "D:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1)
          {
              switch (c)
              {
--- 1605,1611 ----
      /* process command-line options */
      while (optind < argc)
      {
!         while ((c = getopt_long(argc, argv, "cD:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1)
          {
              switch (c)
              {
***************
*** 1632,1637 ****
--- 1676,1686 ----
                      do_wait = false;
                      wait_set = true;
                      break;
+ #if HAVE_GETRLIMIT
+                 case 'c':
+                     allow_core_files = true;
+                     break;
+ #endif
                  default:
                      /* getopt_long already issued a suitable error message */
                      do_advice();

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> ... And then we'd need to change the regression makefile to use
> the option, based on an environment variable a bit like MAX_CONNEXCTIONS
> maybe.

Why wouldn't we just use it always?  If a regression test dumps core,
that's going to deserve investigation.

            regards, tom lane

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> ... And then we'd need to change the regression makefile to use
>> the option, based on an environment variable a bit like MAX_CONNEXCTIONS
>> maybe.
>
> Why wouldn't we just use it always?  If a regression test dumps core,
> that's going to deserve investigation.

enabling it always for the regression tests probably makes sense - but
there is also the possibility that such a core can get very large and
potentially run the partitition the regression test runs on out of space.


Stefan

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
"Andrew Dunstan"
Date:
Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> ... And then we'd need to change the regression makefile to use
>>> the option, based on an environment variable a bit like
>>> MAX_CONNEXCTIONS
>>> maybe.
>>
>> Why wouldn't we just use it always?  If a regression test dumps core,
>> that's going to deserve investigation.
>
> enabling it always for the regression tests probably makes sense - but
> there is also the possibility that such a core can get very large and
> potentially run the partitition the regression test runs on out of space.
>
>

I think Tom is right. You can always set the hard limit before calling
"make check" or running the buildfarm script. I'll prepare a patch to use
similar code unconditionally in pg_regress.

cheers

andrew



Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> ... And then we'd need to change the regression makefile to use
>> the option, based on an environment variable a bit like MAX_CONNEXCTIONS
>> maybe.
>>
>
> Why wouldn't we just use it always?  If a regression test dumps core,
> that's going to deserve investigation.
>
>
>

Revised patch attached, doing just this. I will apply it soon unless
there are objections.

cheers

andrew

Index: doc/src/sgml/ref/pg_ctl-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/pg_ctl-ref.sgml,v
retrieving revision 1.35
diff -c -r1.35 pg_ctl-ref.sgml
*** doc/src/sgml/ref/pg_ctl-ref.sgml    2 Dec 2006 00:34:52 -0000    1.35
--- doc/src/sgml/ref/pg_ctl-ref.sgml    2 Jan 2007 20:25:01 -0000
***************
*** 29,34 ****
--- 29,35 ----
     <arg>-l <replaceable>filename</replaceable></arg>
     <arg>-o <replaceable>options</replaceable></arg>
     <arg>-p <replaceable>path</replaceable></arg>
+    <arg>-c</arg>
     <sbr>
     <command>pg_ctl</command>
     <arg choice="plain">stop</arg>
***************
*** 48,53 ****
--- 49,55 ----
     <arg>-w</arg>
     <arg>-s</arg>
     <arg>-D <replaceable>datadir</replaceable></arg>
+    <arg>-c</arg>
     <arg>-m
       <group choice="plain">
         <arg>s[mart]</arg>
***************
*** 246,251 ****
--- 248,266 ----
       </varlistentry>

       <varlistentry>
+       <term><option>-c</option></term>
+       <listitem>
+        <para>
+         Attempt to allow server crashes to produce core files, on platforms
+         where this available, by lifting any soft resource limit placed on
+         them.
+         This is useful in debugging or diagnosing problems by allowing a
+         stack trace to be obtained from a failed server process.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term><option>-w</option></term>
        <listitem>
         <para>
Index: src/bin/pg_ctl/pg_ctl.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v
retrieving revision 1.74
diff -c -r1.74 pg_ctl.c
*** src/bin/pg_ctl/pg_ctl.c    12 Oct 2006 05:14:49 -0000    1.74
--- src/bin/pg_ctl/pg_ctl.c    2 Jan 2007 20:25:02 -0000
***************
*** 26,31 ****
--- 26,36 ----
  #include <sys/stat.h>
  #include <unistd.h>

+ #ifdef HAVE_SYS_RESOURCE_H
+ #include <sys/time.h>
+ #include <sys/resource.h>
+ #endif
+
  #include "libpq/pqsignal.h"
  #include "getopt_long.h"

***************
*** 90,95 ****
--- 95,103 ----
  static char *register_username = NULL;
  static char *register_password = NULL;
  static char *argv0 = NULL;
+ #if HAVE_GETRLIMIT
+ static bool allow_core_files = false;
+ #endif

  static void
  write_stderr(const char *fmt,...)
***************
*** 132,137 ****
--- 140,149 ----
  static char pid_file[MAXPGPATH];
  static char conf_file[MAXPGPATH];

+ #if HAVE_GETRLIMIT
+ static void unlimit_core_size(void);
+ #endif
+

  #if defined(WIN32) || defined(__CYGWIN__)
  static void
***************
*** 478,483 ****
--- 490,516 ----
  }


+ #if HAVE_GETRLIMIT
+ static void
+ unlimit_core_size(void)
+ {
+     struct rlimit lim;
+     getrlimit(RLIMIT_CORE,&lim);
+     if (lim.rlim_max == 0)
+     {
+             write_stderr(_("%s: cannot set core size,: disallowed by hard limit.\n"),
+                          progname);
+             return;
+     }
+     else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max)
+     {
+         lim.rlim_cur = lim.rlim_max;
+         setrlimit(RLIMIT_CORE,&lim);
+     }
+ }
+ #endif
+
+

  static void
  do_start(void)
***************
*** 581,586 ****
--- 614,624 ----
          postgres_path = postmaster_path;
      }

+ #if HAVE_GETRLIMIT
+     if (allow_core_files)
+         unlimit_core_size();
+ #endif
+
      exitcode = start_postmaster();
      if (exitcode != 0)
      {
***************
*** 1401,1406 ****
--- 1439,1447 ----
      printf(_("  -o OPTIONS             command line options to pass to postgres\n"
               "                         (PostgreSQL server executable)\n"));
      printf(_("  -p PATH-TO-POSTGRES    normally not necessary\n"));
+ #if HAVE_GETRLIMIT
+     printf(_("  -c, --corefiles        allow postgres to produce core files\n"));
+ #endif

      printf(_("\nOptions for stop or restart:\n"));
      printf(_("  -m SHUTDOWN-MODE   may be \"smart\", \"fast\", or \"immediate\"\n"));
***************
*** 1497,1502 ****
--- 1538,1546 ----
          {"mode", required_argument, NULL, 'm'},
          {"pgdata", required_argument, NULL, 'D'},
          {"silent", no_argument, NULL, 's'},
+ #if HAVE_GETRLIMIT
+         {"corefiles", no_argument, NULL, 'c'},
+ #endif
          {NULL, 0, NULL, 0}
      };

***************
*** 1561,1567 ****
      /* process command-line options */
      while (optind < argc)
      {
!         while ((c = getopt_long(argc, argv, "D:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1)
          {
              switch (c)
              {
--- 1605,1611 ----
      /* process command-line options */
      while (optind < argc)
      {
!         while ((c = getopt_long(argc, argv, "cD:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1)
          {
              switch (c)
              {
***************
*** 1632,1637 ****
--- 1676,1686 ----
                      do_wait = false;
                      wait_set = true;
                      break;
+ #if HAVE_GETRLIMIT
+                 case 'c':
+                     allow_core_files = true;
+                     break;
+ #endif
                  default:
                      /* getopt_long already issued a suitable error message */
                      do_advice();
Index: src/test/regress/pg_regress.c
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/pg_regress.c,v
retrieving revision 1.23
diff -c -r1.23 pg_regress.c
*** src/test/regress/pg_regress.c    4 Oct 2006 00:30:14 -0000    1.23
--- src/test/regress/pg_regress.c    2 Jan 2007 20:25:03 -0000
***************
*** 24,29 ****
--- 24,34 ----
  #include <signal.h>
  #include <unistd.h>

+ #ifdef HAVE_SYS_RESOURCE_H
+ #include <sys/time.h>
+ #include <sys/resource.h>
+ #endif
+
  #include "getopt_long.h"
  #include "pg_config_paths.h"

***************
*** 122,127 ****
--- 127,156 ----
     the supplied arguments. */
  __attribute__((format(printf, 2, 3)));

+ /*
+  * allow core files if possible.
+  */
+ #if HAVE_GETRLIMIT
+ static void
+ unlimit_core_size(void)
+ {
+     struct rlimit lim;
+     getrlimit(RLIMIT_CORE,&lim);
+     if (lim.rlim_max == 0)
+     {
+         fprintf(stderr,
+                 _("%s: cannot set core size,: disallowed by hard limit.\n"),
+                 progname);
+         return;
+     }
+     else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max)
+     {
+         lim.rlim_cur = lim.rlim_max;
+         setrlimit(RLIMIT_CORE,&lim);
+     }
+ }
+ #endif
+

  /*
   * Add an item at the end of a stringlist.
***************
*** 1459,1464 ****
--- 1488,1497 ----

      initialize_environment();

+ #if HAVE_GETRLIMIT
+     unlimit_core_size();
+ #endif
+
      if (temp_install)
      {
          /*

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Revised patch attached, doing just this. I will apply it soon unless
> there are objections.

Probably a good idea to check defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE),
rather than naively assuming every getrlimit implementation supports
that particular setting.  Also, should the -c option exist but just not
do anything if the platform doesn't support it?  As is, you're making it
impossible to just specify -c without worrying if it does anything.

The documentation fails to list the long form of the switch
(--corefiles, which should probably really be --core-files for consistency).
There's a typo in this message, too:

+                 _("%s: cannot set core size,: disallowed by hard limit.\n"),

            regards, tom lane

Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Revised patch attached, doing just this. I will apply it soon unless
>> there are objections.
>>
>
> Probably a good idea to check defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE),
> rather than naively assuming every getrlimit implementation supports
> that particular setting.  Also, should the -c option exist but just not
> do anything if the platform doesn't support it?  As is, you're making it
> impossible to just specify -c without worrying if it does anything.
>
> The documentation fails to list the long form of the switch
> (--corefiles, which should probably really be --core-files for consistency).
> There's a typo in this message, too:
>
> +                 _("%s: cannot set core size,: disallowed by hard limit.\n"),
>
>
>

OK, I'll fix all this. Thanks.

cheers

andrew