Thread: Re: [HACKERS] Recent SIGSEGV failures in buildfarm HEAD
Stefan Kaltenbrunner wrote: > Tom Lane wrote: > >> Andrew Dunstan <andrew@dunslane.net> writes: >> >>> I'm actually wondering if unlimiting core might not be a useful switch >>> to provide on pg_ctl, as long as the platform has setrlimit(). >>> >> Not a bad thought; that's actually one of the reasons that I still >> usually use a handmade script rather than pg_ctl for launching >> postmasters ... >> > > this sounds like a good idea for me too - it seems like a cleaner and > more useful thing on a general base then just doing it in the buildfarm > code ... > > > Draft patch attached. However, there will be some more work to do. For one thing, pg_regress does not use pg_ctl to start its temp install postmaster, so either we'll need to train it the same way or get it to use pg_ctl. And then we'd need to change the regression makefile to use the option, based on an environment variable a bit like MAX_CONNEXCTIONS maybe. cheers andrew Index: src/bin/pg_ctl/pg_ctl.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v retrieving revision 1.74 diff -c -r1.74 pg_ctl.c *** src/bin/pg_ctl/pg_ctl.c 12 Oct 2006 05:14:49 -0000 1.74 --- src/bin/pg_ctl/pg_ctl.c 29 Dec 2006 21:08:39 -0000 *************** *** 26,31 **** --- 26,36 ---- #include <sys/stat.h> #include <unistd.h> + #ifdef HAVE_SYS_RESOURCE_H + #include <sys/time.h> + #include <sys/resource.h> + #endif + #include "libpq/pqsignal.h" #include "getopt_long.h" *************** *** 90,95 **** --- 95,103 ---- static char *register_username = NULL; static char *register_password = NULL; static char *argv0 = NULL; + #if HAVE_GETRLIMIT + static bool allow_core_files = false; + #endif static void write_stderr(const char *fmt,...) *************** *** 132,137 **** --- 140,149 ---- static char pid_file[MAXPGPATH]; static char conf_file[MAXPGPATH]; + #if HAVE_GETRLIMIT + static void unlimit_core_size(void); + #endif + #if defined(WIN32) || defined(__CYGWIN__) static void *************** *** 478,483 **** --- 490,516 ---- } + #if HAVE_GETRLIMIT + static void + unlimit_core_size(void) + { + struct rlimit lim; + getrlimit(RLIMIT_CORE,&lim); + if (lim.rlim_max == 0) + { + write_stderr(_("%s: cannot set core size,: disallowed by hard limit.\n"), + progname); + return; + } + else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max) + { + lim.rlim_cur = lim.rlim_max; + setrlimit(RLIMIT_CORE,&lim); + } + } + #endif + + static void do_start(void) *************** *** 581,586 **** --- 614,624 ---- postgres_path = postmaster_path; } + #if HAVE_GETRLIMIT + if (allow_core_files) + unlimit_core_size(); + #endif + exitcode = start_postmaster(); if (exitcode != 0) { *************** *** 1401,1406 **** --- 1439,1447 ---- printf(_(" -o OPTIONS command line options to pass to postgres\n" " (PostgreSQL server executable)\n")); printf(_(" -p PATH-TO-POSTGRES normally not necessary\n")); + #if HAVE_GETRLIMIT + printf(_(" -c, --corefiles allow postgres to produce core files\n")); + #endif printf(_("\nOptions for stop or restart:\n")); printf(_(" -m SHUTDOWN-MODE may be \"smart\", \"fast\", or \"immediate\"\n")); *************** *** 1497,1502 **** --- 1538,1546 ---- {"mode", required_argument, NULL, 'm'}, {"pgdata", required_argument, NULL, 'D'}, {"silent", no_argument, NULL, 's'}, + #if HAVE_GETRLIMIT + {"corefiles", no_argument, NULL, 'c'}, + #endif {NULL, 0, NULL, 0} }; *************** *** 1561,1567 **** /* process command-line options */ while (optind < argc) { ! while ((c = getopt_long(argc, argv, "D:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1) { switch (c) { --- 1605,1611 ---- /* process command-line options */ while (optind < argc) { ! while ((c = getopt_long(argc, argv, "cD:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1) { switch (c) { *************** *** 1632,1637 **** --- 1676,1686 ---- do_wait = false; wait_set = true; break; + #if HAVE_GETRLIMIT + case 'c': + allow_core_files = true; + break; + #endif default: /* getopt_long already issued a suitable error message */ do_advice();
Andrew Dunstan <andrew@dunslane.net> writes: > ... And then we'd need to change the regression makefile to use > the option, based on an environment variable a bit like MAX_CONNEXCTIONS > maybe. Why wouldn't we just use it always? If a regression test dumps core, that's going to deserve investigation. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> ... And then we'd need to change the regression makefile to use >> the option, based on an environment variable a bit like MAX_CONNEXCTIONS >> maybe. > > Why wouldn't we just use it always? If a regression test dumps core, > that's going to deserve investigation. enabling it always for the regression tests probably makes sense - but there is also the possibility that such a core can get very large and potentially run the partitition the regression test runs on out of space. Stefan
Stefan Kaltenbrunner wrote: > Tom Lane wrote: >> Andrew Dunstan <andrew@dunslane.net> writes: >>> ... And then we'd need to change the regression makefile to use >>> the option, based on an environment variable a bit like >>> MAX_CONNEXCTIONS >>> maybe. >> >> Why wouldn't we just use it always? If a regression test dumps core, >> that's going to deserve investigation. > > enabling it always for the regression tests probably makes sense - but > there is also the possibility that such a core can get very large and > potentially run the partitition the regression test runs on out of space. > > I think Tom is right. You can always set the hard limit before calling "make check" or running the buildfarm script. I'll prepare a patch to use similar code unconditionally in pg_regress. cheers andrew
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> ... And then we'd need to change the regression makefile to use >> the option, based on an environment variable a bit like MAX_CONNEXCTIONS >> maybe. >> > > Why wouldn't we just use it always? If a regression test dumps core, > that's going to deserve investigation. > > > Revised patch attached, doing just this. I will apply it soon unless there are objections. cheers andrew Index: doc/src/sgml/ref/pg_ctl-ref.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/pg_ctl-ref.sgml,v retrieving revision 1.35 diff -c -r1.35 pg_ctl-ref.sgml *** doc/src/sgml/ref/pg_ctl-ref.sgml 2 Dec 2006 00:34:52 -0000 1.35 --- doc/src/sgml/ref/pg_ctl-ref.sgml 2 Jan 2007 20:25:01 -0000 *************** *** 29,34 **** --- 29,35 ---- <arg>-l <replaceable>filename</replaceable></arg> <arg>-o <replaceable>options</replaceable></arg> <arg>-p <replaceable>path</replaceable></arg> + <arg>-c</arg> <sbr> <command>pg_ctl</command> <arg choice="plain">stop</arg> *************** *** 48,53 **** --- 49,55 ---- <arg>-w</arg> <arg>-s</arg> <arg>-D <replaceable>datadir</replaceable></arg> + <arg>-c</arg> <arg>-m <group choice="plain"> <arg>s[mart]</arg> *************** *** 246,251 **** --- 248,266 ---- </varlistentry> <varlistentry> + <term><option>-c</option></term> + <listitem> + <para> + Attempt to allow server crashes to produce core files, on platforms + where this available, by lifting any soft resource limit placed on + them. + This is useful in debugging or diagnosing problems by allowing a + stack trace to be obtained from a failed server process. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-w</option></term> <listitem> <para> Index: src/bin/pg_ctl/pg_ctl.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v retrieving revision 1.74 diff -c -r1.74 pg_ctl.c *** src/bin/pg_ctl/pg_ctl.c 12 Oct 2006 05:14:49 -0000 1.74 --- src/bin/pg_ctl/pg_ctl.c 2 Jan 2007 20:25:02 -0000 *************** *** 26,31 **** --- 26,36 ---- #include <sys/stat.h> #include <unistd.h> + #ifdef HAVE_SYS_RESOURCE_H + #include <sys/time.h> + #include <sys/resource.h> + #endif + #include "libpq/pqsignal.h" #include "getopt_long.h" *************** *** 90,95 **** --- 95,103 ---- static char *register_username = NULL; static char *register_password = NULL; static char *argv0 = NULL; + #if HAVE_GETRLIMIT + static bool allow_core_files = false; + #endif static void write_stderr(const char *fmt,...) *************** *** 132,137 **** --- 140,149 ---- static char pid_file[MAXPGPATH]; static char conf_file[MAXPGPATH]; + #if HAVE_GETRLIMIT + static void unlimit_core_size(void); + #endif + #if defined(WIN32) || defined(__CYGWIN__) static void *************** *** 478,483 **** --- 490,516 ---- } + #if HAVE_GETRLIMIT + static void + unlimit_core_size(void) + { + struct rlimit lim; + getrlimit(RLIMIT_CORE,&lim); + if (lim.rlim_max == 0) + { + write_stderr(_("%s: cannot set core size,: disallowed by hard limit.\n"), + progname); + return; + } + else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max) + { + lim.rlim_cur = lim.rlim_max; + setrlimit(RLIMIT_CORE,&lim); + } + } + #endif + + static void do_start(void) *************** *** 581,586 **** --- 614,624 ---- postgres_path = postmaster_path; } + #if HAVE_GETRLIMIT + if (allow_core_files) + unlimit_core_size(); + #endif + exitcode = start_postmaster(); if (exitcode != 0) { *************** *** 1401,1406 **** --- 1439,1447 ---- printf(_(" -o OPTIONS command line options to pass to postgres\n" " (PostgreSQL server executable)\n")); printf(_(" -p PATH-TO-POSTGRES normally not necessary\n")); + #if HAVE_GETRLIMIT + printf(_(" -c, --corefiles allow postgres to produce core files\n")); + #endif printf(_("\nOptions for stop or restart:\n")); printf(_(" -m SHUTDOWN-MODE may be \"smart\", \"fast\", or \"immediate\"\n")); *************** *** 1497,1502 **** --- 1538,1546 ---- {"mode", required_argument, NULL, 'm'}, {"pgdata", required_argument, NULL, 'D'}, {"silent", no_argument, NULL, 's'}, + #if HAVE_GETRLIMIT + {"corefiles", no_argument, NULL, 'c'}, + #endif {NULL, 0, NULL, 0} }; *************** *** 1561,1567 **** /* process command-line options */ while (optind < argc) { ! while ((c = getopt_long(argc, argv, "D:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1) { switch (c) { --- 1605,1611 ---- /* process command-line options */ while (optind < argc) { ! while ((c = getopt_long(argc, argv, "cD:l:m:N:o:p:P:sU:wW", long_options, &option_index)) != -1) { switch (c) { *************** *** 1632,1637 **** --- 1676,1686 ---- do_wait = false; wait_set = true; break; + #if HAVE_GETRLIMIT + case 'c': + allow_core_files = true; + break; + #endif default: /* getopt_long already issued a suitable error message */ do_advice(); Index: src/test/regress/pg_regress.c =================================================================== RCS file: /cvsroot/pgsql/src/test/regress/pg_regress.c,v retrieving revision 1.23 diff -c -r1.23 pg_regress.c *** src/test/regress/pg_regress.c 4 Oct 2006 00:30:14 -0000 1.23 --- src/test/regress/pg_regress.c 2 Jan 2007 20:25:03 -0000 *************** *** 24,29 **** --- 24,34 ---- #include <signal.h> #include <unistd.h> + #ifdef HAVE_SYS_RESOURCE_H + #include <sys/time.h> + #include <sys/resource.h> + #endif + #include "getopt_long.h" #include "pg_config_paths.h" *************** *** 122,127 **** --- 127,156 ---- the supplied arguments. */ __attribute__((format(printf, 2, 3))); + /* + * allow core files if possible. + */ + #if HAVE_GETRLIMIT + static void + unlimit_core_size(void) + { + struct rlimit lim; + getrlimit(RLIMIT_CORE,&lim); + if (lim.rlim_max == 0) + { + fprintf(stderr, + _("%s: cannot set core size,: disallowed by hard limit.\n"), + progname); + return; + } + else if (lim.rlim_max == RLIM_INFINITY || lim.rlim_cur < lim.rlim_max) + { + lim.rlim_cur = lim.rlim_max; + setrlimit(RLIMIT_CORE,&lim); + } + } + #endif + /* * Add an item at the end of a stringlist. *************** *** 1459,1464 **** --- 1488,1497 ---- initialize_environment(); + #if HAVE_GETRLIMIT + unlimit_core_size(); + #endif + if (temp_install) { /*
Andrew Dunstan <andrew@dunslane.net> writes: > Revised patch attached, doing just this. I will apply it soon unless > there are objections. Probably a good idea to check defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE), rather than naively assuming every getrlimit implementation supports that particular setting. Also, should the -c option exist but just not do anything if the platform doesn't support it? As is, you're making it impossible to just specify -c without worrying if it does anything. The documentation fails to list the long form of the switch (--corefiles, which should probably really be --core-files for consistency). There's a typo in this message, too: + _("%s: cannot set core size,: disallowed by hard limit.\n"), regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> Revised patch attached, doing just this. I will apply it soon unless >> there are objections. >> > > Probably a good idea to check defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE), > rather than naively assuming every getrlimit implementation supports > that particular setting. Also, should the -c option exist but just not > do anything if the platform doesn't support it? As is, you're making it > impossible to just specify -c without worrying if it does anything. > > The documentation fails to list the long form of the switch > (--corefiles, which should probably really be --core-files for consistency). > There's a typo in this message, too: > > + _("%s: cannot set core size,: disallowed by hard limit.\n"), > > > OK, I'll fix all this. Thanks. cheers andrew