Thread: Sigh, my old HPUX box is totally broken by DSM patch
initdb.c quoth:

 * ... but the fact that a platform has shm_open
 * doesn't guarantee that that call will succeed when attempted.

Indeed:

$ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /home/postgres/testversion/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... Bad system call(coredump)
$

gdb shows:

Core was generated by `initdb'.
Program terminated with signal 12, Bad system call.
(gdb) bt
#0  0xc0143fb0 in ?? () from /usr/lib/libc.1
#1  0xa890 in choose_dsm_implementation () at initdb.c:1098
#2  0xab10 in test_config_settings () at initdb.c:1217
#3  0xe310 in initialize_data_directory () at initdb.c:3412
#4  0xed0c in main (argc=1, argv=0x7b03ac68) at initdb.c:3691
#5  0xc0065784 in ?? () from /usr/lib/libc.1

I'm not entirely sure what to do about this. Conceivably we could have initdb catch SIGSYS, but that seems rather ugly. Maybe configure needs a run-time test to see if shm_open will work, rather than just probing to see if such a function exists? I'm not thrilled with run-time tests in configure though. Another possibility is for initdb to execute the probe in a forked subprocess instead of risking doing it itself.

regards, tom lane
On Tue, Oct 22, 2013 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > initdb.c quoth: > > * ... but the fact that a platform has shm_open > * doesn't guarantee that that call will succeed when attempted. > > Indeed: > > $ initdb > The files belonging to this database system will be owned by user "postgres". > This user must also own the server process. > > The database cluster will be initialized with locale "C". > The default database encoding has accordingly been set to "SQL_ASCII". > The default text search configuration will be set to "english". > > Data page checksums are disabled. > > creating directory /home/postgres/testversion/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 100 > selecting default shared_buffers ... 128MB > selecting dynamic shared memory implementation ... Bad system call(coredump) > $ > > gdb shows: > > Core was generated by `initdb'. > Program terminated with signal 12, Bad system call. > (gdb) bt > #0 0xc0143fb0 in ?? () from /usr/lib/libc.1 > #1 0xa890 in choose_dsm_implementation () at initdb.c:1098 > #2 0xab10 in test_config_settings () at initdb.c:1217 > #3 0xe310 in initialize_data_directory () at initdb.c:3412 > #4 0xed0c in main (argc=1, argv=0x7b03ac68) at initdb.c:3691 > #5 0xc0065784 in ?? () from /usr/lib/libc.1 > > I'm not entirely sure what to do about this. Conceivably we could have > initdb catch SIGSYS, but that seems rather ugly. Maybe configure needs a > run-time test to see if shm_open will work, rather than just probing to > see if such a function exists? I'm not thrilled with run-time tests in > configure though. Another possibility is for initdb to execute the > probe in a forked subprocess instead of risking doing it itself. Well, geez. That's obnoxious. I understand that an unimplemented system call might return ENOSYS, but SIGSYS seems pretty unfriendly. Why put the wrapper in your system libraries at all if it's just going to kill the process? 
I don't think a configure-time test is a good idea because there's no guarantee that the configure-time machine and the run-time machine have the same behavior. But having initdb fork a child process to run the test seems like a reasonable way forward, even though I feel like it shouldn't really be needed. One possibly unfortunate thing is that SIGSYS at least on my box normally produces a core dump, so the initdb child might leave behind a core file somewhere as a side effect. Not sure if we can or want to work around that somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
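For illustration, the forked-probe idea could look roughly like the sketch below. It is not code from initdb: the names ProbeResult, try_shm_open, simulate_trap and run_probe_in_child are hypothetical, and the sketch assumes a POSIX system that provides fork(), setrlimit() and SIGSYS. The child sets RLIMIT_CORE to zero, which is one possible way to avoid the stray core file mentioned above; the parent only inspects the wait status.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not PostgreSQL names. */
typedef enum
{
    PROBE_OK,       /* probe succeeded in the child */
    PROBE_FAILED,   /* probe reported an ordinary error (e.g. ENOSYS) */
    PROBE_KILLED    /* child was terminated by a signal (e.g. SIGSYS) */
} ProbeResult;

/* Candidate probe: returns 0 if shm_open works, -1 if it reports an error. */
static int
try_shm_open(void)
{
    char        name[64];
    int         fd;

    snprintf(name, sizeof(name), "/probe_shm_%ld", (long) getpid());
    fd = shm_open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    shm_unlink(name);
    return 0;
}

/* Stand-in for a libc wrapper that traps; for demonstration only. */
static int
simulate_trap(void)
{
    raise(SIGSYS);
    return 0;
}

/* Run the probe in a forked child so a fatal signal cannot kill the caller. */
static ProbeResult
run_probe_in_child(int (*probe) (void))
{
    pid_t       pid;
    int         status;

    fflush(NULL);               /* avoid duplicated stdio output in the child */
    pid = fork();
    if (pid < 0)
        return PROBE_FAILED;

    if (pid == 0)
    {
        struct rlimit no_core = {0, 0};

        setrlimit(RLIMIT_CORE, &no_core);   /* suppress a core file */
        signal(SIGSYS, SIG_DFL);            /* make sure a trap is visible */
        _exit(probe() == 0 ? 0 : 1);
    }

    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return PROBE_FAILED;
    }

    if (WIFSIGNALED(status))
        return PROBE_KILLED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return PROBE_OK;
    return PROBE_FAILED;
}
```

A caller would treat anything other than PROBE_OK from run_probe_in_child(try_shm_open) as "do not select posix". The cost of this design is an extra process per probe, which is probably negligible for a one-shot tool but is the reason it "shouldn't really be needed" on a platform that returns ENOSYS.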
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 22, 2013 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> selecting dynamic shared memory implementation ... Bad system call(coredump) > Well, geez. That's obnoxious. I quite agree :-(. If it were just this old HPUX version, maybe we could write it off as something we don't care to support anymore. I'm worried though that there might be other platforms that act this way. regards, tom lane
* Robert Haas (robertmhaas@gmail.com) wrote: > I don't think a configure-time test is a good idea because there's no > guarantee that the configure-time machine and the run-time machine > have the same behavior. But having initdb fork a child process to run > the test seems like a reasonable way forward, even though I feel like > it shouldn't really be needed. One possibly unfortunate thing is > that SIGSYS at least on my box normally produces a core dump, so the > initdb child might leave behind a core file somewhere as a side > effect. Not sure if we can or want to work around that somehow. I'm going to guess this idea is a non-starter, but any hope there's some other system call which would tell us we're on a platform where shm_open() will hit us with SIGSYS? What happens when shm_unlink() is called on this platform? Or mmap()? Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > I'm going to guess this idea is a non-starter, but any hope there's some > other system call which would tell us we're on a platform where > shm_open() will hit us with SIGSYS? What happens when shm_unlink() is > called on this platform? Or mmap()? For context's sake, the machine does have mmap(). shm_open and shm_unlink exist in libc and have declarations in <sys/mman.h>, but at least the former traps with a signal, suggesting the kernel hasn't got support for it. I agree with Robert that it's odd and obnoxious that the call doesn't just return with errno = ENOSYS. However, looking in the archives turns up this interesting historical info: http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us I wonder whether, if we went back to blocking SIGSYS, we could expect that affected calls would return ENOSYS (clearly preferable), or if that would just lead to some very strange behavior. Other archive entries mention that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running, so that's at least one place where we'd want to check the behavior. regards, tom lane
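The question of whether an affected call returns ENOSYS once SIGSYS is out of the way can be checked with a small test along these lines. This is only a sketch: the function name is hypothetical, and it reads "blocking" as setting the disposition to SIG_IGN, which is an assumption. What a given kernel actually does in that state is exactly the thing to be verified, so no particular result is promised here.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ignore SIGSYS, then try shm_open once.
 * Returns 0 if the call worked, otherwise the errno it failed with
 * (the hoped-for value on an unsupporting kernel would be ENOSYS).
 * Note: this leaves SIGSYS ignored for the rest of the process.
 */
static int
shm_open_errno_with_sigsys_ignored(void)
{
    char        name[64];
    int         fd;
    int         save_errno;

#ifdef SIGSYS
    signal(SIGSYS, SIG_IGN);
#endif

    snprintf(name, sizeof(name), "/sigsys_probe_%ld", (long) getpid());
    errno = 0;
    fd = shm_open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
    if (fd < 0)
    {
        save_errno = errno;
        /* Report some failure code even if errno was left untouched. */
        return save_errno != 0 ? save_errno : EINVAL;
    }

    close(fd);
    shm_unlink(name);
    return 0;
}
```

Running this on each suspect platform (the old HP-UX box, Cygwin without its support daemon) and printing the returned value with strerror() would show whether the call fails cleanly or misbehaves in some other way.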
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > I agree with Robert that it's odd and obnoxious that the call doesn't just > return with errno = ENOSYS. However, looking in the archives turns up > this interesting historical info: > http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us Wow, well, good on HPUX for trying to run the code you told it to.. > I wonder whether, if we went back to blocking SIGSYS, we could expect that > affected calls would return ENOSYS (clearly preferable), or if that would > just lead to some very strange behavior. Other archive entries mention > that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running, > so that's at least one place where we'd want to check the behavior. Would this make sense as a configure-time check, rather than initdb, to try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems preferable to do that in a configure check rather than initdb. Thanks, Stephen
On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I agree with Robert that it's odd and obnoxious that the call doesn't just >> return with errno = ENOSYS. However, looking in the archives turns up >> this interesting historical info: >> http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us > > Wow, well, good on HPUX for trying to run the code you told it to.. > >> I wonder whether, if we went back to blocking SIGSYS, we could expect that >> affected calls would return ENOSYS (clearly preferable), or if that would >> just lead to some very strange behavior. Other archive entries mention >> that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running, >> so that's at least one place where we'd want to check the behavior. > > Would this make sense as a configure-time check, rather than initdb, to > try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems > preferable to do that in a configure check rather than initdb. I don't see why. It's a run-time behavior; the build system may not be where the binaries will ultimately run. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Would this make sense as a configure-time check, rather than initdb, to > > try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems > > preferable to do that in a configure check rather than initdb. > > I don't see why. It's a run-time behavior; the build system may not > be where the binaries will ultimately run. I suppose, just need to be more cautious when blocking signals in initdb than in a configure-time check, of course. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>> Would this make sense as a configure-time check, rather than initdb, to
>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems
>>> preferable to do that in a configure check rather than initdb.

>> I don't see why. It's a run-time behavior; the build system may not
>> be where the binaries will ultimately run.

> I suppose, just need to be more cautious when blocking signals in initdb
> than in a configure-time check, of course.

Indeed, telling initdb to ignore SIGSYS makes it do what we want on this box:

$ git diff
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** setup_signals(void)
*** 3197,3202 ****
--- 3197,3207 ----
  #ifdef SIGPIPE
      pqsignal(SIGPIPE, SIG_IGN);
  #endif
+
+     /* Prevent SIGSYS so we can probe for kernel calls that might not work */
+ #ifdef SIGSYS
+     pqsignal(SIGSYS, SIG_IGN);
+ #endif
  }

$ initdb
...
selecting dynamic shared memory implementation ... sysv
...

The above patch ignores SIGSYS throughout initdb. We could narrow the possible side-effects by only disabling SIGSYS around the shm_open call, but I'm not sure there's any value in that. It seems likely to me that the same kind of problem might pop up elsewhere in future, as we try to make use of other modern kernel facilities. In fact, I can foresee wanting to run the whole backend this way --- though I'm not proposing doing so today.
A bit of googling turned up the following paragraph of rationale in the POSIX spec (Open Group Base Specifications 2013 edition):

    There is very little that a Conforming POSIX.1 Application can do by
    catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
    SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
    be generated by the system only in cases of programming errors. While it
    may be desirable for some robust code (for example, a library routine)
    to be able to detect and recover from programming errors in other code,
    these signals are not nearly sufficient for that purpose. One portable
    use that does exist for these signals is that a command interpreter can
    recognize them as the cause of termination of a process (with wait())
    and print an appropriate message.

So in other words, the reason for delivering SIGSYS rather than returning ENOSYS by default is to make it apparent from the process exit code that you made an invalid kernel call, should your code be sloppy enough that it fails to notice and report kernel call failures. This argument doesn't seem to me to hold a lot of water for Postgres' purposes.

Comments?

regards, tom lane
On 2013-10-24 13:13:23 -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > The above patch ignores SIGSYS throughout initdb. We could narrow the > possible side-effects by only disabling SIGSYS around the shm_open call, > but I'm not sure there's any value in that. It seems likely to me that > the same kind of problem might pop up elsewhere in future, as we try > to make use of other modern kernel facilities. In fact, I can foresee > wanting to run the whole backend this way --- though I'm not proposing > doing so today. Why not? I don't see the advantage of looking for effects/problems of such a change twice. I'd also much rather see a wrongly configured postgres fail to start with a legible error message instead of it being killed by a signal. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 24, 2013 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Robert Haas (robertmhaas@gmail.com) wrote: >>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote: >>>> Would this make sense as a configure-time check, rather than initdb, to >>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()? Seems >>>> preferable to do that in a configure check rather than initdb. > >>> I don't see why. It's a run-time behavior; the build system may not >>> be where the binaries will ultimately run. > >> I suppose, just need to be more cautious when blocking signals in initdb >> than in a configure-time check, of course. > > Indeed, telling initdb to ignore SIGSYS makes it do what we want on > this box: > > $ git diff > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c > index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644 > *** a/src/bin/initdb/initdb.c > --- b/src/bin/initdb/initdb.c > *************** setup_signals(void) > *** 3197,3202 **** > --- 3197,3207 ---- > #ifdef SIGPIPE > pqsignal(SIGPIPE, SIG_IGN); > #endif > + > + /* Prevent SIGSYS so we can probe for kernel calls that might not work */ > + #ifdef SIGSYS > + pqsignal(SIGSYS, SIG_IGN); > + #endif > } > > > $ initdb > ... > selecting dynamic shared memory implementation ... sysv > ... > > The above patch ignores SIGSYS throughout initdb. We could narrow the > possible side-effects by only disabling SIGSYS around the shm_open call, > but I'm not sure there's any value in that. It seems likely to me that > the same kind of problem might pop up elsewhere in future, as we try > to make use of other modern kernel facilities. In fact, I can foresee > wanting to run the whole backend this way --- though I'm not proposing > doing so today. 
> > A bit of googling turned up the following paragraph of rationale in the > POSIX spec (Open Group Base Specifications 2013 edition): > > There is very little that a Conforming POSIX.1 Application can do by > catching, ignoring, or masking any of the signals SIGILL, SIGTRAP, > SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally > be generated by the system only in cases of programming errors. While it > may be desirable for some robust code (for example, a library routine) > to be able to detect and recover from programming errors in other code, > these signals are not nearly sufficient for that purpose. One portable > use that does exist for these signals is that a command interpreter can > recognize them as the cause of termination of a process (with wait()) > and print an appropriate message. > > So in other words, the reason for delivering SIGSYS rather than returning > ENOSYS by default is to make it apparent from the process exit code that > you made an invalid kernel call, should your code be sloppy enough that > it fails to notice and report kernel call failures. This argument doesn't > seem to me to hold a lot of water for Postgres' purposes. > > Comments? Your proposed change to initdb seems fine to me. If we change initdb but not the backend, then somebody could later manually change postgresql.conf to set dynamic_shared_memory_type=posix. When they try to restart the postmaster, it'll die with SIGSYS rather than exiting with a relatively clean error message. However, at the moment, it seems like the only people who are likely to encounter that situation are those who install PostgreSQL 9.4 on very old HP-UX boxen and then change the configuration settings chosen by initdb, and there shouldn't be many such people. 
Therefore I tend to think that changing initdb is sufficient for now; we can take the risk of changing the backend's handling of SIGSYS if and when it becomes clear that there's enough benefit to doing so to justify the risk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company