Thread: Sigh, my old HPUX box is totally broken by DSM patch

Sigh, my old HPUX box is totally broken by DSM patch

From
Tom Lane
Date:
initdb.c quoth:
 * ... but the fact that a platform has shm_open
 * doesn't guarantee that that call will succeed when attempted.

Indeed:

$ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /home/postgres/testversion/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... Bad system call(coredump)
$ 

gdb shows:

Core was generated by `initdb'.
Program terminated with signal 12, Bad system call.
(gdb) bt
#0  0xc0143fb0 in ?? () from /usr/lib/libc.1
#1  0xa890 in choose_dsm_implementation () at initdb.c:1098
#2  0xab10 in test_config_settings () at initdb.c:1217
#3  0xe310 in initialize_data_directory () at initdb.c:3412
#4  0xed0c in main (argc=1, argv=0x7b03ac68) at initdb.c:3691
#5  0xc0065784 in ?? () from /usr/lib/libc.1

I'm not entirely sure what to do about this.  Conceivably we could have
initdb catch SIGSYS, but that seems rather ugly.  Maybe configure needs a
run-time test to see if shm_open will work, rather than just probing to
see if such a function exists?  I'm not thrilled with run-time tests in
configure though.  Another possibility is for initdb to execute the
probe in a forked subprocess instead of risking doing it itself.
        regards, tom lane



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Robert Haas
Date:
On Tue, Oct 22, 2013 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> initdb.c quoth:
>
>  * ... but the fact that a platform has shm_open
>  * doesn't guarantee that that call will succeed when attempted.
>
> Indeed:
>
> $ initdb
> The files belonging to this database system will be owned by user "postgres".
> This user must also own the server process.
>
> The database cluster will be initialized with locale "C".
> The default database encoding has accordingly been set to "SQL_ASCII".
> The default text search configuration will be set to "english".
>
> Data page checksums are disabled.
>
> creating directory /home/postgres/testversion/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting dynamic shared memory implementation ... Bad system call(coredump)
> $
>
> gdb shows:
>
> Core was generated by `initdb'.
> Program terminated with signal 12, Bad system call.
> (gdb) bt
> #0  0xc0143fb0 in ?? () from /usr/lib/libc.1
> #1  0xa890 in choose_dsm_implementation () at initdb.c:1098
> #2  0xab10 in test_config_settings () at initdb.c:1217
> #3  0xe310 in initialize_data_directory () at initdb.c:3412
> #4  0xed0c in main (argc=1, argv=0x7b03ac68) at initdb.c:3691
> #5  0xc0065784 in ?? () from /usr/lib/libc.1
>
> I'm not entirely sure what to do about this.  Conceivably we could have
> initdb catch SIGSYS, but that seems rather ugly.  Maybe configure needs a
> run-time test to see if shm_open will work, rather than just probing to
> see if such a function exists?  I'm not thrilled with run-time tests in
> configure though.  Another possibility is for initdb to execute the
> probe in a forked subprocess instead of risking doing it itself.

Well, geez.  That's obnoxious.  I understand that an unimplemented
system call might return ENOSYS, but SIGSYS seems pretty unfriendly.
Why put the wrapper in your system libraries at all if it's just going
to kill the process?

I don't think a configure-time test is a good idea because there's no
guarantee that the configure-time machine and the run-time machine
have the same behavior.  But having initdb fork a child process to run
the test seems like a reasonable way forward, even though I feel like
it shouldn't really be needed.  One possibly unfortunate thing is
that SIGSYS at least on my box normally produces a core dump, so the
initdb child might leave behind a core file somewhere as a side
effect.  Not sure if we can or want to work around that somehow.
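
For illustration only, the forked probe could look roughly like the
sketch below.  The helper name probe_shm_open_in_child and the
"/shm_probe.<pid>" segment name are made up for this example and are not
taken from initdb.c.  Setting RLIMIT_CORE to zero in the child is one
possible way to avoid the stray core file, assuming the platform honors
that limit for SIGSYS.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of initdb.c.
 * Returns 1 if shm_open() worked in a child process, 0 if it failed or the
 * child was killed by a signal (e.g. SIGSYS), -1 if the probe could not run.
 */
static int
probe_shm_open_in_child(void)
{
    pid_t pid;
    int   status;

    fflush(NULL);               /* avoid duplicated stdio output after fork */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        struct rlimit no_core = {0, 0};
        char          name[64];
        int           fd;

        /* Best effort: keep a fatal signal from leaving a core file. */
        (void) setrlimit(RLIMIT_CORE, &no_core);

        /* Assumed segment name, unique per child process. */
        snprintf(name, sizeof(name), "/shm_probe.%ld", (long) getpid());
        fd = shm_open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
        if (fd < 0)
            _exit(1);           /* ordinary failure such as ENOSYS */
        close(fd);
        shm_unlink(name);
        _exit(0);
    }

    /* Parent: wait for the child, retrying if interrupted. */
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }

    if (WIFSIGNALED(status))
        return 0;               /* child died from a signal: treat as unusable */
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

A caller would pick the posix implementation only when this returns 1,
and fall back to another implementation otherwise.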

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 22, 2013 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> selecting dynamic shared memory implementation ... Bad system call(coredump)

> Well, geez.  That's obnoxious.

I quite agree :-(.  If it were just this old HPUX version, maybe we could
write it off as something we don't care to support anymore.  I'm worried
though that there might be other platforms that act this way.
        regards, tom lane



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> I don't think a configure-time test is a good idea because there's no
> guarantee that the configure-time machine and the run-time machine
> have the same behavior.  But having initdb fork a child process to run
> the test seems like a reasonable way forward, even though I feel like
> it shouldn't really be needed.  One possibly unfortunate thing is
> that SIGSYS at least on my box normally produces a core dump, so the
> initdb child might leave behind a core file somewhere as a side
> effect.  Not sure if we can or want to work around that somehow.

I'm going to guess this idea is a non-starter, but any hope there's some
other system call which would tell us we're on a platform where
shm_open() will hit us with SIGSYS?  What happens when shm_unlink() is
called on this platform?  Or mmap()?
Thanks,
    Stephen

Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I'm going to guess this idea is a non-starter, but any hope there's some
> other system call which would tell us we're on a platform where
> shm_open() will hit us with SIGSYS?  What happens when shm_unlink() is
> called on this platform?  Or mmap()?

For context's sake, the machine does have mmap().  shm_open and shm_unlink
exist in libc and have declarations in <sys/mman.h>, but at least the
former traps with a signal, suggesting the kernel hasn't got support for
it.

I agree with Robert that it's odd and obnoxious that the call doesn't just
return with errno = ENOSYS.  However, looking in the archives turns up
this interesting historical info:
http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us

I wonder whether, if we went back to blocking SIGSYS, we could expect that
affected calls would return ENOSYS (clearly preferable), or if that would
just lead to some very strange behavior.  Other archive entries mention
that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running,
so that's at least one place where we'd want to check the behavior.
        regards, tom lane



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I agree with Robert that it's odd and obnoxious that the call doesn't just
> return with errno = ENOSYS.  However, looking in the archives turns up
> this interesting historical info:
> http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us

Wow, well, good on HPUX for trying to run the code you told it to..

> I wonder whether, if we went back to blocking SIGSYS, we could expect that
> affected calls would return ENOSYS (clearly preferable), or if that would
> just lead to some very strange behavior.  Other archive entries mention
> that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running,
> so that's at least one place where we'd want to check the behavior.

Would this make sense as a configure-time check, rather than initdb, to
try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
preferable to do that in a configure check rather than initdb.
Thanks,
    Stephen

Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Robert Haas
Date:
On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I agree with Robert that it's odd and obnoxious that the call doesn't just
>> return with errno = ENOSYS.  However, looking in the archives turns up
>> this interesting historical info:
>> http://www.postgresql.org/message-id/25564.962066659@sss.pgh.pa.us
>
> Wow, well, good on HPUX for trying to run the code you told it to..
>
>> I wonder whether, if we went back to blocking SIGSYS, we could expect that
>> affected calls would return ENOSYS (clearly preferable), or if that would
>> just lead to some very strange behavior.  Other archive entries mention
>> that you get SIGSYS on Cygwin if the Cygwin support daemon isn't running,
>> so that's at least one place where we'd want to check the behavior.
>
> Would this make sense as a configure-time check, rather than initdb, to
> try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
> preferable to do that in a configure check rather than initdb.

I don't see why.  It's a run-time behavior; the build system may not
be where the binaries will ultimately run.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > Would this make sense as a configure-time check, rather than initdb, to
> > try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
> > preferable to do that in a configure check rather than initdb.
>
> I don't see why.  It's a run-time behavior; the build system may not
> be where the binaries will ultimately run.

I suppose, just need to be more cautious when blocking signals in initdb
than in a configure-time check, of course.
Thanks,
    Stephen

Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>> Would this make sense as a configure-time check, rather than initdb, to
>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
>>> preferable to do that in a configure check rather than initdb.

>> I don't see why.  It's a run-time behavior; the build system may not
>> be where the binaries will ultimately run.

> I suppose, just need to be more cautious when blocking signals in initdb
> than in a configure-time check, of course.

Indeed, telling initdb to ignore SIGSYS makes it do what we want on
this box:

$ git diff
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** setup_signals(void)
*** 3197,3202 ****
--- 3197,3207 ----
  #ifdef SIGPIPE
    pqsignal(SIGPIPE, SIG_IGN);
  #endif
+ 
+   /* Prevent SIGSYS so we can probe for kernel calls that might not work */
+ #ifdef SIGSYS
+   pqsignal(SIGSYS, SIG_IGN);
+ #endif
  }

$ initdb
...
selecting dynamic shared memory implementation ... sysv
...

The above patch ignores SIGSYS throughout initdb.  We could narrow the
possible side-effects by only disabling SIGSYS around the shm_open call,
but I'm not sure there's any value in that.  It seems likely to me that
the same kind of problem might pop up elsewhere in future, as we try
to make use of other modern kernel facilities.  In fact, I can foresee
wanting to run the whole backend this way --- though I'm not proposing
doing so today.
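
For comparison, the narrower variant (ignoring SIGSYS only around the
shm_open call) might be sketched as follows.  This is an illustration
under assumptions, not the committed code: the helper name
shm_open_works_narrow and the segment name are hypothetical, and plain
signal() stands in for pqsignal().  It also assumes that the platform
makes the call fail with an error (ideally ENOSYS) once SIGSYS is ignored.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of initdb.c.
 * Tries shm_open() with SIGSYS ignored only for the duration of the call.
 * Returns 1 if POSIX shared memory looks usable, 0 otherwise; on failure
 * errno holds the error reported by shm_open().
 */
static int
shm_open_works_narrow(void)
{
    char name[64];
    int  fd;
    int  saved_errno;
#ifdef SIGSYS
    /* Remember the previous disposition so it can be restored afterwards. */
    void (*old_handler) (int) = signal(SIGSYS, SIG_IGN);
#endif

    /* Assumed segment name, unique per process. */
    snprintf(name, sizeof(name), "/shm_probe.%ld", (long) getpid());
    fd = shm_open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
    saved_errno = errno;

#ifdef SIGSYS
    if (old_handler != SIG_ERR)
        signal(SIGSYS, old_handler);
#endif

    if (fd < 0)
    {
        errno = saved_errno;    /* e.g. ENOSYS where the kernel lacks support */
        return 0;
    }
    close(fd);
    shm_unlink(name);
    return 1;
}
```

The extra save/restore bookkeeping is the cost of the narrow approach;
the one-line change in setup_signals avoids it by ignoring SIGSYS for the
whole run.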

A bit of googling turned up the following paragraph of rationale in the
POSIX spec (Open Group Base Specifications 2013 edition):

  There is very little that a Conforming POSIX.1 Application can do by
  catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
  SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
  be generated by the system only in cases of programming errors. While it
  may be desirable for some robust code (for example, a library routine)
  to be able to detect and recover from programming errors in other code,
  these signals are not nearly sufficient for that purpose. One portable
  use that does exist for these signals is that a command interpreter can
  recognize them as the cause of termination of a process (with wait())
  and print an appropriate message.

So in other words, the reason for delivering SIGSYS rather than returning
ENOSYS by default is to make it apparent from the process exit code that
you made an invalid kernel call, should your code be sloppy enough that
it fails to notice and report kernel call failures.  This argument doesn't
seem to me to hold a lot of water for Postgres' purposes.

Comments?
        regards, tom lane



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Andres Freund
Date:
On 2013-10-24 13:13:23 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> The above patch ignores SIGSYS throughout initdb.  We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that.  It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities.  In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.

Why not? I don't see the advantage of looking for effects/problems of
such a change twice.

I'd also much rather see a wrongly configured postgres fail to start
with a legible error message instead of it being killed by a signal.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Sigh, my old HPUX box is totally broken by DSM patch

From
Robert Haas
Date:
On Thu, Oct 24, 2013 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Robert Haas (robertmhaas@gmail.com) wrote:
>>> On Wed, Oct 23, 2013 at 11:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>>> Would this make sense as a configure-time check, rather than initdb, to
>>>> try blocking SIGSYS and checking for an ENOSYS from shm_open()?  Seems
>>>> preferable to do that in a configure check rather than initdb.
>
>>> I don't see why.  It's a run-time behavior; the build system may not
>>> be where the binaries will ultimately run.
>
>> I suppose, just need to be more cautious when blocking signals in initdb
>> than in a configure-time check, of course.
>
> Indeed, telling initdb to ignore SIGSYS makes it do what we want on
> this box:
>
> $ git diff
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index 3983b23731330b78a66a74d14faaf76f7aff85c2..05252df869d128ac2cf3b1c48c6259d6d95b0ffc 100644
> *** a/src/bin/initdb/initdb.c
> --- b/src/bin/initdb/initdb.c
> *************** setup_signals(void)
> *** 3197,3202 ****
> --- 3197,3207 ----
>   #ifdef SIGPIPE
>     pqsignal(SIGPIPE, SIG_IGN);
>   #endif
> +
> +   /* Prevent SIGSYS so we can probe for kernel calls that might not work */
> + #ifdef SIGSYS
> +   pqsignal(SIGSYS, SIG_IGN);
> + #endif
>   }
>
>
> $ initdb
> ...
> selecting dynamic shared memory implementation ... sysv
> ...
>
> The above patch ignores SIGSYS throughout initdb.  We could narrow the
> possible side-effects by only disabling SIGSYS around the shm_open call,
> but I'm not sure there's any value in that.  It seems likely to me that
> the same kind of problem might pop up elsewhere in future, as we try
> to make use of other modern kernel facilities.  In fact, I can foresee
> wanting to run the whole backend this way --- though I'm not proposing
> doing so today.
>
> A bit of googling turned up the following paragraph of rationale in the
> POSIX spec (Open Group Base Specifications 2013 edition):
>
>   There is very little that a Conforming POSIX.1 Application can do by
>   catching, ignoring, or masking any of the signals SIGILL, SIGTRAP,
>   SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally
>   be generated by the system only in cases of programming errors. While it
>   may be desirable for some robust code (for example, a library routine)
>   to be able to detect and recover from programming errors in other code,
>   these signals are not nearly sufficient for that purpose. One portable
>   use that does exist for these signals is that a command interpreter can
>   recognize them as the cause of termination of a process (with wait())
>   and print an appropriate message.
>
> So in other words, the reason for delivering SIGSYS rather than returning
> ENOSYS by default is to make it apparent from the process exit code that
> you made an invalid kernel call, should your code be sloppy enough that
> it fails to notice and report kernel call failures.  This argument doesn't
> seem to me to hold a lot of water for Postgres' purposes.
>
> Comments?

Your proposed change to initdb seems fine to me.

If we change initdb but not the backend, then somebody could later
manually change postgresql.conf to set
dynamic_shared_memory_type=posix.  When they try to restart the
postmaster, it'll die with SIGSYS rather than exiting with a
relatively clean error message.  However, at the moment, it seems like
the only people who are likely to encounter that situation are those
who install PostgreSQL 9.4 on very old HP-UX boxen and then change the
configuration settings chosen by initdb, and there shouldn't be many
such people.  Therefore I tend to think that changing initdb is
sufficient for now; we can take the risk of changing the backend's
handling of SIGSYS if and when it becomes clear that there's enough
benefit to doing so to justify the risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company