
Re: [ADMIN] Streaming Replication Server Crash - Mailing list pgsql-general

From	Craig Ringer
Subject	Re: [ADMIN] Streaming Replication Server Crash
Date	October 23, 2012 05:03:48
Msg-id	50862525.5060904@ringerc.id.au
In response to	Re: [ADMIN] Streaming Replication Server Crash (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [ADMIN] Streaming Replication Server Crash
List	pgsql-general


On 10/22/2012 08:52 PM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> On 10/19/2012 04:40 PM, raghu ram wrote:
>>> 2012-10-19 12:26:46 IST [1338]: [18-1] user=,db= LOG:  server process
>>> (PID 15565) was terminated by signal 10
>
>> That's odd. SIGUSR1 (signal 10) shouldn't terminate PostgreSQL.
>
>> Was the server intentionally sent SIGUSR1 by an admin? Do you know what
>> triggered the signal?
>
> SIGUSR1 is used for all sorts of internal cross-process signaling
> purposes.  There's no need to hypothesize any external force sending
> it; if somebody had broken a PG process's signal handling setup for
> SIGUSR1, a crash of this sort could be expected in short order.
>
> But having said that, are we sure 10 is SIGUSR1 on the OP's platform?
> AFAIK, that signal number is not at all compatible across different
> flavors of Unix.  (I see SIGUSR1 is 30 on OS X for instance.)

Gah. I incorrectly thought that POSIX specified signal *numbers*, not
just names. That does not appear to actually be the case. Thanks.
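
As a quick illustration of the point about numbering, here is a minimal
Python sketch that asks the local platform which name it gives to signal
10 and which numbers it gives to SIGUSR1 and SIGBUS. The helper name
signal_name is made up for this example; the output depends on the OS it
runs on and says nothing definite about the OP's machine.

```python
import signal


def signal_name(number):
    """Return the name this platform assigns to a signal number, or None."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to Python on this platform.
        return None


if __name__ == "__main__":
    print("signal 10 is", signal_name(10))
    # getattr() guards against platforms that lack one of these names.
    for name in ("SIGUSR1", "SIGBUS"):
        sig = getattr(signal, name, None)
        print(name, "is", None if sig is None else int(sig))
```

Running this on the affected host (or one with the same OS) is a cheap
way to confirm what "signal 10" means there before reading anything into
the log line.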

A bit of searching suggests that on Solaris/SunOS, signal 10 is SIGBUS:

http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?signal+3HEAD
http://docs.oracle.com/cd/E23824_01/html/821-1464/signal-3head.html

... which tends to suggest an entirely different interpretation than
"someone broke a signal handler":

https://blogs.oracle.com/peteh/entry/sigbus_versus_sigsegv_according_to

such as:

- bad mmap()ed read
- alignment error
- hardware fault

so it's not very different from a segfault, in that it can be caused
by errors in hardware, the OS, or applications.

Raghu, did PostgreSQL dump a core file? If it didn't, you might want to
enable core dumps in future. If it did dump a core, attaching a debugger
to the core file might tell you where it crashed, possibly offering some
more information to diagnose the issue. I'm not familiar enough with
Solaris to offer detailed advice on that, especially as you haven't
mentioned your Solaris version, how you installed Pg, etc. This may be
of some use:

http://stackoverflow.com/questions/6403803/how-to-get-backtrace-function-line-number-on-solaris
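
On the "did it dump a core" question, one common reason for a missing
core file on Unix-like systems is a core-size resource limit of zero.
The sketch below, which assumes a platform where Python's resource
module is available, reports whether the soft limit for the current
process allows core files at all. The function name core_dumps_enabled
is hypothetical, and this does not cover OS-level settings that control
where cores are written.

```python
import resource


def core_dumps_enabled():
    """Return True if the soft core-file size limit is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 means the kernel will not write a core file.
    return soft != 0


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core limit (soft, hard):", soft, hard)
    print("core dumps enabled:", core_dumps_enabled())
```

The limit is inherited by child processes, so the check is only
meaningful when run from the same environment (user, shell, init script)
that starts the PostgreSQL server.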

--
Craig Ringer
