Thread: Trying to understand postgres crash

Trying to understand postgres crash

From
Hanns Hartman
Date:
Hi,

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG:  shutting down
2011-12-21 19:08:49 UTC - PANIC:  concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes


I'm having trouble understanding what could cause this.  I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.

Also It happens very infrequently and I don't yet know how to reproduce it.

I'm happy to provide more information.

thanks in advance for the help!
-HH

Re: Trying to understand postgres crash

From
Scott Marlowe
Date:
On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:
> Hi,
>
> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
> observed a postgres crash and the subsequent messages in my postgres
> debug log.
>
> 2011-12-21 19:08:49 UTC - LOG:  shutting down
> 2011-12-21 19:08:49 UTC -s PANIC:  concurrent transaction log activity
> while database system is shutting down
> 2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
> was terminated by signal 6: Aborted
> 2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes
>
>
> I'm having trouble understanding what could cause this.  I've looked
> through the postgres sources and I don't understand what is "shutting
> down" since this happens on a live system where no stop has been
> requested.

It says that it's detecting more than one process writing to the logs.
 Is there some way you're machine is launching a second postgresql
instance against the same data store? It's just a guess, I've never
seen this error before so It's a stab in the dark.

Re: Trying to understand postgres crash

From
Tom Lane
Date:
Hanns Hartman <hwhartman@gmail.com> writes:
> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
> observed a postgres crash and the subsequent messages in my postgres
> debug log.

> 2011-12-21 19:08:49 UTC - LOG:  shutting down
> 2011-12-21 19:08:49 UTC - PANIC:  concurrent transaction log activity
> while database system is shutting down
> 2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
> was terminated by signal 6: Aborted
> 2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes

Best guess is that something sent the background writer process a
SIGUSR2 signal.  The first two messages indicate that it received such
a signal, tried to shut down, and panicked when it noticed that other
processes were still writing to xlog.  The postmaster wouldn't (or at
least shouldn't, and hasn't been observed to) send such a signal while
it still has other live children, so I think you ought to look around
for other possible sources of a signal.

            regards, tom lane

Re: Trying to understand postgres crash

From
Hanns Hartman
Date:
On Wed, Dec 21, 2011 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:
>> Hi,
>>
>> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
>> observed a postgres crash and the subsequent messages in my postgres
>> debug log.
>>
>> 2011-12-21 19:08:49 UTC - LOG:  shutting down
>> 2011-12-21 19:08:49 UTC -s PANIC:  concurrent transaction log activity
>> while database system is shutting down
>> 2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
>> was terminated by signal 6: Aborted
>> 2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes
>>
>>
>> I'm having trouble understanding what could cause this.  I've looked
>> through the postgres sources and I don't understand what is "shutting
>> down" since this happens on a live system where no stop has been
>> requested.
>
> It says that it's detecting more than one process writing to the logs.
>  Is there some way you're machine is launching a second postgresql
> instance against the same data store? It's just a guess, I've never
> seen this error before so It's a stab in the dark.

If I was running a second postgresql then I'd have some pretty huge
problems! ;-)
Thanks for the suggestion.
-HH

Re: Trying to understand postgres crash

From
Hanns Hartman
Date:
On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hanns Hartman <hwhartman@gmail.com> writes:
>> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
>> observed a postgres crash and the subsequent messages in my postgres
>> debug log.
>
>> 2011-12-21 19:08:49 UTC - LOG:  shutting down
>> 2011-12-21 19:08:49 UTC - PANIC:  concurrent transaction log activity
>> while database system is shutting down
>> 2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
>> was terminated by signal 6: Aborted
>> 2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes
>
> Best guess is that something sent the background writer process a
> SIGUSR2 signal.  The first two messages indicate that it received such
> a signal, tried to shut down, and panicked when it noticed that other
> processes were still writing to xlog.  The postmaster wouldn't (or at
> least shouldn't, and hasn't been observed to) send such a signal while
> it still has other live children, so I think you ought to look around
> for other possible sources of a signal.
>
>                        regards, tom lane

Thanks for the advice.  I'll look around our code base.

One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql.  I have four connections open to
postgresql, two of which are connected to one database and two
connected to another.  Could that set up or some bad usage of libpq be
leading to this signal being generated?

thanks
-HH

Re: Trying to understand postgres crash

From
Tom Lane
Date:
Hanns Hartman <hwhartman@gmail.com> writes:
> On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Best guess is that something sent the background writer process a
>> SIGUSR2 signal.

> One other thought, my current setup mainly interacts with postgresql
> via libpq and very rarely via psql.  I have four connections open to
> postgresql, two of which are connected to one database and two
> connected to another.  Could that set up or some bad usage of libpq be
> leading to this signal being generated?

Shouldn't.  You're not running the client-side code as the postgres
user, are you?  Barring a surprising kernel bug, only postgres-owned
or root-owned processes could successfully issue kill(SIGUSR2) against
the background writer process, so that set of processes is where you
need to look.  I'd first try getting rid of any operations that are
running under the postgres account but don't really need to do so.

            regards, tom lane

Re: Trying to understand postgres crash

From
Hanns Hartman
Date:
On Thu, Dec 22, 2011 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hanns Hartman <hwhartman@gmail.com> writes:
>> On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Best guess is that something sent the background writer process a
>>> SIGUSR2 signal.
>
>> One other thought, my current setup mainly interacts with postgresql
>> via libpq and very rarely via psql.  I have four connections open to
>> postgresql, two of which are connected to one database and two
>> connected to another.  Could that set up or some bad usage of libpq be
>> leading to this signal being generated?
>
> Shouldn't.  You're not running the client-side code as the postgres
> user, are you?  Barring a surprising kernel bug, only postgres-owned
> or root-owned processes could successfully issue kill(SIGUSR2) against
> the background writer process, so that set of processes is where you
> need to look.  I'd first try getting rid of any operations that are
> running under the postgres account but don't really need to do so.
>

Nope the client side code is not running as the postgres user, but it
is running as root.
Unfortunately most daemons in our system run as root.  Also a quick
check or our code base yielded no usages of kill with SIGUSR2.  We do
however use kill with other signals.

I will continue digging, but thanks a lot for the help!  At least I
can narrow down my search a bit ;-)
-HH