Thread: Trying to understand postgres crash
Hi, I'm running postgres 8.3.17 on a redhat linux mips derivative and I've observed a postgres crash and the subsequent messages in my postgres debug log. 2011-12-21 19:08:49 UTC - LOG: shutting down 2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity while database system is shutting down 2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423) was terminated by signal 6: Aborted 2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes I'm having trouble understanding what could cause this. I've looked through the postgres sources and I don't understand what is "shutting down" since this happens on a live system where no stop has been requested. Also It happens very infrequently and I don't yet know how to reproduce it. I'm happy to provide more information. thanks in advance for the help! -HH
On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote: > Hi, > > I'm running postgres 8.3.17 on a redhat linux mips derivative and I've > observed a postgres crash and the subsequent messages in my postgres > debug log. > > 2011-12-21 19:08:49 UTC - LOG: shutting down > 2011-12-21 19:08:49 UTC -s PANIC: concurrent transaction log activity > while database system is shutting down > 2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423) > was terminated by signal 6: Aborted > 2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes > > > I'm having trouble understanding what could cause this. I've looked > through the postgres sources and I don't understand what is "shutting > down" since this happens on a live system where no stop has been > requested. It says that it's detecting more than one process writing to the logs. Is there some way you're machine is launching a second postgresql instance against the same data store? It's just a guess, I've never seen this error before so It's a stab in the dark.
Hanns Hartman <hwhartman@gmail.com> writes: > I'm running postgres 8.3.17 on a redhat linux mips derivative and I've > observed a postgres crash and the subsequent messages in my postgres > debug log. > 2011-12-21 19:08:49 UTC - LOG: shutting down > 2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity > while database system is shutting down > 2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423) > was terminated by signal 6: Aborted > 2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes Best guess is that something sent the background writer process a SIGUSR2 signal. The first two messages indicate that it received such a signal, tried to shut down, and panicked when it noticed that other processes were still writing to xlog. The postmaster wouldn't (or at least shouldn't, and hasn't been observed to) send such a signal while it still has other live children, so I think you ought to look around for other possible sources of a signal. regards, tom lane
On Wed, Dec 21, 2011 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote: >> Hi, >> >> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've >> observed a postgres crash and the subsequent messages in my postgres >> debug log. >> >> 2011-12-21 19:08:49 UTC - LOG: shutting down >> 2011-12-21 19:08:49 UTC -s PANIC: concurrent transaction log activity >> while database system is shutting down >> 2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423) >> was terminated by signal 6: Aborted >> 2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes >> >> >> I'm having trouble understanding what could cause this. I've looked >> through the postgres sources and I don't understand what is "shutting >> down" since this happens on a live system where no stop has been >> requested. > > It says that it's detecting more than one process writing to the logs. > Is there some way you're machine is launching a second postgresql > instance against the same data store? It's just a guess, I've never > seen this error before so It's a stab in the dark. If I was running a second postgresql then I'd have some pretty huge problems! ;-) Thanks for the suggestion. -HH
On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hanns Hartman <hwhartman@gmail.com> writes: >> I'm running postgres 8.3.17 on a redhat linux mips derivative and I've >> observed a postgres crash and the subsequent messages in my postgres >> debug log. > >> 2011-12-21 19:08:49 UTC - LOG: shutting down >> 2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity >> while database system is shutting down >> 2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423) >> was terminated by signal 6: Aborted >> 2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes > > Best guess is that something sent the background writer process a > SIGUSR2 signal. The first two messages indicate that it received such > a signal, tried to shut down, and panicked when it noticed that other > processes were still writing to xlog. The postmaster wouldn't (or at > least shouldn't, and hasn't been observed to) send such a signal while > it still has other live children, so I think you ought to look around > for other possible sources of a signal. > > regards, tom lane Thanks for the advice. I'll look around our code base. One other thought, my current setup mainly interacts with postgresql via libpq and very rarely via psql. I have four connections open to postgresql, two of which are connected to one database and two connected to another. Could that set up or some bad usage of libpq be leading to this signal being generated? thanks -HH
Hanns Hartman <hwhartman@gmail.com> writes: > On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Best guess is that something sent the background writer process a >> SIGUSR2 signal. > One other thought, my current setup mainly interacts with postgresql > via libpq and very rarely via psql. I have four connections open to > postgresql, two of which are connected to one database and two > connected to another. Could that set up or some bad usage of libpq be > leading to this signal being generated? Shouldn't. You're not running the client-side code as the postgres user, are you? Barring a surprising kernel bug, only postgres-owned or root-owned processes could successfully issue kill(SIGUSR2) against the background writer process, so that set of processes is where you need to look. I'd first try getting rid of any operations that are running under the postgres account but don't really need to do so. regards, tom lane
On Thu, Dec 22, 2011 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hanns Hartman <hwhartman@gmail.com> writes: >> On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Best guess is that something sent the background writer process a >>> SIGUSR2 signal. > >> One other thought, my current setup mainly interacts with postgresql >> via libpq and very rarely via psql. I have four connections open to >> postgresql, two of which are connected to one database and two >> connected to another. Could that set up or some bad usage of libpq be >> leading to this signal being generated? > > Shouldn't. You're not running the client-side code as the postgres > user, are you? Barring a surprising kernel bug, only postgres-owned > or root-owned processes could successfully issue kill(SIGUSR2) against > the background writer process, so that set of processes is where you > need to look. I'd first try getting rid of any operations that are > running under the postgres account but don't really need to do so. > Nope the client side code is not running as the postgres user, but it is running as root. Unfortunately most daemons in our system run as root. Also a quick check or our code base yielded no usages of kill with SIGUSR2. We do however use kill with other signals. I will continue digging, but thanks a lot for the help! At least I can narrow down my search a bit ;-) -HH