Thread: urgent: upgraded to 8.2, getting kernel panics
Ok, this may be the wrong place to look for answers to this, but I figured it couldn't hurt...so here goes:

On friday we upgraded a critical backend server to postgresql 8.2 running on fedora core 4. Since then we have received three kernel panics during periods of moderate to high load (twice during the pg_dump backup run). Platform is IBM x360 series running SCSI, software raid on the backplane. After the first crash we yum updated the system, which obviously did not fix the problem.

I was leaning toward a hardware problem until this last time, when I was able to catch the following off the terminal:

BUG: spinlock recursion CPU0 postmaster...not tainted.
bunch of other stuff ending in:
Kernel Panic: not syncing: Bad locking

One of the other developers snapped a picture of the kernel panic with his digital camera and is going to send over the pictures when he gets home this evening.

Has anybody seen any problem like this or have any suggestions about possible resolution...should I be posting to the LKML? Any suggestions are welcome and appreciated.

At this juncture we are going to downgrade the postmaster back to 8.1 and see if that fixes the panics. If it doesn't, this discussion is over, but if it does we are extremely curious about looking for a fix for this issue...we have about 8 weeks of development that is on hold until we can put an 8.2 server in production. Management has already authorized a new server but they want a 100% guarantee this is going to fix the problem.

thanks in advance,
merlin
On Fri, 2007-02-23 at 17:14 -0500, Merlin Moncure wrote:
> BUG: spinlock recursion CPU0 postmaster...not tainted.
<snip>
> Has anybody seen any problem like this or have any suggestions about
> possible resolution...should I be posting to the LKML?

AFAIR (+ some quick Googling), this is related to a problem in the kernel. You may need to update to a newer Fedora release since FC4 is not supported anymore :(. Even if you report to LKML, they will probably suggest using a newer kernel. However, I think the system will not let you compile a new kernel and will panic again under high load...

So... If you have free space, install a newer Fedora release on this system, mount the existing $PGDATA and see if this fixes the problem...

--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On friday we upgraded a critical backend server to postgresql 8.2
> running on fedora core 4.

Umm ... why that particular choice of OS? Red Hat dropped update support for FC4 some time ago, and AFAIK the Fedora Legacy project is not getting things done. How old is the kernel you're using?

> At this juncture we are going to downgrade the postmaster back to 8.1
> and see if that fixes the panics.

Even assuming that Postgres is related to the panics, I don't think you will find anyone maintaining that a kernel panic is not the kernel's problem. If an application *is* able to provoke a kernel panic, the standard description of the problem would be "critical kernel security flaw".

regards, tom lane
On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS? Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done. How old is the kernel you're using?
>
> > At this juncture we are going to downgrade the postmaster back to 8.1
> > and see if that fixes the panics.
>
> Even assuming that Postgres is related to the panics, I don't think you
> will find anyone maintaining that a kernel panic is not the kernel's
> problem. If an application *is* able to provoke a kernel panic, the
> standard description of the problem would be "critical kernel security
> flaw".

I vaguely remember running into spinlock problems with FC4, and it wasn't due to PostgreSQL: we didn't have a database running on FC4.

If you are running a critical server you should switch to at least CentOS.
On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS? Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done. How old is the kernel you're using?

Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 i686 i686 i386 GNU/Linux

Unfortunately, the decision about which kernel to run is more or less out of my hands. I personally really dislike fedora and would much prefer to be running centos/redhat AS. That said, your comments and those of others are very helpful in regards to fixing that.

We tried updating to the latest via yum update, which did not help.

As promised, here is the best photo of the panic we could get:
http://img144.imageshack.us/my.php?image=dumpic6.jpg

We did an emergency downgrade to 8.1 and will monitor the situation...the decision to get a new server has already been made and hopefully it will be on a more stable platform.

big thanks to all who took a few minutes out of their day to lend a hand.

merlin
Hi,

On Mon, 2007-02-26 at 08:24 -0500, Merlin Moncure wrote:
> we tried update to the latest via yum update with no help.

As Tom stated, FC4 is no longer supported; therefore you won't be able to get a newer kernel via yum.

> as promised, here is the best photo of the panic we could get:
> http://img144.imageshack.us/my.php?image=dumpic6.jpg

...bad locking...

The picture reminded me of a SCSI driver bug in older kernels -- I google'd again now and I saw a post that says "native drivers are being used in FC5+ kernels". If this is the real case, you may hit the problem sometime later.

Upgrading the OS will probably solve your problem, since there is no way to upgrade the FC4 kernel unless you want to compile the kernel source on your system.

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
On Fri, Feb 23, 2007 at 18:14:25 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS? Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done. How old is the kernel you're using?

The Fedora Legacy project is officially gone now.
On Mon, Feb 26, 2007 at 15:57:02 +0200, Devrim GUNDUZ <devrim@CommandPrompt.com> wrote:
>
> Upgrading OS will probably solve your problem; since there is no way to
> upgrade FC4 kernel unless you want to compile kernel source on your
> system.

And good luck with that. Fedora still back patches stuff from later kernels than the one you think you have based on the name. Building a Linus kernel and getting the right mix of versions to work on a particular version of Fedora might be hard to do.

If you can find the patch that fixes the problem, your best bet (assuming you have to use FC4) would be to try to apply that fix to the latest Fedora kernel for FC4.
On 2/26/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Merlin Moncure" <mmoncure@gmail.com> writes:
> > > On friday we upgraded a critical backend server to postgresql 8.2
> > > running on fedora core 4.
> >
> > Umm ... why that particular choice of OS? Red Hat dropped update
> > support for FC4 some time ago, and AFAIK the Fedora Legacy project
> > is not getting things done. How old is the kernel you're using?
>
> Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006
> i686 i686 i386 GNU/Linux
>
>
> Unfortunately, the decision about which kernel to run is more or less
> out of my hands. I would personally really dislike fedora and would
> much prefer to be running centos/redhat as. That said, your comments
> and those of others are very helpul in regards to fixing that.
>
> we tried update to the latest via yum update with no help.
>
> as promised, here is the best photo of the panic we could get:
> http://img144.imageshack.us/my.php?image=dumpic6.jpg
>
> We did an emergency downgrade to 8.1 and will monitor the
> situation...the decision to get a new server has already been made and
> hopefully it will be on a more stable platform.
>
> big thanks to all who took a few minutes out of their day to lend a hand.

Following an emergency downgrade back to 8.1, the kernel panics went away. Note that I don't believe for a second that the database was the root cause here...research suggests that the problem is due to some type of bug in the scsi driver. Exactly why 8.2 brings this out is a mystery...working on getting an enterprise kernel on the server.

merlin
On Thu, Mar 01, 2007 at 08:32:57AM -0500, Merlin Moncure wrote:
>
> Following an emergency downgrade back to 8.1, the kernel panics went
> away. Note that I don't believe for a second that the database was
> the root cause here...research suggest that the problem is due to some
> type of bug in the scsi driver. Exactly why 8.2 brings this out is a
> mystery...working on getting an enterprise kernel on the server.

Probably it's pushing some part of the I/O system harder than 8.1, thus exposing the bug faster.

//Magnus