Thread: urgent: upgraded to 8.2, getting kernel panics

From
"Merlin Moncure"
Date:
Ok,

This may be the wrong place to look for answers to this, but I figured it
couldn't hurt...so here goes:

On Friday we upgraded a critical backend server to PostgreSQL 8.2
running on Fedora Core 4.  Since then we have received three kernel
panics during periods of moderate to high load (twice during the
pg_dump backup run).

Platform is IBM x360 series running SCSI, software raid on the backplane.

After the first crash we ran yum update on the system, which obviously
did not fix the problem.  I was leaning toward a hardware problem until
this last crash, when I was able to catch the following off the terminal:

BUG: spinlock recursion CPU0 postmaster...not tainted.
bunch of other stuff ending in:
Kernel Panic: not syncing: Bad locking

One of the other developers snapped a picture of the kernel panic with
his digital camera and is going to send over the pictures when he gets
home this evening.
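
For anyone triaging a similar crash from saved console text rather than a photo, here is a minimal sketch that scans text for the two messages quoted above. The function name and the regular expressions are illustrative assumptions; the exact wording of panic messages varies between kernel versions, so the patterns are deliberately loose.

```python
import re

# Signatures based on the two messages quoted in this post. Other kernels
# may word them differently, so these patterns are an assumption.
PANIC_PATTERNS = [
    re.compile(r"BUG: spinlock \w+", re.IGNORECASE),
    re.compile(r"kernel panic\s*[:-]\s*not syncing", re.IGNORECASE),
]


def find_panic_lines(text):
    """Return (line_number, line) pairs that match any known signature."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in PANIC_PATTERNS):
            hits.append((number, line.strip()))
    return hits


sample = """\
BUG: spinlock recursion CPU0 postmaster...not tainted.
bunch of other stuff ending in:
Kernel Panic: not syncing: Bad locking
"""

for number, line in find_panic_lines(sample):
    print(number, line)
```

On the sample above this reports lines 1 and 3, the spinlock message and the final panic line.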

Has anybody seen any problem like this or have any suggestions about
possible resolution...should I be posting to the LKML?  Any
suggestions are welcome and appreciated.

At this juncture we are going to downgrade the postmaster back to 8.1
and see if that fixes the panics.  If it doesn't this discussion is
over but if it does we are extremely curious about looking for a fix
for this issue...we have about 8 weeks of development that is on hold
until we can put an 8.2 server in production.  Management has already
authorized a new server but they want a 100% guarantee this is going
to fix the problem.

thanks in advance,
merlin

Re: urgent: upgraded to 8.2, getting kernel panics

From
Devrim GUNDUZ
Date:
On Fri, 2007-02-23 at 17:14 -0500, Merlin Moncure wrote:

> BUG: spinlock recursion CPU0 postmaster...not tainted.

<snip>

> Has anybody seen any problem like this or have any suggestions about
> possible resolution...should I be posting to the LKML?

AFAIR (+ some quick Googling), this is related to a problem in the kernel.
You may need to update to a newer Fedora release since FC4 is not
supported anymore :(.

Even if you report it to LKML, they will probably suggest using a newer
kernel. However, I think the system will not let you compile a new kernel;
it will probably panic again under the high load... So...

If you have free space, install a newer Fedora release on this system,
mount the existing $PGDATA, and see whether this fixes the problem...
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/




Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
Tom Lane
Date:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On friday we upgraded a critical backend server to postgresql 8.2
> running on fedora core 4.

Umm ... why that particular choice of OS?  Red Hat dropped update
support for FC4 some time ago, and AFAIK the Fedora Legacy project
is not getting things done.  How old is the kernel you're using?

> At this juncture we are going to downgrade the postmaster back to 8.1
> and see if that fixes the panics.

Even assuming that Postgres is related to the panics, I don't think you
will find anyone maintaining that a kernel panic is not the kernel's
problem.  If an application *is* able to provoke a kernel panic, the
standard description of the problem would be "critical kernel security
flaw".

            regards, tom lane

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
"CAJ CAJ"
Date:


On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS?  Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done.  How old is the kernel you're using?
>
> > At this juncture we are going to downgrade the postmaster back to 8.1
> > and see if that fixes the panics.
>
> Even assuming that Postgres is related to the panics, I don't think you
> will find anyone maintaining that a kernel panic is not the kernel's
> problem.  If an application *is* able to provoke a kernel panic, the
> standard description of the problem would be "critical kernel security
> flaw".

I vaguely remember running into spinlock problems with FC4, and they weren't due to PostgreSQL. We didn't have a database running on FC4.

If you are running a critical server you should switch to at least CentOS.

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
"Merlin Moncure"
Date:
On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS?  Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done.  How old is the kernel you're using?

Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006
i686 i686 i386 GNU/Linux
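
To put a number on the "how old is the kernel" question, here is a minimal sketch that pulls the release string and build date out of the `uname -a` line above and computes the age of the build. The `parse_uname` helper and its regular expression are illustrative assumptions that fit this particular output format. The reference date is the date of this message as given in the attribution line of the reply below.

```python
import re
from datetime import datetime

UNAME = ("Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 "
         "i686 i686 i386 GNU/Linux")

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {name: index for index, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_uname(line):
    """Return (release, build_date) from a `uname -a` line shaped like UNAME."""
    match = re.search(
        r"^\S+ \S+ (?P<release>\S+) #\d+ (?:SMP )?"
        r"\w{3} (?P<mon>\w{3}) +(?P<day>\d+) [\d:]+ \w+ (?P<year>\d{4})",
        line,
    )
    if match is None:
        raise ValueError("unrecognized uname format")
    built = datetime(int(match["year"]), MONTHS[match["mon"]], int(match["day"]))
    return match["release"], built


release, built = parse_uname(UNAME)
posted = datetime(2007, 2, 26)  # date of this message, per the reply's quote header
print(release, "built", built.date(), "age in days:", (posted - built).days)
```

The build date in the string is 11 July 2006, so the kernel was 230 days old when this was posted. Since Fedora back-patches its kernels, the 2.6.17 in the release string says little about which fixes are actually included.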


Unfortunately, the decision about which kernel to run is more or less
out of my hands.  I personally really dislike Fedora and would
much prefer to be running CentOS/Red Hat AS.  That said, your comments
and those of others are very helpful in regards to fixing that.

We tried updating to the latest via yum update, which did not help.

As promised, here is the best photo of the panic we could get:
http://img144.imageshack.us/my.php?image=dumpic6.jpg

We did an emergency downgrade to 8.1 and will monitor the
situation...the decision to get a new server has already been made and
hopefully it will be on a more stable platform.

big thanks to all who took a few minutes out of their day to lend a hand.

merlin

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
Devrim GUNDUZ
Date:
Hi,

On Mon, 2007-02-26 at 08:24 -0500, Merlin Moncure wrote:
> we tried update to the latest via yum update with no help.

As Tom stated, FC4 is no longer supported; therefore you won't be able to
get a newer kernel via yum.

> as promised, here is the  best photo of the panic we could get:
> http://img144.imageshack.us/my.php?image=dumpic6.jpg

...bad locking...

The picture reminded me of a SCSI driver bug in older kernels -- I googled
again just now and saw a post that says "native drivers are being used in
FC5+ kernels". If this is really the case, you may hit the problem
sometime later.

Upgrading the OS will probably solve your problem, since there is no way to
upgrade the FC4 kernel unless you want to compile the kernel source on your
system.

Regards,

--
Devrim GÜNDÜZ




Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
Bruno Wolff III
Date:
On Fri, Feb 23, 2007 at 18:14:25 -0500,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS?  Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done.  How old is the kernel you're using?

The Fedora Legacy project is officially gone now.

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
Bruno Wolff III
Date:
On Mon, Feb 26, 2007 at 15:57:02 +0200,
  Devrim GUNDUZ <devrim@CommandPrompt.com> wrote:
>
> Upgrading OS will probably solve your problem; since there is no way to
> upgrade FC4 kernel unless you want to compile kernel source on your
> system.

And good luck with that. Fedora back-patches fixes from later kernels, so
the kernel you are running is not the same as the upstream one its name
suggests. Building a Linus kernel and getting the right mix of versions to
work on a particular version of Fedora might be hard to do. If you can find
the patch that fixes the problem, your best bet (assuming you have to use
FC4) would be to try to apply that fix to the latest Fedora kernel for FC4.

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
"Merlin Moncure"
Date:
On 2/26/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 2/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Merlin Moncure" <mmoncure@gmail.com> writes:
> > > On friday we upgraded a critical backend server to postgresql 8.2
> > > running on fedora core 4.
> >
> > Umm ... why that particular choice of OS?  Red Hat dropped update
> > support for FC4 some time ago, and AFAIK the Fedora Legacy project
> > is not getting things done.  How old is the kernel you're using?
>
> Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006
> i686 i686 i386 GNU/Linux
>
>
> Unfortunately, the decision about which kernel to run is more or less
> out of my hands.  I would personally really dislike fedora and would
> much prefer to be running centos/redhat as.  That said, your comments
> and those of others are very helpful in regards to fixing that.
>
> we tried update to the latest via yum update with no help.
>
> as promised, here is the  best photo of the panic we could get:
> http://img144.imageshack.us/my.php?image=dumpic6.jpg
>
> We did an emergency downgrade to 8.1 and will monitor the
> situation...the decision to get a new server has already been made and
> hopefully it will be on a more stable platform.
>
> big thanks to all who took a few minutes out of their day to lend a hand.

Following an emergency downgrade back to 8.1, the kernel panics went
away.  Note that I don't believe for a second that the database was
the root cause here...research suggests that the problem is due to some
type of bug in the SCSI driver.  Exactly why 8.2 brings this out is a
mystery...working on getting an enterprise kernel on the server.

merlin

Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics

From
Magnus Hagander
Date:
On Thu, Mar 01, 2007 at 08:32:57AM -0500, Merlin Moncure wrote:
>
> Following an emergency downgrade back to 8.1, the kernel panics went
> away.  Note that I don't believe for a second that the database was
> the root cause here...research suggests that the problem is due to some
> type of bug in the scsi driver.  Exactly why 8.2 brings this out is a
> mystery...working on getting an enterprise kernel on the server.

Probably it's pushing some part of the I/O system harder than 8.1, thus
exposing the bug faster.
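
One rough way to test that hypothesis without involving PostgreSQL at all is to generate synchronous write load from a small script and see whether the panic reproduces. The sketch below is an assumption-laden illustration, not a model of what 8.2 actually does: the function names, the worker count, and the block count are made up, and the 8192-byte chunk size is chosen only because it matches PostgreSQL's default block size. It belongs on a test machine, since the whole point is to provoke the crash.

```python
import os
import tempfile
import threading


def write_and_sync(directory, index, blocks, block_size, results):
    """Write `blocks` chunks to one file, calling fsync after every chunk."""
    path = os.path.join(directory, f"stress_{index}.dat")
    chunk = os.urandom(block_size)
    written = 0
    with open(path, "wb") as handle:
        for _ in range(blocks):
            written += handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # push the data down to the disk driver
    os.remove(path)
    results[index] = written


def run_stress(directory, workers=4, blocks=64, block_size=8192):
    """Run several writers concurrently and return the total bytes written."""
    results = [0] * workers
    threads = [
        threading.Thread(
            target=write_and_sync,
            args=(directory, index, blocks, block_size, results),
        )
        for index in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


if __name__ == "__main__":
    # The scratch directory should live on the same array as $PGDATA for the
    # result to say anything about that storage path.
    with tempfile.TemporaryDirectory() as scratch:
        total = run_stress(scratch)
        print(f"wrote and synced {total} bytes")
```

If load like this panics the machine with no database running, that points at the kernel or driver rather than anything 8.2-specific. A clean run proves little, because the access pattern is far simpler than a real pg_dump.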

//Magnus