Thread: stuck spinlock (0x2aac3678b0e0) detected at dynahash.c:876

stuck spinlock (0x2aac3678b0e0) detected at dynahash.c:876

From
Matt Solnit
Date:
Hi everyone.  The following error appeared in our log yesterday:

2009-11-19 13:39:40 PST:10.211.97.171(63815):[25668]: PANIC:  stuck spinlock (0x2aac3678b0e0) detected at dynahash.c:876

Followed by:

2009-11-19 13:44:24 PST::@:[1381]: LOG:  server process (PID 25668) was terminated by signal 6: Aborted
2009-11-19 13:44:24 PST::@:[1381]: LOG:  terminating any other active server processes

Followed by:

2009-11-19 13:44:24 PST:10.211.1.171(8016):[29736]: WARNING:  terminating connection because of crash of another server process
2009-11-19 13:44:24 PST:10.211.1.171(8016):[29736]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
(repeated for every open connection).

Followed by:

2009-11-19 13:44:36 PST:[local]:postgres@postgres:[29780]: FATAL:  the database system is in recovery mode
(repeated several times).

Finally:

2009-11-19 13:47:15 PST::@:[223331]: LOG:  autovacuum launcher started
2009-11-19 13:47:15 PST::@:[1381]: LOG:  database system is ready to accept connections
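
To make the timeline in these excerpts easier to read, here is a minimal Python sketch that computes how many seconds each event occurred after the PANIC. Only the timestamps come from the log lines above; the event labels and the `timeline` helper are shorthand introduced here for illustration.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above (all PST).
# The labels are illustrative shorthand, not text from the log.
events = [
    ("PANIC: stuck spinlock", "2009-11-19 13:39:40"),
    ("backend terminated by signal 6", "2009-11-19 13:44:24"),
    ("first 'in recovery mode' rejection", "2009-11-19 13:44:36"),
    ("ready to accept connections", "2009-11-19 13:47:15"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp as it appears in the log prefix."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def timeline(events):
    """Return (label, seconds elapsed since the first event) for each event."""
    start = parse(events[0][1])
    return [
        (label, int((parse(ts) - start).total_seconds()))
        for label, ts in events
    ]


for label, offset in timeline(events):
    print(f"+{offset:4d}s  {label}")
```

By this reckoning, nearly five minutes passed between the PANIC message and the postmaster reporting the abort, and about seven and a half minutes passed before the server accepted connections again.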

The system was under a good amount of load:  approximately 80 connections, each generating large numbers of batched
inserts (using JDBC).  This includes the one that crashed.  Once the database re-initialized, everything went back to
normal.  Is this due to a bug in PostgreSQL?  Is there anything we can do about it? :-)

We are running PostgreSQL 8.3.8 (64-bit) on a dedicated Fedora Core 8 machine, in Amazon EC2.  This was using an
"extra-large" instance, which means 4 Xeon cores (2.66 GHz) and 15.5 GB of memory.

Sincerely,
Matt Solnit

Re: stuck spinlock (0x2aac3678b0e0) detected at dynahash.c:876

From
Merlin Moncure
Date:
On Fri, Nov 20, 2009 at 12:15 PM, Matt Solnit <msolnit@soasta.com> wrote:
>
> We are running PostgreSQL 8.3.8 (64-bit) on a dedicated Fedora Core 8 machine, in Amazon EC2.  This was using an
> "extra-large" instance, which means 4 Xeon cores (2.66 GHz) and 15.5 GB of memory.

considering that ec2 is a virtualized environment, the first
conclusion that everyone is going to jump to is that this is some type
of issue with ec2.  IIRC ec2 runs xen, did you search for any related
issues with xen and postgresql?

are you running the correct kernel?
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1535

"We strongly recommend using the 2.6.18 Xen stock kernel with the
c1.medium and c1.xlarge instances. Although the default Amazon EC2
kernels will work, the new kernels provide greater stability and
performance for these instance types. For more information about
kernels, refer to the Amazon Elastic Compute Cloud Developer Guide."

merlin

Re: stuck spinlock (0x2aac3678b0e0) detected at dynahash.c:876

From
Matt Solnit
Date:
Hi Merlin.  Thanks very much for your reply.  We are not using the "High-CPU" instance type, so these kernel
recommendations do not apply to us.  Here is what we're running:

$ uname -a
Linux domU-12-31-39-09-E8-21 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Regarding Xen, I'll look around a bit, but one of my first Google hits was the following thread:
http://archives.postgresql.org/pgsql-general/2006-09/msg00503.php
Some quotes:
"PostgreSQL performs very, very well on Xen even in DomU. It is one of the things that lends to the Xen credibility
because they use us in their benchmarks."
"... I've had zero issues running postgres inside a domU."

Granted, this was in 2006.

-- Matt

On Nov 20, 2009, at 9:54 AM, Merlin Moncure wrote:

> On Fri, Nov 20, 2009 at 12:15 PM, Matt Solnit <msolnit@soasta.com> wrote:
>>
>> We are running PostgreSQL 8.3.8 (64-bit) on a dedicated Fedora Core 8 machine, in Amazon EC2.  This was using an
>> "extra-large" instance, which means 4 Xeon cores (2.66 GHz) and 15.5 GB of memory.
>
> considering that ec2 is a virtualized environment, the first
> conclusion that everyone is going to jump to is that this is some type
> of issue with ec2.  IIRC ec2 runs xen, did you search for any related
> issues with xen and postgresql?
>
> are you running the correct kernel?
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1535
>
> "We strongly recommend using the 2.6.18 Xen stock kernel with the
> c1.medium and c1.xlarge instances. Although the default Amazon EC2
> kernels will work, the new kernels provide greater stability and
> performance for these instance types. For more information about
> kernels, refer to the Amazon Elastic Compute Cloud Developer Guide."
>
> merlin