Thread: FATAL: stuck spinlock

FATAL: stuck spinlock

From
Jakub Ouhrabka
Date:
hi,

i've found these messages in my server log:

FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.

after these messages all backends were restarted (closed connections to
clients). now everything seems to be ok. what does it mean?

thanks,         kuba
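
For context on what the message means: a spinlock is a lock that a waiter takes by retrying in a loop rather than going to sleep. The log line suggests the server spun on the lock at 0x40361030 for too long, decided it was stuck, and aborted. Below is a minimal Python sketch of that idea; it is not PostgreSQL's actual s_lock code. The names `spin_acquire` and `StuckSpinlockError`, and the `max_spins` and `delay` values, are hypothetical and chosen only for illustration.

```python
import threading
import time


class StuckSpinlockError(RuntimeError):
    """Raised when the lock could not be taken within the retry budget."""


def spin_acquire(lock, max_spins=1000, delay=0.0001):
    # Hypothetical helper: try a non-blocking acquire repeatedly,
    # pausing briefly between attempts.
    for _ in range(max_spins):
        if lock.acquire(blocking=False):
            return
        time.sleep(delay)
    # Nobody released the lock within the budget: treat it as stuck.
    raise StuckSpinlockError("stuck spinlock. Aborting.")


if __name__ == "__main__":
    lock = threading.Lock()
    spin_acquire(lock)  # the lock is free, so this succeeds at once
    try:
        # The lock is still held and never released, so this gives up.
        spin_acquire(lock, max_spins=5)
    except StuckSpinlockError as exc:
        print("FATAL:", exc)
```

In this sketch the second call fails only because the first holder never releases the lock. On a real server a holder that never releases usually points to a crashed or corrupted process rather than ordinary contention.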


Re: FATAL: stuck spinlock

From
Tom Lane
Date:
Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> i've found these messages in my server log:

> FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.

Oh?  That shouldn't happen.  More details please?

            regards, tom lane

Re: FATAL: stuck spinlock

From
Jakub Ouhrabka
Date:
> > FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
>
> Oh?  That shouldn't happen.  More details please?

I don't know what is interesting for you but I'll try...

There is one central database and 10 other databases, let's call them
applications' databases. There are 4 daemons which are receiving messages
from outside and inserting them into the central database; then they look
(select) to see if there are messages for them to send and, if so, send them
and update the sent row. All 4 daemons are working with the same two tables:
one for received messages and one for messages to be sent; there is one
field in both tables which determines for/from which daemon that row
is... These daemons are working in infinite loops: try to receive max 100
messages from outside and then try to send max 100, so there are no
notifications, triggers, etc...
And then, there are 10 other daemons, one for each application database
which are doing nearly the same thing: look into central database for
received messages for the application, insert them into application
database, look into the application database for messages to be sent,
insert them into the central database... Also an infinite loop...
When that error happened there was very low traffic, no concurrent or
nearly concurrent messages - in fact, near that time, more precisely after
that error message (after the daemons reconnected to the databases), there
was one message received...
Because of concurrent access to tables in the central database I had to
remove all foreign key constraints in that database - with foreign key
constraints, deadlocks were detected very often... I know that this is a
known issue...

I was running this setup, but with only 5-6 applications, for months without
problems even under heavy traffic... Exactly this setup was running for approx.
10 days without problems.

Is it possible that this is not a postgres issue but a hardware issue? Recently
I had problems with the memory on this server, but it has been replaced. Maybe
something else is also damaged... This server is an athlon with 2.4.16,
postgres 7.2.1 from debian package. There is nothing else running on this
server.

Is there anything else you would like to know?

thanks,        kuba



Re: FATAL: stuck spinlock

From
Jakub Ouhrabka
Date:
I forgot to mention that there were also up to 100 other backends connected to
application databases doing nothing (idle) - connections from apache
connection pool on another server...

Is there anything I can do for further investigation?

thanks,            kuba

> > > FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
> >
> > Oh?  That shouldn't happen.  More details please?
>
> I don't know what is interesting for you but I'll try...
> ...


Re: FATAL: stuck spinlock

From
Tom Lane
Date:
Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> Is there anything I can do for further investigation?

I'd suggest trying to verify or disprove your suspicions of bad RAM.
Run memory tests, warm the chips with a hairdryer and test again,
that sort of thing.

            regards, tom lane

Re: FATAL: stuck spinlock

From
Scott Marlowe
Date:
It's also possible you have some file corruption left over from running
with bad RAM.  I would back up the database, reinstall postgresql, then
restore the database as well.  No reason to chance it.

On Thu, 2 May 2002, Tom Lane wrote:

> Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> > Is there anything I can do for further investigation?
>
> I'd suggest trying to verify or disprove your suspicions of bad RAM.
> Run memory tests, warm the chips with a hairdryer and test again,
> that sort of thing.
>
>             regards, tom lane