Thread: FATAL: stuck spinlock
hi,

i've found these messages in my server log:

FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.

after these messages all backends were restarted (closed connections to clients). now everything seems to be ok. what does it mean?

thanks, kuba
Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> i've found these messages in my server log:
> FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.

Oh? That shouldn't happen. More details please?

regards, tom lane
> > FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
>
> Oh? That shouldn't happen. More details please?

I don't know what is interesting for you but I'll try...

There is one central database and 10 other databases, let's call them applications' databases. There are 4 daemons which receive messages from outside and insert them into the central database; then they look (select) for messages they are supposed to send, and if there are any, they send them and update the sent rows. All 4 daemons work with the same two tables: one for received messages and one for messages to be sent. There is one field in both tables which determines for/from which daemon the row is... These daemons work in infinite loops: try to receive max 100 messages from outside and then try to send max 100, so there are no notifications, triggers, etc...

And then, there are 10 other daemons, one for each application database, which do nearly the same thing: look into the central database for received messages for the application, insert them into the application database, look into the application database for messages to be sent, insert them into the central database... Also an infinite loop...

When that error happened there was very low traffic, no concurrent or nearly concurrent messages. In fact, near that time, more precisely after that error message (after the daemons reconnected to the databases), there was one message received...

Because of concurrent access to tables in the central database I had to remove all foreign key constraints in that database. With foreign key constraints, deadlocks were detected very often... I know that this is a known issue...

I was running this setup, but with only 5-6 applications, for months without problems even under heavy traffic... Exactly this setup had been running for approx. 10 days without problems.

Is it possible that this is not a postgres issue but a hardware issue? Recently I had problems with memory on this server, but it has been replaced. Maybe something else is also damaged...

This server is an Athlon with 2.4.16, postgres 7.2.1 from the debian package. There is nothing else running on this server.

Is there anything else you would like to know?

thanks, kuba
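
[Editor's sketch] The polling loop described in this message (receive at most 100 messages, insert them, then select, send, and mark at most 100 outgoing rows) could look roughly like the following. This is only an illustration under stated assumptions, not kuba's actual code: the in-memory `CentralDb` class, the `run_one_pass` function, the `daemon`/`body`/`sent` keys, and the `receive`/`send` callbacks are all hypothetical names standing in for the real tables and network code, and no real database is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

BATCH_LIMIT = 100  # "max 100" messages per direction per pass, as in the post


@dataclass
class CentralDb:
    """Hypothetical in-memory stand-in for the two shared tables."""
    received: List[dict] = field(default_factory=list)
    to_send: List[dict] = field(default_factory=list)


def run_one_pass(db: CentralDb,
                 daemon_id: int,
                 receive: Callable[[int], List[str]],
                 send: Callable[[str], None]) -> Tuple[int, int]:
    """Run one iteration of a daemon's loop; return (received, sent) counts."""
    # Step 1: take at most BATCH_LIMIT messages from outside and "insert" them,
    # tagging each row with the daemon it came from.
    incoming = receive(BATCH_LIMIT)[:BATCH_LIMIT]
    for body in incoming:
        db.received.append({"daemon": daemon_id, "body": body})

    # Step 2: "select" unsent rows addressed to this daemon, send each one,
    # then "update" the row so it is not sent again.
    pending = [row for row in db.to_send
               if row["daemon"] == daemon_id and not row["sent"]][:BATCH_LIMIT]
    for row in pending:
        send(row["body"])
        row["sent"] = True

    return len(incoming), len(pending)
```

A real daemon would call something like `run_one_pass` inside `while True:`; the sketch exposes a single pass so that it can be exercised in isolation.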
I've forgotten that there were another max. 100 backends connected to application databases doing nothing (idle) - connections from apache connection pool on another server...

Is there anything I can do for further investigation?

thanks, kuba

> > > FATAL: s_lock(0x40361030) at lwlock.c:236, stuck spinlock. Aborting.
> >
> > Oh? That shouldn't happen. More details please?
>
> I don't know what is interesting for you but I'll try...
> ...
Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> Is there anything I can do for further investigation?

I'd suggest trying to verify or disprove your suspicions of bad RAM. Run memory tests, warm the chips with a hairdryer and test again, that sort of thing.

regards, tom lane
It's also possible you have some file corruption left over from running with bad RAM. I would back up the database, reinstall postgresql, then restore the database as well. No reason to chance it.

On Thu, 2 May 2002, Tom Lane wrote:
> Jakub Ouhrabka <jouh8664@ss1000.ms.mff.cuni.cz> writes:
> > Is there anything I can do for further investigation?
>
> I'd suggest trying to verify or disprove your suspicions of bad RAM.
> Run memory tests, warm the chips with a hairdryer and test again,
> that sort of thing.
>
> regards, tom lane