Backends waiting, spinlocks, shared mem patches - Mailing list pgsql-hackers
From | Wayne Piekarski |
---|---|
Subject | Backends waiting, spinlocks, shared mem patches |
Date | |
Msg-id | 199905301406.XAA20023@helpdesk.senet.com.au |
Responses | Re: [HACKERS] Backends waiting, spinlocks, shared mem patches |
List | pgsql-hackers |
Hi everyone,

Sorry this has taken me so long to get back to you. Just to refresh everyone's memory, I was the one who was having problems with postgres' backends just hanging around in waiting, not doing anything. Tom Lane sent me a patch to fix this for 6.4.2.

We didn't just install the patch on our live system and run it, as we were worried about breaking something, so we spent a lot of time thrashing it around, trying to reproduce the problem to check if it had been fixed (this is why it's taken me a while to do this). We captured a few hundred sessions that our CGIs have with the database, including begin..commit pairs and everything, in order to accurately simulate a heavy load on the dbms.

We tried this program, keeping about 40-50 connections going the whole time, and we could not get the waiting problem to occur even with the normal 6.4.2, so it was not possible to test if the patch had fixed our particular problem. That was disappointing: we figured that because we were hammering it so hard it would fail quickly, and we could use this as a good test program. The 6.4.2 patched version ran fine as well, so this was good. It seems that the problem was caused by very rare circumstances which we just couldn't reproduce during testing.

One thing we did notice is that when we tried to open more than say 50 backends, we would get the following:

InitPostgres
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
proc_exit(3) [#0]

Shortly after, we would get:

FATAL: s_lock(18001065) at spin.c:125, stuck spinlock. Aborting.

Our FreeBSD machine was not set up for a huge number of semaphores, so the semget was failing. That was fair enough, but then postmaster would die afterwards with the spinlock error. I saw a post by Hiroshi Inoue with the following:

>Hi all,
>
>ProcReleaseSpins() does nothing unless MyProc is set.
>So both elog(ERROR/FATAL) and proc_exit(0) before
>InitProcess() don't release spinlocks.
>
>Comments ?
>
>Hiroshi Inoue
>Inoue@tpf.co.jp

I would have to agree with him here. I'm not familiar with postgres internals, but it looks like when semget fails, the backend doesn't clean up the resources it already owns. I'm not sure if this is fixed, as I can't always read the hackers list, but I thought I'd mention this in case someone found it interesting.

We tried the same massive number of connections test with 6.5 and it refuses to accept the connection after a while, which is good. I'm reading through archives about MaxBackendId now, so I'm going to play with that.

So anyways, we installed the 6.4.2 patch a few days ago, and it seems to be running ok. I haven't seen any cases where we get processes waiting for nothing (yet anyway - I'll have to wait and see for a few days). However, now we are getting the stuck spinlock errors due to too many backends being open, which I'm trying to prevent, so hopefully these two problems will both go away.

Now that I've learned more about the stuck spinlock problem, I realise that when I emailed the first time, it was not just one problem, but two or three at the same time, which made it harder to nail down what the problem was. We will watch it over the week.

We have also been doing some testing with the latest 6.5 from the other day, to check that certain problems we've bumped into have been fixed. We can't run it live, but we'll try to run our testing programs on it as a best approximation to help flush out any bugs that might be left.

Thanks for your help everyone, I hope that this has been helpful for everyone else as well. I'm really looking forward to 6.5 :)

bye,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski                       Tel:   (08) 8221 5221
Research & Development Manager        Fax:   (08) 8221 5220
SE Network Access Pty Ltd             Mob:   0407 395 889
222 Grote Street                      Email: wayne@senet.com.au
Adelaide SA 5000                      WWW:   http://www.senet.com.au
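
The failure sequence described above (a backend exits after semget fails without releasing a spinlock it holds, and a later backend then reports a stuck spinlock) can be modelled roughly as follows. This is a toy Python sketch, not Postgres code: ToySpinlock, start_backend, demo and MAX_SPINS are hypothetical names, and the sketch assumes the failing backend is holding the lock at the moment semaphore creation fails.

```python
import errno

MAX_SPINS = 1000  # hypothetical give-up threshold for the toy lock


class ToySpinlock:
    """A single flag standing in for a shared-memory spinlock."""

    def __init__(self):
        self.held = False

    def acquire(self):
        spins = 0
        while self.held:
            spins += 1
            if spins >= MAX_SPINS:
                raise RuntimeError("stuck spinlock")
        self.held = True

    def release(self):
        self.held = False


def start_backend(lock, semaphores_available, guarded_cleanup):
    """Take the lock, try to get semaphores, then clean up.

    guarded_cleanup=True mimics the behaviour in the quoted post:
    cleanup releases nothing unless my_proc has already been set.
    """
    my_proc = None
    lock.acquire()
    try:
        if not semaphores_available:
            # Stand-in for semget failing with "No space left on device".
            raise OSError(errno.ENOSPC, "semget failed")
        my_proc = object()
    finally:
        if my_proc is not None or not guarded_cleanup:
            lock.release()
    return my_proc


def demo(guarded_cleanup):
    """First backend fails on semaphores; second one then tries to start."""
    lock = ToySpinlock()
    try:
        start_backend(lock, False, guarded_cleanup)
    except OSError:
        pass  # the first backend has exited
    try:
        start_backend(lock, True, guarded_cleanup)
        return "ok"
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    print("guarded cleanup:  ", demo(True))
    print("unguarded cleanup:", demo(False))
```

With the guarded cleanup, the second backend gives up after MAX_SPINS attempts; with unconditional cleanup it starts normally. The real code paths in 6.4.2 may differ in detail.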