Thread: Postmaster dies with many child processes (spinlock/semget failed)
Hi,

I sent the following message to the pgsql-general list on the 24th but haven't received any answers from PostgreSQL developers, only from other people who are experiencing the same problems.

I would say the errors I am describing are quite serious and I was wondering whether there was any chance of them being addressed in the forthcoming 6.5 release.

The problem is very easy to reproduce - here are the necessary steps:

1. Install PostgreSQL 6.4.2
2. Install Perl 5.005_02
3. Install Perl modules: DBI 1.06; DBD-Pg 0.90; ApacheDBI-0.81
4. Download apache 1.3.4
5. Download mod_perl 1.17+ in same directory
6. Extract distributions
7. cd mod_perl-1.17
8. perl Makefile.PL EVERYTHING=1 && make && make test && make install
9. Set the following directives in Apache's httpd.conf:

     MinSpareServers 100
     MaxSpareServers 100
     StartServers 100
     MaxClients 100

10. PerlRequire /usr/local/apache/conf/startup.pl

    where startup.pl contains:

     use Apache::Registry ();
     use Apache::DBI ();
     Apache::DBI->connect_on_init("DBI:Pg:dbname=template1", "", "");
     1;

11. Start Apache: apachectl start

Note that this example makes use of no custom application code and is using the template1 database.

Check Apache's error_log and you will see error messages and eventually the postmaster will die with something like:

     FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

The magic number seems to be 48. If I start 49 httpd/postgres processes everything falls apart but if I start 48 everything is fine.

I'm running on FreeBSD 2.2.8 and I've increased maxusers to 512 - no difference.

I'd appreciate some feedback from the guys who are making PostgreSQL happen. Can these issues be addressed? PostgreSQL is a great database but this is a show stopper for people developing big Web applications.

If you need any more information don't hesitate to contact me.

Cheers.
Patrick

--
Sent to pgsql-general list on January 24th 1999:

Hi,

I've been doing some benchmarking with PostgreSQL under mod_perl and I've been getting some rather disturbing results.

To achieve the maximum benefit from persistent connections I am using a method called 'connect_on_init' that comes with a Perl module called Apache::DBI. Using this method, when the Web server is first started, each child process establishes a persistent connection with the database. When using PostgreSQL as the database, this causes there to be as many 'postgres' processes as there are 'httpd' processes for a given database.

As part of my benchmarking I've been testing the number of httpd processes that my server can support. The machine is a 450 MHz PII/256 MB RAM. As an exercise I tried to start 100 httpd processes. Doing this consistently results in the following PostgreSQL errors and the backend usually dies:

     IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
     NOTICE: Message from PostgreSQL backend:
     The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
     I have rolled back the current transaction and am going to terminate your database system connection and exit.
     Please reconnect to the database system and repeat your query.

     FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

Note that the 'no space left on device' is misleading as there is a minimum of 400 MB available on each file-system on the server.

This is obviously bad news, especially as we are hoping to develop some fairly large-scale applications with PostgreSQL. Note that this happens when connecting to a single database. We were hoping to connect to several databases from each httpd process!!

The frustrating thing is we have the resources. If I only start 30 processes (which seems to be the approximate limit) there is about 100 MB of RAM that is not being used.
Are there any configuration values that control the number of postgres processes? Do you have any idea why this is happening?

Is anyone else using Apache/mod_perl and PostgreSQL successfully in a demanding environment?

Any help would be greatly appreciated.

Cheers.

Patrick

--
#===============================#
\  KAN Design & Publishing Ltd  /
/  T: +44 (0)1223 511134        \
\  F: +44 (0)1223 571968        /
/  E: mailto:patrick@kan.co.uk  \
\  W: http://www.kan.co.uk      /
#===============================#
Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
From: Tatsuo Ishii
>Hi,
>
>I sent the following message to the pgsql-general
>list on the 24th but haven't received any answers
>from PostgreSQL developers, only from other people
>who are experiencing the same problems.
>
>I would say the errors I am describing are quite
>serious and I was wondering whether there was any
>chance of them being addressed in the forthcoming
>6.5 release.

I don't think it's a PostgreSQL problem.

[snip]

>Note that the 'no space left on device' is
>misleading as there is a minimum of 400 MB
>available on each file-system on the server.

No, that message is not talking about the space left on your disk. You need to increase the shared memory size.

You want to have 100 backends? 6.4.2 has a hard limit of 64 backends. You can change this by editing the following line:

     #define MaxBackendId 64 /* maximum number of backends */

in src/include/storage/sinvaladt.h. Make sure to do gmake clean before recompiling.

Also, you might have run out of file table entries. I recommend that you limit the number of descriptors available to each backend. Probably 15 is enough. You can do this by issuing the csh builtin limit command before starting the postmaster.
--
Tatsuo Ishii
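
The descriptor cap suggested above can also be applied from a launcher script instead of the csh limit builtin. A minimal sketch in Python, assuming a POSIX system; the helper names `new_limits` and `start_with_fd_limit` are made up for this example and are not part of PostgreSQL:

```python
import resource
import subprocess


def new_limits(hard, wanted):
    """Return a (soft, hard) pair with the soft limit set to `wanted`.

    The soft limit may not exceed the hard limit, so `wanted` is clamped
    unless the hard limit is unlimited.
    """
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    return (wanted, hard)


def start_with_fd_limit(argv, wanted=15):
    """Start `argv` as a child whose open-descriptor soft limit is `wanted`.

    The limit is lowered in the child only, just before exec, so the
    calling process keeps its own limit. Children of the started process
    inherit the lowered limit.
    """
    def lower_limit():
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, new_limits(hard, wanted))

    return subprocess.Popen(argv, preexec_fn=lower_limit)
```

A call such as `start_with_fd_limit(["postmaster", "-D", "/usr/local/pgsql/data"])` would start the postmaster under the cap; the data directory shown here is a placeholder, not a path taken from this thread.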
Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
From: The Hermit Hacker
On Thu, 28 Jan 1999, Patrick Verdon wrote:

> IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
> NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>
> FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.
>
> Note that the 'no space left on device' is
> misleading as there is a minimum of 400 MB
> available on each file-system on the server.

My first guess is that you don't have enough semaphores enabled in your kernel...increase that from the default, and I'm *guessing* that you'll get past your 48...

Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
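
The semaphore explanation lines up with the numbers in the original report. A minimal sketch of the arithmetic in Python, assuming one semaphore per backend and allocation in whole sets of 16 (the `num=16` in the semget error). The kernel limits passed to `fits` below (10 sets, 60 semaphores) are assumed values chosen for illustration, not figures measured on the reporter's machine:

```python
import math

SEMS_PER_SET = 16  # taken from "num=16" in the semget error message


def semaphores_needed(backends, sems_per_set=SEMS_PER_SET):
    """Return (sets, total semaphores) needed for `backends` processes.

    Assumes one semaphore per backend and that semaphores can only be
    requested in whole sets.
    """
    sets = math.ceil(backends / sems_per_set)
    return sets, sets * sems_per_set


def fits(backends, max_sets, max_sems):
    """True if the request stays within both assumed kernel limits."""
    sets, sems = semaphores_needed(backends)
    return sets <= max_sets and sems <= max_sems


if __name__ == "__main__":
    for n in (48, 49, 100):
        print(n, semaphores_needed(n), fits(n, max_sets=10, max_sems=60))
```

Under these assumptions 48 backends need 3 sets (48 semaphores), while the 49th forces a fourth set (64 semaphores), which would explain a wall at exactly 48. The real limits are whatever the kernel was built with (SEMMNI for the number of sets, SEMMNS for the total number of semaphores).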
Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
From: Vadim Mikheev
Patrick Verdon wrote:
>
> Check Apache's error_log and you will see error
> messages and eventually the postmaster will die
> with something like:
>
> FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

Try to increase S_MAX_BUSY in src/backend/storage/buffer/s_lock.c:

     #define S_MAX_BUSY 500 * S_NSPINCYCLE
                        ^^^
                        try with 10000.

Vadim
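
The idea behind the S_MAX_BUSY suggestion is that a waiter gives up and reports a stuck spinlock once it has used up a fixed budget of attempts, so a larger budget tolerates longer contention. A minimal sketch of that pattern in Python; this is not the code in s_lock.c, the names `spin_acquire` and `StuckSpinlock` are invented for the example, and the sketch omits any delay between attempts:

```python
import threading


class StuckSpinlock(RuntimeError):
    """Raised when the lock could not be taken within the retry budget."""


def spin_acquire(lock, max_busy):
    """Try a non-blocking acquire up to `max_busy` times.

    Returns the number of attempts used. Raises StuckSpinlock if every
    attempt fails. `max_busy` plays the role of S_MAX_BUSY in this sketch.
    """
    for attempt in range(1, max_busy + 1):
        if lock.acquire(blocking=False):
            return attempt
    raise StuckSpinlock(f"stuck spinlock after {max_busy} attempts")
```

With many processes competing for the same lock, a holder can legitimately keep it longer than a small budget allows, so raising the budget trades a slower failure report for fewer false aborts.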
Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
From: Oleg Bartunov
I don't think this is a Postgres problem. I got the same problem you described when upgrading Apache from 1.3.3 to 1.3.4. I had to return to 1.3.3. Probably I will try modperl 1.18 + Apache 1.3.4.

Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83