Thread: Postmaster dies with many child processes (spinlock/semget failed)

Postmaster dies with many child processes (spinlock/semget failed)

From: Patrick Verdon
Date: 28 January 1999, 18:51:39

Hi, 

I sent the following message to the pgsql-general 
list on the 24th but haven't received any answers
from PostgreSQL developers, only from other people
who are experiencing the same problems.

I would say the errors I am describing are quite
serious and I was wondering whether there was any
chance of them being addressed in the forthcoming
6.5 release.

The problem is very easy to reproduce - here are
the necessary steps:

1. Install PostgreSQL 6.4.2
2. Install Perl 5.005_02
3. Install Perl modules: DBI 1.06; DBD-Pg 0.90; ApacheDBI-0.81
4. Download apache 1.3.4
5. Download mod_perl 1.17+ into the same directory
6. Extract distributions
7. cd mod_perl-1.17
8. perl Makefile.PL EVERYTHING=1 && make && make test && make install
9. Set the following directives in Apache's httpd.conf:

     MinSpareServers 100
     MaxSpareServers 100
     StartServers 100
     MaxClients 100

10. PerlRequire /usr/local/apache/conf/startup.pl, where startup.pl contains:

     use Apache::Registry ();
     use Apache::DBI ();
     Apache::DBI->connect_on_init("DBI:Pg:dbname=template1", "", "");
     1;

11. Start Apache: apachectl start

Note that this example makes use of no custom
application code and is using the template1 
database. 

Check Apache's error_log and you will see error 
messages and eventually the postmaster will die
with something like:
  FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

The magic number seems to be 48. If I start 49 
httpd/postgres processes everything falls apart
but if I start 48 everything is fine. I'm 
running on FreeBSD 2.2.8 and I've increased
maxusers to 512 - no difference.
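One plausible explanation for the magic number, consistent with the num=16 in the semget error quoted later in this thread, is that backend semaphores are allocated in sets of 16, so the number of backends that can start is capped by how many whole sets fit inside the kernel's System V semaphore limits (conventionally SEMMNI for sets and SEMMNS for semaphores in total). The sketch below does that arithmetic. The helper name is hypothetical, and the limit values in the usage note (10 sets, 60 semaphores) are assumptions for illustration; check your own kernel's settings.

```c
/*
 * Sketch (assumption: each group of backends needs one semaphore set of a
 * fixed size, 16 in the semget error quoted in this thread).
 * Hypothetical helper: how many backends the kernel limits can support.
 */
int backends_supported(int semmni, int semmns, int sems_per_set)
{
    if (sems_per_set <= 0)
        return 0;

    /* Whole sets that fit in the system-wide semaphore pool. */
    int sets = semmns / sems_per_set;

    /* ...but never more sets than the identifier limit allows. */
    if (sets > semmni)
        sets = semmni;

    return sets * sems_per_set;
}
```

With the assumed limits, backends_supported(10, 60, 16) gives 48: three sets of 16 fit in 60 semaphores, and the 49th backend would need a fourth set (64 semaphores in total).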

I'd appreciate some feedback from the guys who
are making PostgreSQL happen. Can these issues 
be addressed? PostgreSQL is a great database but 
this is a show stopper for people developing big 
Web applications. 

If you need any more information don't hesitate
to contact me.

Cheers.



Patrick

--

Sent to pgsql-general list on January 24th 1999:

Hi,

I've been doing some benchmarking with PostgreSQL
under mod_perl and I've been getting some rather
disturbing results. To achieve the maximum benefit
from persistent connections I am using a method
called 'connect_on_init' that comes with a Perl
module called Apache::DBI. Using this method,
when the Web server is first started, each child
process establishes a persistent connection with
the database. When using PostgreSQL as the database,
this means there are as many 'postgres'
processes as there are 'httpd' processes
for a given database.

As part of my benchmarking I've been testing the
number of httpd processes that my server can 
support. The machine is a 450 MHz PII/256 MB RAM.
As an exercise, I tried to start 100 httpd
processes. Doing this consistently results in the
following PostgreSQL errors and the backend usually
dies:

IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

Note that the 'no space left on device' is
misleading as there is a minimum of 400 MB 
available on each file-system on the server.
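The errno behind that message is ENOSPC, which for semget() means a kernel semaphore limit was reached, not that a disk is full. The sketch below makes the same kind of request the log line reports (a private set of nsems semaphores with permission 0600), removes the set again so the probe does not leak kernel resources, and maps the documented errno values to a plain explanation. Both function names are hypothetical; this is a diagnostic illustration, not PostgreSQL's IpcSemaphoreCreate.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Try to create, then immediately remove, a private set of nsems
 * semaphores. Returns 0 on success, or the errno from semget()/semctl().
 */
int probe_semaphore_set(int nsems)
{
    int semid = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
    if (semid < 0)
        return errno;

    /* Release the set so the probe leaves the kernel tables as it found them. */
    if (semctl(semid, 0, IPC_RMID) < 0)
        return errno;

    return 0;
}

/* Translate the result of probe_semaphore_set() into a short explanation. */
const char *explain_semget_error(int err)
{
    switch (err) {
    case 0:
        return "semaphore set created";
    case ENOSPC:
        /* "No space left on device" here refers to semaphores, not disk. */
        return "system-wide semaphore or semaphore-set limit reached";
    case EINVAL:
        return "nsems is out of range for this kernel";
    case EACCES:
        return "permission denied";
    default:
        return strerror(err);
    }
}
```

Calling probe_semaphore_set(16) repeatedly without removing the sets would be one way to find the point at which a given kernel starts returning ENOSPC.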

This is obviously bad news, especially as we are 
hoping to develop some fairly large-scale 
applications with PostgreSQL. Note that this
happens when connecting to a single database.
We were hoping to connect to several databases
from each httpd process!! 

The frustrating thing is we have the resources. 
If I only start 30 processes (which seems to be
the approximate limit) there is about 100 MB
of RAM that is not being used. 

Are there any configuration values that control 
the number of postgres processes? Do you have
any idea why this is happening? 

Is anyone else using Apache/mod_perl and PostgreSQL 
successfully in a demanding environment?

Any help would be greatly appreciated.

Cheers.



Patrick

-- 

#===============================#
\  KAN Design & Publishing Ltd  /
/  T: +44 (0)1223 511134        \
\  F: +44 (0)1223 571968        /
/  E: mailto:patrick@kan.co.uk  \ 
\  W: http://www.kan.co.uk      /
#===============================#

Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Tatsuo Ishii
Date: 28 January 1999, 20:10:38

>Hi, 
>
>I sent the following message to the pgsql-general 
>list on the 24th but haven't received any answers
>from PostgreSQL developers, only from other people
>who are experiencing the same problems.
>
>I would say the errors I am describing are quite
>serious and I was wondering whether there was any
>chance of them being addressed in the forthcoming
>6.5 release.

I don't think it's a PostgreSQL problem.

[snip]

>Note that the 'no space left on device' is
>misleading as there is a minimum of 400 MB 
>available on each file-system on the server.

No, that message is not talking about the space left on your
disk. You need to increase the shared memory size.

You want to have 100 backends? 6.4.2 has a hard limit of 64
backends. You can change this by editing the following line:

#define MaxBackendId 64            /* maximum number of backends        */

in src/include/storage/sinvaladt.h. Make sure to do a gmake clean before
recompiling.

Also, you might run out of file table entries. I recommend limiting
the number of descriptors available to each backend; 15 is probably
enough. You can do this by issuing the csh builtin limit command
before starting the postmaster.
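The shell's limit builtin lowers a per-process resource limit that child processes inherit across fork() and exec(). The same effect can be had programmatically with the POSIX getrlimit()/setrlimit() calls, as in this sketch of a hypothetical wrapper that would run before the postmaster is started. It only ever lowers the soft limit, never raises it, and leaves the hard limit alone.

```c
#include <sys/types.h>
#include <sys/resource.h>

/*
 * Cap the soft RLIMIT_NOFILE limit at max_fds. Because the limit is
 * inherited, every backend started afterwards would get the same cap.
 * Returns 0 on success, -1 on failure.
 */
int cap_open_files(rlim_t max_fds)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* Never exceed the hard limit, and never raise the current soft limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_fds > rl.rlim_max)
        max_fds = rl.rlim_max;
    if (rl.rlim_cur != RLIM_INFINITY && max_fds > rl.rlim_cur)
        max_fds = rl.rlim_cur;

    rl.rlim_cur = max_fds;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

A launcher would call cap_open_files(15) and then exec the postmaster, mirroring the suggestion above.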
--
Tatsuo Ishii

Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: The Hermit Hacker
Date: 28 January 1999, 20:32:13

On Thu, 28 Jan 1999, Patrick Verdon wrote:

> IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
> NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
> 
> FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.
> 
> Note that the 'no space left on device' is
> misleading as there is a minimum of 400 MB 
> available on each file-system on the server.

My first guess is that you don't have enough semaphores enabled in your
kernel...increase that from the default, and I'm *guessing* that you'll
get past your 48...
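Taken together, the replies above name two ceilings: the compile-time MaxBackendId (64 in 6.4.2) and the kernel's semaphore limits. The sketch below is a rough sizing aid under the same assumption as earlier, that backend semaphores come in whole sets of 16. With connect_on_init, the backend count is the number of httpd children times the number of databases each child connects to. The struct and function names are hypothetical, and the results are lower bounds: semaphores used by other programs on the machine are not counted.

```c
#include <stdbool.h>

/* Rough requirements for one Apache/mod_perl + PostgreSQL setup. */
struct ipc_requirements {
    int backends;                /* backends running at the same time */
    int sem_sets;                /* semaphore sets (SEMMNI must be >= this) */
    int semaphores;              /* semaphores in total (SEMMNS must be >= this) */
    bool exceeds_max_backend_id; /* true if MaxBackendId must be raised */
};

struct ipc_requirements size_for(int httpd_children, int dbs_per_child,
                                 int sems_per_set, int max_backend_id)
{
    struct ipc_requirements r;

    /* One persistent backend per (httpd child, database) pair. */
    r.backends = httpd_children * dbs_per_child;
    /* Round up: a partly used set still occupies a whole set. */
    r.sem_sets = (r.backends + sems_per_set - 1) / sems_per_set;
    r.semaphores = r.sem_sets * sems_per_set;
    r.exceeds_max_backend_id = r.backends > max_backend_id;
    return r;
}
```

For 100 children connecting to one database, size_for(100, 1, 16, 64) works out to 7 sets and 112 semaphores, and flags that MaxBackendId has to be raised above 64 as well.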

Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Vadim Mikheev
Date: 28 January 1999, 22:06:41

Patrick Verdon wrote:
> 
> Check Apache's error_log and you will see error
> messages and eventually the postmaster will die
> with something like:
> 
>    FATAL: s_lock(28001065) at spin.c:125, stuck spinlock. Aborting.

Try to increase S_MAX_BUSY in src/backend/storage/buffer/s_lock.c:

#define S_MAX_BUSY      500 * S_NSPINCYCLE
                        ^^^
try with 10000.
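Judging from the message and the suggested fix, "stuck spinlock" is a timeout: a backend spins on a test-and-set lock and gives up once S_MAX_BUSY attempts have all found it busy, so raising S_MAX_BUSY lets a backend wait longer before aborting. The simplified sketch below illustrates that idea with C11 atomics. It is not the code in s_lock.c (which uses platform-specific test-and-set and may delay between attempts), and all names in it are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A test-and-set spinlock whose waiters give up after a bounded number of tries. */
typedef struct {
    atomic_flag busy;
} bounded_spinlock;

void bounded_spinlock_init(bounded_spinlock *lock)
{
    atomic_flag_clear(&lock->busy);
}

/*
 * Returns true once the lock is acquired, or false if it was still busy
 * after max_spins attempts (where a server would abort with "stuck spinlock").
 */
bool bounded_spinlock_acquire(bounded_spinlock *lock, long max_spins)
{
    for (long spins = 0; spins < max_spins; spins++) {
        /* test-and-set returns the previous value: false means we got the lock */
        if (!atomic_flag_test_and_set_explicit(&lock->busy,
                                               memory_order_acquire))
            return true;
    }
    return false;
}

void bounded_spinlock_release(bounded_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->busy, memory_order_release);
}
```

A larger max_spins only postpones the failure; if the lock holder never releases the lock, every bounded wait still ends in the same give-up path.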

Vadim

Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Oleg Bartunov
Date: 29 January 1999, 01:41:44

I don't think this is a Postgres problem. I got the same
problem you described when upgrading Apache from 1.3.3 to 1.3.4,
and I had to go back to 1.3.3.
I will probably try mod_perl 1.18 + Apache 1.3.4.
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83