Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed) - Mailing list pgsql-hackers

From Patrick Verdon
Subject Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
Msg-id 36B1DC48.8C52FD92@kan.co.uk
List pgsql-hackers
Tatsuo, Vadim, Oleg, Scrappy,

Many thanks for the response.

A couple of you weren't convinced that this
is a Postgres problem, so let me try to clarify
things a little. Maybe the use of
Apache and mod_perl is confusing the issue.
The point I was trying to make is that if
49 or more concurrent postgres processes are
started on a typical machine (i.e. one where
kernel parameters are at their defaults), the
postmaster dies in a nasty way with
potentially damaging results.

Here's a case without Apache/mod_perl that
causes exactly the same behaviour. Simply
enter the following 49 times:

kandinsky:patrick> psql template1 &

Note that I tried to automate this without
success: 

perl -e 'for ( 1..49 ) { system("/usr/local/pgsql/bin/psql template1 &"); }'
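One plausible reason the Perl loop doesn't reproduce the problem (an assumption, not confirmed in this thread): `system()` hands the command to a non-interactive `/bin/sh`. That shell gives a backgrounded job `/dev/null` as its standard input, so each `psql` reads end-of-file and exits almost at once instead of holding its connection open. Below is a minimal Python sketch that keeps every session open by giving each `psql` its own stdin pipe. The helper names `open_sessions` and `close_sessions` are hypothetical, and the psql path is the one used in the Perl loop.

```python
import subprocess


def open_sessions(cmd, n):
    """Start n copies of cmd, each with its own stdin pipe.

    Keeping the write end of each pipe open means the child never sees
    end-of-file on stdin, so a client such as psql stays connected
    until close_sessions() is called.
    """
    procs = []
    for _ in range(n):
        # stdout/stderr are inherited, so a failed connection still
        # prints its error message to the terminal.
        procs.append(subprocess.Popen(cmd, stdin=subprocess.PIPE))
    return procs


def close_sessions(procs):
    """Send EOF to every session, then wait for each one to exit."""
    for p in procs:
        p.stdin.close()
    for p in procs:
        p.wait()
```

For example, `sessions = open_sessions(["/usr/local/pgsql/bin/psql", "template1"], 49)` should leave up to 49 sessions connected until `close_sessions(sessions)` is called.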

The 49th attempt to initiate a connection 
fails:

Connection to database 'template1' failed.
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
 

and the error_log says:

InitPostgres
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
proc_exit(3) [#0]
shmem_exit(3) [#0]
exit(3)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 1521 exited with status 768
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 1518
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
 

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.


Even if there is a hard limit, Postgres should
not die in this spectacular fashion. I don't
think it is unreasonable for some large
applications to peak at more than 48 processes
when running on powerful hardware with plenty
of RAM.

The other point is that even with 1 GB of RAM,
Postgres won't scale beyond 48 processes, which
probably use less than 100 MB of RAM in total.
Would it be possible to make 'MaxBackendId'
configurable for those who have the resources?
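The `semget failed (No space left on device) ... num=16` line in the log hints that the ceiling comes from a kernel semaphore limit rather than from RAM. `semget` returns ENOSPC when creating a new set would exceed the system-wide limit on semaphore sets (SEMMNI) or on semaphores (SEMMNS). The back-of-the-envelope sketch below rests on three assumptions: SEMMNS = 60 and SEMMNI = 10 (typical defaults of the period, not figures from this report), one semaphore per backend, and allocation 16 at a time, as `num=16` suggests. The function name `max_backends` is hypothetical.

```python
def max_backends(semmns, semmni, sems_per_set):
    """Backends that can get a semaphore when semaphores are created a
    whole set at a time (assumes one semaphore per backend and that no
    other program on the machine is using semaphores)."""
    sets_by_total = semmns // sems_per_set    # whole sets that fit under SEMMNS
    sets = min(sets_by_total, semmni)         # ...and under SEMMNI
    return sets * sems_per_set


# Assumed kernel defaults, not taken from the report.
SEMMNS = 60
SEMMNI = 10
# From the log line: IpcSemaphoreCreate ... num=16
SEMS_PER_SET = 16

print(max_backends(SEMMNS, SEMMNI, SEMS_PER_SET))  # 48
```

Under those assumptions, only three sets of 16 fit, so the 48th backend is the last to get a semaphore and the 49th fails, which matches the behaviour reported here. Whether these defaults actually apply on the FreeBSD and Solaris machines in question would need checking.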

I have reproduced this behaviour on both 
FreeBSD 2.2.8 and Intel Solaris 2.6 using
version 6.4.x of PostgreSQL.

I'll try changing some of the suggested
parameters and see how far I get, but the bottom
line is that Postgres shouldn't be dying like this.

Let me know if you need any more info.

Cheers.



Patrick

-- 

#===============================#
\  KAN Design & Publishing Ltd  /
/  T: +44 (0)1223 511134        \
\  F: +44 (0)1223 571968        /
/  E: mailto:patrick@kan.co.uk  \ 
\  W: http://www.kan.co.uk      /
#===============================#

