Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed) - Mailing list pgsql-hackers
From:      Patrick Verdon
Subject:   Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
Date:
Msg-id:    36B1DC48.8C52FD92@kan.co.uk
Responses: Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
           Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
List:      pgsql-hackers
Tatsuo, Vadim, Oleg, Scrappy,

Many thanks for the response. A couple of you weren't convinced that
this is a Postgres problem, so let me try to clear the water a little
bit. Maybe the use of Apache and mod_perl is confusing the issue - the
point I was trying to make is that if there are 49+ concurrent
postgres processes on a normal machine (i.e. where kernel parameters
are the defaults, etc.) the postmaster dies in a nasty way, with
potentially damaging results.

Here's a case without Apache/mod_perl that causes exactly the same
behaviour. Simply enter the following 49 times:

    kandinsky:patrick> psql template1 &

Note that I tried to automate this without success:

    perl -e 'for ( 1..49 ) { system("/usr/local/pgsql/bin/psql template1 &"); }'

The 49th attempt to initiate a connection fails:

    Connection to database 'template1' failed.
    pqReadData() -- backend closed the channel unexpectedly.
            This probably means the backend terminated abnormally
            before or while processing the request.

and the error_log says:

    InitPostgres
    IpcSemaphoreCreate: semget failed (No space left on device)
    key=5432017, num=16, permission=600
    proc_exit(3) [#0]
    shmem_exit(3) [#0]
    exit(3)
    /usr/local/pgsql/bin/postmaster: reaping dead processes...
    /usr/local/pgsql/bin/postmaster: CleanupProc: pid 1521 exited with status 768
    /usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 1518
    NOTICE:  Message from PostgreSQL backend:
            The Postmaster has informed me that some other backend
            died abnormally and possibly corrupted shared memory.
            I have rolled back the current transaction and am going
            to terminate your database system connection and exit.
            Please reconnect to the database system and repeat your
            query.
    FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.
    FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.

Even if there is a hard limit, there is no way that Postgres should
die in this spectacular fashion. I wouldn't have said it was
unreasonable for some large applications to peak at more than 48
processes when using powerful hardware with plenty of RAM. The other
point is that even with, say, 1 GB of RAM, Postgres won't scale beyond
48 processes, while probably using less than 100 MB of it. Would it be
possible to make 'MaxBackendId' configurable for those who have the
resources?

I have reproduced this behaviour on both FreeBSD 2.2.8 and Intel
Solaris 2.6 using version 6.4.x of PostgreSQL. I'll try changing some
of the parameters suggested and see how far I get, but the bottom
line is that Postgres shouldn't be dying like this.

Let me know if you need any more info.

Cheers,

Patrick

--
#===============================#
 \ KAN Design & Publishing Ltd /
 /  T: +44 (0)1223 511134      \
 \  F: +44 (0)1223 571968      /
 /  E: mailto:patrick@kan.co.uk \
 \  W: http://www.kan.co.uk    /
#===============================#
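For reference: the "semget failed (No space left on device)" above is
ENOSPC from semget(2), meaning the kernel has run out of SysV
semaphore sets, not disk space. So until MaxBackendId is made
configurable, the workaround is to raise the kernel's semaphore
limits. A minimal sketch for Solaris 2.6 follows; the values are
illustrative only, not tuned recommendations, and a reboot is
required for /etc/system changes to take effect:

    * /etc/system -- raise SysV semaphore limits (illustrative values)
    * semmni: max number of semaphore identifiers (sets) system-wide
    set semsys:seminfo_semmni=64
    * semmns: max number of semaphores system-wide
    set semsys:seminfo_semmns=512
    * semmnu: max number of semaphore undo structures
    set semsys:seminfo_semmnu=512

On FreeBSD 2.2.8 the corresponding limits (SEMMNI, SEMMNS, SEMMNU)
are, as far as I know, compile-time kernel options rather than
runtime tunables, so raising them there means rebuilding the kernel.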