Thread: Problem with frequent crashes related to semctl

Problem with frequent crashes related to semctl

From: Adrian Maier
Hello!

I am running PostgreSQL 8.3.5 on a Linux machine (Ubuntu 10.04).
Sometimes connecting to the database fails with this error:

     FATAL:  semctl(360458, 3, SETVAL, 0) failed: Invalid argument (PGError)

If I restart postgres, the problem gets "fixed".  It doesn't matter how I
connect to the database: I have seen this happen from psql, from JDBC, and
from Ruby.


The PostgreSQL configuration is the default one: I have changed only
listen_addresses and the port.


However, the machine is configured with some pretty large values for POSIX queues:

   fs.mqueue.msgsize_max=2621440
   fs.mqueue.msg_max=10240
   fs.mqueue.queues_max=10240

Also, the user has no limit on queues in /etc/security/limits.conf:
   am              hard    msgqueue        unlimited

These are needed for another application running on the same machine (which
performs some heavy communication via POSIX queues).  I am not sure whether
this can interfere with the semaphores used by postgres ...


Does the situation described above ring a bell for anyone? Any suggestions
on how to analyse the problem more deeply?


I am also aware that the same error has occurred on another machine (Fedora Linux)
that has the same mqueue settings.


Best regards,
Adrian Maier


PS: Here is an example log file:

LOG:  database system was shut down at 2010-11-04 16:50:35 EET
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
FATAL:  semctl(360458, 6, SETVAL, 0) failed: Invalid argument
FATAL:  semctl(360458, 3, SETVAL, 0) failed: Invalid argument
FATAL:  semctl(360458, 2, SETVAL, 0) failed: Invalid argument
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
LOG:  semctl(327689, 0, IPC_RMID, ...) failed: Invalid argument
LOG:  semctl(360458, 0, IPC_RMID, ...) failed: Invalid argument




Re: Problem with frequent crashes related to semctl

From: Tom Lane
Adrian Maier <adrian.maier@thalesgroup.com> writes:
> I am running PostgreSQL 8.3.5 on a Linux machine (Ubuntu 10.04).
> Sometimes connecting to the database fails with this error:

>      FATAL:  semctl(360458, 3, SETVAL, 0) failed: Invalid argument (PGError)

> If I restart postgres, the problem gets "fixed".  It doesn't matter how I
> connect to the database: I have seen this happen from psql, from JDBC, and
> from Ruby.

The most likely theory is that something deleted Postgres' semaphores
out from under it.  You could check this by noting the output of "ipcs -s"
while the database is running normally, and then comparing to the output
after it starts to fail.

If that does seem to be what's happening, look around for root-executed
scripts doing "ipcrm" calls.

            regards, tom lane
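
The before/after comparison suggested above could be scripted roughly as
follows. This is a minimal sketch, assuming the column layout that the
util-linux "ipcs -s" prints on Linux (key, semid, owner, perms, nsems). The
function names and the sample text are made up for illustration; only the two
semaphore IDs are taken from the log earlier in the thread.

```python
import subprocess


def parse_ipcs_semaphores(text):
    """Return {semid: owner} parsed from `ipcs -s` style output."""
    arrays = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns: key, semid, owner, perms, nsems.
        # Header and separator lines do not start with a hex key.
        if len(fields) == 5 and fields[0].startswith("0x") and fields[1].isdigit():
            arrays[int(fields[1])] = fields[2]
    return arrays


def vanished(before, after):
    """Semaphore IDs present in the earlier snapshot but missing from the later one."""
    return sorted(set(before) - set(after))


def snapshot():
    """Run `ipcs -s` and parse its output (needs the ipcs tool on PATH)."""
    result = subprocess.run(["ipcs", "-s"], capture_output=True, text=True, check=True)
    return parse_ipcs_semaphores(result.stdout)


# Illustrative samples only: keys, owner, perms and nsems are invented.
SAMPLE_HEALTHY = """
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e2c1 327689     am         600        17
0x0052e2c2 360458     am         600        17
"""

SAMPLE_FAILING = """
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
"""

if __name__ == "__main__":
    before = parse_ipcs_semaphores(SAMPLE_HEALTHY)
    after = parse_ipcs_semaphores(SAMPLE_FAILING)
    print(vanished(before, after))  # prints [327689, 360458]
```

In real use one would call snapshot() once while the database is healthy,
again after the first semctl failure, and pass both results to vanished().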

Re: Problem with frequent crashes related to semctl

From: Adrian Maier
On 11/05/2010 05:02 PM, Tom Lane wrote:
> Adrian Maier<adrian.maier@thalesgroup.com>  writes:
>> I am running PostgreSQL 8.3.5 on a Linux machine (Ubuntu 10.04).
>> Sometimes connecting to the database fails with this error:
>
>>       FATAL:  semctl(360458, 3, SETVAL, 0) failed: Invalid argument (PGError)
>
>> If I restart postgres, the problem gets "fixed".  It doesn't matter how I
>> connect to the database: I have seen this happen from psql, from JDBC, and
>> from Ruby.
>
> The most likely theory is that something deleted Postgres' semaphores
> out from under it.  You could check this by noting the output of "ipcs -s"
> while the database is running normally, and then comparing to the output
> after it starts to fail.
>
> If that does seem to be what's happening, look around for root-executed
> scripts doing "ipcrm" calls.

Tom,
Thanks for the tip.

The semaphores are indeed deleted with ipcrm from a script. The script
(re)starts another application and simply removes all the semaphores,
without considering that some of them may
belong to another process...

I'll simply arrange for postgres to be started by a different user. This
should protect the postgres semaphores from the script (which is
executed as a regular user, not root).
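
Running postgres under a different user works because an unprivileged user
can only remove semaphore sets it owns or created. A complementary option is
to make the restart script remove only the sets that belong to its own
application. The sketch below is one hypothetical way to do that, assuming
the application creates its semaphores with known keys; APP_KEYS, the sample
listing, and the function names are invented for illustration, and the code
only builds "ipcrm -s <semid>" command lines without running them.

```python
def parse_ipcs_rows(text):
    """Return a list of (key, semid, owner) tuples from `ipcs -s` style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: key, semid, owner, perms, nsems.
        if len(fields) == 5 and fields[0].startswith("0x") and fields[1].isdigit():
            rows.append((int(fields[0], 16), int(fields[1]), fields[2]))
    return rows


def ipcrm_commands(rows, owner, own_keys):
    """Build ipcrm argument lists only for sets matching both owner and a known key."""
    return [
        ["ipcrm", "-s", str(semid)]
        for key, semid, row_owner in rows
        if row_owner == owner and key in own_keys
    ]


# Hypothetical key used by the other application (an assumption, not from the thread).
APP_KEYS = {0x4D010203}

# Illustrative listing: the first two rows stand in for the postgres semaphores.
SAMPLE = """
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e2c1 327689     am         600        17
0x0052e2c2 360458     am         600        17
0x4d010203 393227     am         666        4
"""

if __name__ == "__main__":
    for argv in ipcrm_commands(parse_ipcs_rows(SAMPLE), "am", APP_KEYS):
        print(" ".join(argv))  # prints: ipcrm -s 393227
```

With this filter the two sets standing in for postgres are left alone even
though they have the same owner as the script.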



Thanks,
Adrian Maier