Frequent "pg_ctl status" removing(?) semaphores (unlikely) - Mailing list pgsql-general
From | raf |
---|---|
Subject | Frequent "pg_ctl status" removing(?) semaphores (unlikely) |
Date | |
Msg-id | 20160927051350.GA13269@raf.org |
Responses | Re: Frequent "pg_ctl status" removing(?) semaphores (unlikely) |
List | pgsql-general |
Hi,

debian-8 (stable), postgres-9.4.9

I've just started running "/etc/init.d/postgresql-9.4 status" every minute via cron and it seems to be having a very bad effect on the server ["So stop doing it!" heard from the peanut gallery].

I noticed an error message like:

  FATAL: the database system is in recovery mode

when it shouldn't have been in recovery mode, and the log files say:

  [25844]: FATAL: semctl(5505030, 6, SETVAL, 0) failed: Invalid argument
  [6708]: LOG: server process (PID 25844) exited with exit code 1
  [6708]: LOG: terminating any other active server processes
  [6714]: WARNING: terminating connection because of crash of another server process
  [6714]: DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
  2016-09-27 10:27:16 AEST::@:[6714]: HINT: In a moment you should be able to reconnect to the database and repeat your command.
  [6708]: LOG: all server processes terminated; reinitializing
  [6708]: LOG: could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
  [6708]: LOG: semctl(5308416, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5341185, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5373954, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5406723, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5439492, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5472261, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5505030, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG: semctl(5537799, 0, IPC_RMID, ...) failed: Invalid argument
  [25845]: LOG: database system was interrupted; last known up at 2016-09-23 09:29:28 AEST

However, it's not really in recovery mode and it doesn't come good again until I manually restart the server.

Googling for this issue revealed that something is deleting postgres's semaphores.
Sure enough, "ipcs -s" shows no semaphores when this happens, rather than the usual eight lines of semaphore information. The advice was to look for another script issuing ipcrm commands (or, if running postgres in multiple FreeBSD jails, to use a different userid for each).

The only thing I'm doing differently since this started is running this every minute from a script run by cron:

  /etc/init.d/postgresql-9.4 status

That means a call to:

  su -s /bin/sh - postgres -c "LD_LIBRARY_PATH=/opt/PostgreSQL/9.4/lib:$LD_LIBRARY_PATH /opt/PostgreSQL/9.4/bin/pg_ctl status -D \"/data/payroll-9.4\""

Now, I can't prove that "pg_ctl status" is causing the problem, but when I disable the cronjob the problem disappears, and whenever I enable it the problem reappears fairly quickly (i.e. within an hour or two), so I'm fairly convinced that it's involved.

And I've just noticed that the logfile messages above do refer to IPC_RMID, so some part of postgres is trying to remove the semaphores, but it looks like they're already gone when it tries.

So, my question is: is it possible that "pg_ctl status" could be removing postgres's semaphores, and can I stop it? It seems extremely unlikely. So, if it isn't, what else could it be? Systemd perhaps? It's been known to kill screen/tmux/nohup processes when a user logs out, in its keenness to clean up, but that may be clutching at straws.

At first, when I saw this, I assumed that I had stopped the server interactively at the same time as the cronjob was starting it, and that the two actions clashed with regard to semaphore creation and removal, but I wasn't convinced. And I'm not trying to stop the server now. I'm just running the cronjob to check the status. And the problem still occurs.

In case you're wondering what else the cronjob does, the first thing it does is:

  /etc/init.d/postgresql-9.4 status | grep -q 'server is running' && exit 0

So it's not doing anything else if postgres is running.

Any idea what I've done wrong? (apart from the obvious) :-)

cheers,
raf
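
To narrow down when the semaphores vanish, the "ipcs -s" check described above could be logged from the same cron job. What follows is only a rough sketch, not something taken from the setup described in this message: the function names count_sem_arrays and log_sem_count are made up for illustration, and the parsing assumes the usual Linux (util-linux) "ipcs -s" column layout of key, semid, owner, perms, nsems.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the column layout
# (key semid owner perms nsems) is an assumption about `ipcs -s` on Linux.

# Read `ipcs -s` output on stdin and print how many semaphore arrays
# are owned by the user given as $1.
count_sem_arrays() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { n++ } END { print n + 0 }'
}

# Print one timestamped line with the current count for a user
# (defaults to "postgres").
log_sem_count() {
    owner="${1:-postgres}"
    n=$(ipcs -s | count_sem_arrays "$owner")
    printf '%s %s semaphore arrays owned by %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$n" "$owner"
}
```

Calling log_sem_count just before and just after the status check, with the output appended to a log file of your choosing, should show whether the count drops from the usual eight to zero at the moment the check finishes or at some unrelated time.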