Frequent "pg_ctl status" removing(?) semaphores (unlikely) - Mailing list pgsql-general

From raf
Subject Frequent "pg_ctl status" removing(?) semaphores (unlikely)
Msg-id 20160927051350.GA13269@raf.org
List pgsql-general
Hi,

debian-8 (stable), postgres-9.4.9

I've just started running "/etc/init.d/postgresql-9.4 status"
every minute via cron and it seems to be having a very bad effect
on the server ["So stop doing it!" heard from the peanut gallery].

I noticed an error message like:

  FATAL:  the database system is in recovery mode

when it shouldn't have been in recovery mode,

and the log files say:

  [25844]: FATAL:  semctl(5505030, 6, SETVAL, 0) failed: Invalid argument
  [6708]: LOG:  server process (PID 25844) exited with exit code 1
  [6708]: LOG:  terminating any other active server processes
  [6714]: WARNING:  terminating connection because of crash of another server process
  [6714]: DETAIL:  The postmaster has commanded this server process to roll back the
     current transaction and exit, because another server [...]
  [6714]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
  [6708]: LOG:  all server processes terminated; reinitializing
  [6708]: LOG:  could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
  [6708]: LOG:  semctl(5308416, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5341185, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5373954, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5406723, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5439492, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5472261, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5505030, 0, IPC_RMID, ...) failed: Invalid argument
  [6708]: LOG:  semctl(5537799, 0, IPC_RMID, ...) failed: Invalid argument
  [25845]: LOG:  database system was interrupted; last known up at 2016-09-23 09:29:28 AEST

However, it's not really in recovery mode and it doesn't come
good again until I manually restart the server.

Googling for this issue revealed that something is deleting
postgres's semaphores. Sure enough, the "ipcs -s" shows no
semaphores when this happens rather than showing the usual eight
lines of semaphore information.
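
A minimal sketch of that check, for anyone who wants to script it: it
counts the semaphore arrays owned by one user by parsing "ipcs -s"
output. It assumes the Linux util-linux column layout (key, semid,
owner, perms, nsems), and the function name count_sem_arrays is made
up for illustration.

```shell
#!/bin/sh
# Count semaphore arrays owned by a user, reading "ipcs -s" output on stdin.
# Assumes the util-linux layout: key, semid, owner, perms, nsems.
# count_sem_arrays is a hypothetical helper name, not an existing tool.
count_sem_arrays() {
    awk -v user="$1" '$2 ~ /^[0-9]+$/ && $3 == user { n++ } END { print n + 0 }'
}

# Only query the live system when ipcs is actually available.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -s | count_sem_arrays postgres
fi
```

On a healthy server this should print the usual number of arrays
(eight here); a result of 0 would match the failure described above.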

The advice was to look for another script issuing ipcrm commands
(or, if running postgres in multiple freebsd jails, use a
different userid for each).

The only thing I'm doing differently since this started is
running this every minute from a script run by cron:

  /etc/init.d/postgresql-9.4 status

That means a call to:

  su -s /bin/sh - postgres -c "LD_LIBRARY_PATH=/opt/PostgreSQL/9.4/lib:$LD_LIBRARY_PATH
      /opt/PostgreSQL/9.4/bin/pg_ctl status -D \"/data/payroll-9.4\""

Now, I can't prove that "pg_ctl status" is causing the problem
but when I disable the cronjob, the problem disappears and
whenever I enable it the problem reappears fairly quickly (i.e.
within an hour or two) so I'm fairly convinced that it's
involved. And I've just noticed that the logfile messages
above do refer to IPC_RMID so some part of postgres
is trying to remove the semaphores but it looks like they're
already gone when it tries.
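
One way to gather firmer evidence would be to record the semaphore
count immediately before and after each status call. The following is
only a sketch under that assumption; sem_count and trace_sems are
hypothetical helper names, and the parsing again assumes the
util-linux "ipcs -s" layout.

```shell
#!/bin/sh
# Print the number of semaphore arrays currently listed by "ipcs -s".
# (sem_count and trace_sems are hypothetical helpers for illustration.)
sem_count() {
    ipcs -s | awk '$2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Run the given command, discarding its output, and report the
# semaphore count before and after it along with its exit status.
trace_sems() {
    before=$(sem_count)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    after=$(sem_count)
    echo "$(date '+%Y-%m-%d %H:%M:%S') exit=$status before=$before after=$after"
}

# Example usage from the cron script (not executed here):
#   trace_sems /etc/init.d/postgresql-9.4 status >> /tmp/sem-trace.log
```

If whatever removes the semaphores acts a moment after the command
returns, the "after" value may still look normal, so it is worth
comparing it with the "before" value of the next run as well.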

So, my question is, is it possible that "pg_ctl status" could be
removing postgres's semaphores and can I stop it? It seems
extremely unlikely. So, if it isn't, what else could it be?
Systemd perhaps? It's been known to kill screen/tmux/nohup
processes when a user logs out in its keenness to clean up but
that may be clutching at straws.
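
One way to test the systemd suspicion, offered as an assumption rather
than a diagnosis: systemd-logind has a RemoveIPC= option (documented in
logind.conf(5)) that removes a user's System V and POSIX IPC objects
once that user's last session ends, and "su -" may open such a session
through PAM. The sketch below only reports what a logind.conf file
sets; the function name removeipc_setting is hypothetical, and "unset"
means the compiled-in default applies.

```shell
#!/bin/sh
# Print the last uncommented RemoveIPC= value found in the given file,
# or "unset" if the file does not set it explicitly.
# (removeipc_setting is a hypothetical helper name.)
removeipc_setting() {
    awk -F= '
        /^[[:space:]]*RemoveIPC[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)   # strip whitespace around the value
            v = $2
        }
        END { print (v == "" ? "unset" : v) }
    ' "$1"
}

# Inspect the main logind configuration file if it is readable.
if [ -r /etc/systemd/logind.conf ]; then
    removeipc_setting /etc/systemd/logind.conf
fi
```

Drop-in files under /etc/systemd/logind.conf.d/, if the installed
systemd version supports them, are not examined by this sketch.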

At first, when I saw this, I assumed that I had stopped the
server interactively at the same time as the cronjob was
starting it and the two actions clashed with regard to
semaphore creation and removal but I wasn't convinced. And I'm
not trying to stop the server now. I'm just running the cronjob
to check the status. And the problem still occurs.

In case you're wondering what else the cronjob does, the first
thing it does is:

  /etc/init.d/postgresql-9.4 status | grep -q 'server is running' && exit 0

So it's not doing anything else if postgres is running.

Any idea what I've done wrong? (apart from the obvious) :-)

cheers,
raf


