Re: What to do when dynamic shared memory control segment is corrupt - Mailing list pgsql-general

From Tom Lane
Subject Re: What to do when dynamic shared memory control segment is corrupt
Msg-id 28565.1529339413@sss.pgh.pa.us
In response to What to do when dynamic shared memory control segment is corrupt  (Sherrylyn Branchaw <sbranchaw@gmail.com>)
Responses Re: What to do when dynamic shared memory control segment is corrupt  (Andres Freund <andres@anarazel.de>)
List pgsql-general
Sherrylyn Branchaw <sbranchaw@gmail.com> writes:
> We are using Postgres 9.6.8 (planning to upgrade to 9.6.9 soon) on RHEL 6.9.
> We recently experienced two similar outages on two different prod
> databases. The error messages from the logs were as follows:
> LOG:  server process (PID 138529) was terminated by signal 6: Aborted

Hm ... were these installations built with --enable-cassert?  If not,
an abort trap seems pretty odd.
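
(One way to answer that question is to look at the configure options printed
by "pg_config --configure", or to run "SHOW debug_assertions" in a session.
The following is only a minimal Python sketch of the first approach. It
assumes the pg_config found on PATH belongs to the server in question, and
the helper names built_with_cassert and check_local_install are made up for
illustration.)

```python
import shlex
import subprocess


def built_with_cassert(configure_output: str) -> bool:
    """Return True if the configure options include --enable-cassert."""
    # pg_config --configure prints the options as shell-quoted words,
    # so shlex.split recovers the individual flags.
    return "--enable-cassert" in shlex.split(configure_output)


def check_local_install(pg_config: str = "pg_config") -> bool:
    """Run pg_config and inspect its output.

    Assumption: this pg_config binary matches the running server build.
    """
    out = subprocess.run(
        [pg_config, "--configure"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return built_with_cassert(out)
```

(If several PostgreSQL versions are installed side by side, pass the full
path of the right pg_config to check_local_install.)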

> In one case, the logs recorded
> LOG:  all server processes terminated; reinitializing
> LOG:  incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

> In the other case, the logs recorded
> LOG:  all server processes terminated; reinitializing
> LOG:  dynamic shared memory control segment is corrupt
> LOG:  incomplete data in "postmaster.pid": found only 1 newlines while
> trying to add line 7
> ...

Those "incomplete data" messages are quite unexpected and disturbing.
I don't know of any mechanism within Postgres proper that would result
in corruption of the postmaster.pid file that way.  (I wondered briefly
if trying to start a conflicting postmaster would result in such a
situation, but experimentation here says not.)  I'm suspicious that
this may indicate a bug or unwarranted assumption in whatever scripts
you use to start/stop the postmaster.  Whether that is at all related
to your crash issue is hard to say, but it bears looking into.
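
(When auditing start/stop scripts, it may help to check whether
postmaster.pid ever ends up shorter than the server expects. The sketch
below is illustrative only: the threshold of 6 complete lines is inferred
from the log message quoted above, which complains while trying to add
line 7, and the function names are hypothetical.)

```python
from pathlib import Path

# Assumption: inferred from the quoted log message. Adding line 7 requires
# the 6 preceding lines to be present and newline-terminated.
EXPECTED_COMPLETE_LINES = 6


def count_complete_lines(text: str) -> int:
    """Count newline-terminated lines in the file contents."""
    return text.count("\n")


def pidfile_looks_truncated(text: str,
                            expected: int = EXPECTED_COMPLETE_LINES) -> bool:
    """Return True if the contents have fewer complete lines than expected."""
    return count_complete_lines(text) < expected


def check_pidfile(data_dir: str) -> bool:
    """Read postmaster.pid from a data directory and test its length."""
    text = Path(data_dir, "postmaster.pid").read_text()
    return pidfile_looks_truncated(text)
```

(A file holding only the PID and one newline, as in the logs above, would be
reported as truncated. Note that the file is written in stages during
startup, so a short file seen while the postmaster is still starting is not
by itself a sign of trouble.)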

> My question is whether the corrupt shared memory control segment, and the
> failure of Postgres to automatically restart, mean the database should not
> be automatically started up, and if there's something we should be doing
> before restarting.

No, that looks like fairly typical crash recovery to me: corrupt shared
memory contents are expected and recovered from after a crash.  However,
we don't expect postmaster.pid to get mucked with.

            regards, tom lane

