Thread: Automatically starting postmaster after system crash

Automatically starting postmaster after system crash

From: Allan Engelhardt

Sorry if this is a FAQ, but I couldn't find it.

If my (RH 7.1) system crashes, PostgreSQL does not restart automatically
because the shared memory segment identifier and the .pid file remain,
as a manual start explains:

  % pg_ctl start
  pg_ctl: Another postmaster may be running.  Trying to start postmaster anyway.
  Found a pre-existing shared memory block (ID 693600256) still in use.
  If you're sure there are no old backends still running,
  remove the shared memory block with ipcrm(1), or just
  delete "/var/lib/pgsql/data/postmaster.pid".
  pg_ctl: cannot start postmaster
  Examine the log output.

What is the "proper" way of ensuring (as far as possible) that
PostgreSQL starts automatically after a crash?  Is it sufficient (and
safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
scripts?

Allan.


Re: Automatically starting postmaster after system crash

From: Tom Lane

Allan Engelhardt <allane@cybaea.com> writes:
> If my (RH 7.1) system crashes, PostgreSQL does not restart automatically
> because the shared memory segment identifier and the .pid file remain,

That's kinda hard to believe; how would a shared memory segment survive
a system crash?

>   % pg_ctl start
>   pg_ctl: Another postmaster may be running.  Trying to start postmaster anyway.
>   Found a pre-existing shared memory block (ID 693600256) still in use.

Darn, I thought we had fixed that class of problems.  Would you try
tracing through SharedMemoryIsInUse() to figure out why it thinks that?
It could be that there's some platform-specific variation of shmctl()
behavior that we need to cater for.

> What is the "proper" way of ensuring (as far as possible) that
> PostgreSQL starts automatically after a crash?  Is it sufficient (and
> safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
> scripts?

You can do that if you want, but MHO is that this is a bug we need to
fix.

            regards, tom lane
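
For the boot-script workaround discussed above, a slightly more cautious variant than a bare 'rm -f' might look like the sketch below. It assumes that the first line of postmaster.pid holds the postmaster's PID and that the script runs as a user allowed to signal that process. The function name remove_stale_pidfile is hypothetical; it is not part of PostgreSQL or of the Red Hat scripts.

```shell
#!/bin/sh
# Hypothetical helper: remove $1/postmaster.pid only when the PID it
# records no longer belongs to a live process.
remove_stale_pidfile() {
    pidfile="$1/postmaster.pid"
    [ -f "$pidfile" ] || return 0          # nothing to clean up

    # Assumption: the first line of the file is the postmaster PID.
    pid=$(head -n 1 "$pidfile")

    # Refuse to touch the file if the first line is not a plain number.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still alive; leaving $pidfile alone" >&2
        return 1
    fi

    rm -f "$pidfile"
}
```

A boot script could then call something like 'remove_stale_pidfile "$PGDATA" && pg_ctl start'. Note that after a reboot the recorded PID may have been reused by an unrelated process, in which case this sketch errs on the side of leaving the file in place.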

Re: Automatically starting postmaster after system crash

From: Allan Engelhardt

Tom Lane wrote:

> Allan Engelhardt <allane@cybaea.com> writes:
>> If my (RH 7.1) system crashes, PostgreSQL does not restart automatically
>> because the shared memory segment identifier and the .pid file remain,
>
> That's kinda hard to believe; how would a shared memory segment survive
> a system crash?


I don't think they can.  Some options:

(1) PostgreSQL keeps a reference to it somewhere and can get confused...

(2) Red Hat's script for starting PostgreSQL at boot time, which (a)
ran, (b) failed, and [Arrrgh!  I *must* fix that stupid script ;-P] (c)
directs all pg_ctl output (out+err) to /dev/null, somehow fubared the
system.


> Darn, I thought we had fixed that class of problems.  Would you try
> tracing through SharedMemoryIsInUse() to figure out why it thinks that?
> It could be that there's some platform-specific variation of shmctl()
> behavior that we need to cater for.


Uhm, my system doesn't crash *that* often... :-)

Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't.  I
think I have seen enough fsck for one night, so I might give it a rest...

>> What is the "proper" way of ensuring (as far as possible) that
>> PostgreSQL starts automatically after a crash?  Is it sufficient (and
>> safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
>> scripts?
>
> You can do that if you want, but MHO is that this is a bug we need to
> fix.


I'll see what I can do about reproducing it...

Allan


Re: Automatically starting postmaster after system crash

From: Tom Lane

Allan Engelhardt <allane@cybaea.com> writes:
> Tom Lane wrote:
>> That's kinda hard to believe; how would a shared memory segment survive
>> a system crash?

> I don't think they can.  Some options:

> (1) PostgreSQL keeps a reference to it somewhere and can get confused...

Indeed, there is a reference to the old segment in the postmaster.pid
file.  At startup, if there's a postmaster.pid file, Postgres checks to
see that the indicated shared memory segment is gone or at least has no
processes attached to it.  (This is a defense against the possibility
that the old postmaster died but there are still backends running in
the database.)  Evidently, that check is mistakenly thinking that there
*is* still a shmem seg with attached processes.  Question is why?
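
To see from the command line roughly what that check is looking at, the attach count of a segment can be read out of 'ipcs -m'. The sketch below assumes the Linux column order (key, shmid, owner, perms, bytes, nattch, status); other platforms print different columns. The function name shm_nattch is hypothetical, and this only approximates the check described above; it is not the PostgreSQL code.

```shell
#!/bin/sh
# Hypothetical helper: read `ipcs -m` output on stdin and print the number
# of attached processes (nattch) for the shared memory ID given as $1.
# Exit status 1 means no segment with that ID was listed.
shm_nattch() {
    awk -v id="$1" '
        $2 == id { print $6; found = 1 }
        END      { exit !found }
    '
}
```

Usage would be along the lines of 'ipcs -m | shm_nattch 693600256'. No output and a non-zero status means the segment is gone; an output of 0 means it exists but nothing is attached. In either case the startup check should be satisfied.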

> Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't.  I
> think I have seen enough fsck for one night, so I might give it a rest...

You might try just kill -9'ing the postmaster, rather than physically
rebooting your system.

            regards, tom lane
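
The suggested test could be scripted roughly as follows, on a scratch installation only. It assumes the first line of postmaster.pid is the postmaster PID; the function name crash_postmaster is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for a test system: send SIGKILL to the postmaster
# recorded in $1/postmaster.pid, so it gets no chance to clean up its
# pid file or shared memory.
crash_postmaster() {
    pidfile="$1/postmaster.pid"
    [ -f "$pidfile" ] || { echo "no $pidfile" >&2; return 1; }

    # Assumption: the first line of the file is the postmaster PID.
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "unexpected pid '$pid'" >&2; return 1 ;;
    esac

    kill -9 "$pid"
}
```

Running 'crash_postmaster "$PGDATA"' followed by 'pg_ctl start' should then show whether the "pre-existing shared memory block" message comes back. If clients were connected at the time, surviving backends may still be attached to the segment, and a refusal to start is then the intended behavior rather than the bug.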