Thread: Automatically starting postmaster after system crash
Sorry if this is a FAQ, but I couldn't find it.

If my (RH 7.1) system crashes, PostgreSQL does not restart automatically because the shared memory segment identifier and the .pid file remain, as a manual start explains:

    % pg_ctl start
    pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
    Found a pre-existing shared memory block (ID 693600256) still in use.
    If you're sure there are no old backends still running, remove the shared memory block with ipcrm(1), or just delete "/var/lib/pgsql/data/postmaster.pid".
    pg_ctl: cannot start postmaster
    Examine the log output.

What is the "proper" way of ensuring (as far as possible) that PostgreSQL starts automatically after a crash? Is it sufficient (and safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot scripts?

Allan.
Allan Engelhardt <allane@cybaea.com> writes:
> If my (RH 7.1) system crashes PostgreSQL does not restart automatically
> because the shared memory segment identifier and the .pid file remains,

That's kinda hard to believe; how would a shared memory segment survive a system crash?

> % pg_ctl start
> pg_ctl: Another postmaster may be running. Trying to start postmaster
> anyway.
> Found a pre-existing shared memory block (ID 693600256) still in use.

Darn, I thought we had fixed that class of problems. Would you try tracing through SharedMemoryIsInUse() to figure out why it thinks that? It could be that there's some platform-specific variation of shmctl() behavior that we need to cater for.

> What is the "proper" way of ensuring (as far as possible) that
> PostgreSQL starts automatically after a crash? Is it sufficient (and
> safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
> scripts?

You can do that if you want, but MHO is that this is a bug we need to fix.

regards, tom lane
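On the boot-script question, an unconditional 'rm -f' also discards whatever protection the lock file gives. Below is a minimal C sketch of a more cautious pre-start check. It assumes that the first line of postmaster.pid holds the postmaster's PID as a decimal number. `read_lock_pid()` and `pid_alive()` are hypothetical helpers and are not part of PostgreSQL or pg_ctl. Calling `kill(pid, 0)` sends no signal but still reports whether the process exists: it fails with ESRCH if there is no such process and with EPERM if the process exists but belongs to another user. Keep one limitation in mind. After a reboot the old PID may have been reused by an unrelated process, so a live PID does not prove that a postmaster is running.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the PID from the first line of a lock file.
 * Assumes the PID is written there as a decimal number.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_lock_pid(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}

/* Hypothetical helper: true if some process with this PID exists.
 * Signal 0 performs only the existence and permission checks. */
static bool pid_alive(long pid)
{
    if (pid <= 0)
        return false;
    if (kill((pid_t) pid, 0) == 0)
        return true;
    /* EPERM: the process exists but we may not signal it. */
    return errno == EPERM;
}
```

A boot script built on this idea would remove the file only when `pid_alive(read_lock_pid(path))` is false. Because of PID reuse, it still cannot replace the postmaster's own shared-memory check.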
Tom Lane wrote:
> Allan Engelhardt <allane@cybaea.com> writes:
>
>>If my (RH 7.1) system crashes PostgreSQL does not restart automatically
>>because the shared memory segment identifier and the .pid file remains,
>>
>
> That's kinda hard to believe; how would a shared memory segment survive
> a system crash?

I don't think they can. Some options:

(1) PostgreSQL keeps a reference to it somewhere and can get confused...

(2) Red Hat's script for starting PostgreSQL at boot time, which (a) ran, (b) failed, and [Arrrgh! I *must* fix that stupid script ;-P] (c) directs all pg_ctl output (out+err) to /dev/null, somehow fubared the system.

> Darn, I thought we had fixed that class of problems. Would you try
> tracing through SharedMemoryIsInUse() to figure out why it thinks that?
> It could be that there's some platform-specific variation of shmctl()
> behavior that we need to cater for.

Uhm, my system doesn't crash *that* often... :-)

Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't. I think I have seen enough fsck for one night, so I might give it a rest...

>>What is the "proper" way of ensuring (as far as possible) that
>>PostgreSQL starts automatically after a crash? Is it sufficient (and
>>safe) to include a 'rm -f $PGDATA/postmaster.pid' in the system boot
>>scripts?
>>
>
> You can do that if you want, but MHO is that this is a bug we need to
> fix.

I'll see what I can do about reproducing it...

Allan
Allan Engelhardt <allane@cybaea.com> writes:
> Tom Lane wrote:
>> That's kinda hard to believe; how would a shared memory segment survive
>> a system crash?
> I don't think they can. Some options:
> (1) PostgreSQL keeps a reference to it somewhere and can get confused...

Indeed, there is a reference to the old segment in the postmaster.pid file. At startup, if there's a postmaster.pid file, Postgres checks to see that the indicated shared memory segment is gone or at least has no processes attached to it. (This is a defense against the possibility that the old postmaster died but there are still backends running in the database.) Evidently, that check is mistakenly thinking that there *is* still a shmem seg with attached processes. Question is why?

> Seriously: I tried to reproduce using SysRq+S, SysRq+B and couldn't. I
> think I have seen enough fsck for one night, so I might give it a rest...

You might try just kill -9'ing the postmaster, rather than physically rebooting your system.

regards, tom lane
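The startup check described above asks two questions about the recorded segment: is it gone, or does it at least have no attached processes? Both map onto the System V call `shmctl(id, IPC_STAT, &buf)` and its `shm_nattch` field. The following is a minimal C sketch of that idea and is not the actual SharedMemoryIsInUse() source. The name `segment_in_use()` is hypothetical. The sketch assumes that EINVAL from `shmctl()` means the segment no longer exists, and it conservatively treats any other failure, such as EACCES, as "in use". Which errno a platform actually returns here is the kind of platform-specific `shmctl()` behavior that could make such a check misfire.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical sketch: is the System V shared memory segment whose ID was
 * recorded in a stale lock file still in use? */
static bool segment_in_use(int shm_id)
{
    struct shmid_ds stat_buf;

    if (shmctl(shm_id, IPC_STAT, &stat_buf) < 0) {
        /* Assumption: EINVAL means no segment with this ID exists any
         * more, e.g. because the machine was rebooted. */
        if (errno == EINVAL)
            return false;
        /* Any other error (e.g. EACCES, owned by another user):
         * be conservative and report the segment as in use. */
        return true;
    }
    /* The segment exists; it is in use only if processes are attached. */
    return stat_buf.shm_nattch != 0;
}
```

A false result for the ID in the error message (693600256) would mean the block can be safely ignored. To compare with what the kernel itself reports, `ipcs -m` lists the existing segments along with their attach counts (nattch).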