From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
> I will go with 5 seconds, then.
OK, I agree.
> My point is that there is no difference. For one thing, once we enter
> immediate shutdown state, and sigkill has been sent, no further action
> is taken. Postmaster will just sit there indefinitely until processes
> are gone. If we were to make it repeat SIGKILL until they die, that
> would be different. However, repeating SIGKILL is pointless, because if
> they didn't die when they first received it, they will still not die
> when they receive it a second time. Also, if they're in uninterruptible sleep
> and don't die, then they will die as soon as they get out of that state;
> no further queries will get processed, no further memory access will be
> done. So there's no harm in their remaining there until the underlying
> storage returns to life, ISTM.
>
>> Here, "reliable" means that the database server is certainly shut
>> down when pg_ctl returns, not telling a lie that "I shut down the
>> server processes for you, so you do not have to be worried that some
>> postgres process might still remain and write to disk". I suppose
>> reliable shutdown is crucial, especially in an HA cluster. If pg_ctl
>> stop -mi gets stuck forever when there is an unkillable process (in
>> what situations does this happen? OS bug, or NFS hard mount?), I
>> think the DBA has to notice this situation from the unfinished
>> pg_ctl, investigate the cause, and take corrective action.
>
> So you're suggesting that keeping postmaster up is a useful sign that
> the shutdown is not going well? I'm not really sure about this. What
> do others think?
I think you are right, and there is no harm in leaving postgres processes in
an unkillable state. I'd like to leave the decision to you and/or others.
One concern is that umount would fail in such a situation because postgres
still has open files on the filesystem, which is on the shared disk in the
case of a traditional HA cluster. However, STONITH should resolve the problem
by terminating the stuck node... I just feel it is strange for umount to fail
due to remaining postgres processes when pg_ctl stop -mi has already reported
success.
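
As a rough illustration of the "reliable" behaviour discussed above, here is
a minimal Python sketch. It is not pg_ctl or postmaster code. The names
kill_and_confirm() and spawn_sleeper() are hypothetical, and it assumes a
POSIX system. It sends SIGKILL once, does not repeat it, and reports success
only if the process is confirmed gone within the timeout.

```python
import signal
import subprocess
import sys


def kill_and_confirm(proc, timeout=5.0):
    """Send SIGKILL once, then wait up to `timeout` seconds.

    Returns True only if the process is confirmed to have exited.
    Returns False if it is still around (for example, stuck in
    uninterruptible sleep), so the caller never claims a clean stop.
    """
    proc.send_signal(signal.SIGKILL)  # sent exactly once, never repeated
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return True


def spawn_sleeper():
    """Hypothetical stand-in for a server process: sleeps for a minute."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
```

A caller would treat a False result as "shutdown not confirmed" and leave
the next step (investigation, or fencing the node) to the DBA or the cluster
software.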
> IIRC the only other interesting tweak I did was rename the
> SignalAllChildren() function to TerminateChildren(). I did this because
> it doesn't really signal all children; syslogger and dead_end backends
> are kept around. So the original name was a bit misleading. And we
> couldn't really name it SignalAlmostAllChildren(), could we...
I see. Thank you.
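
The point about the rename can be sketched as follows. This is only an
illustration in Python, not the postmaster's C implementation. The Child
record, the kind strings, and the injectable `send` parameter are all
assumptions made for the example, and it assumes a POSIX system. It shows
why "signal all children" would be a misleading name: two kinds of children
are deliberately skipped.

```python
import os
import signal
from dataclasses import dataclass

# Assumption for this sketch: children are tagged with a simple kind string.
EXEMPT_KINDS = {"syslogger", "dead_end"}


@dataclass
class Child:
    pid: int
    kind: str


def children_to_terminate(children):
    """Return every child except the exempt kinds."""
    return [c for c in children if c.kind not in EXEMPT_KINDS]


def terminate_children(children, sig=signal.SIGKILL, send=os.kill):
    """Signal the non-exempt children and return the pids that were signaled.

    `send` defaults to os.kill; it is a parameter only so the selection
    logic can be exercised without signaling real processes.
    """
    signaled = []
    for child in children_to_terminate(children):
        send(child.pid, sig)
        signaled.append(child.pid)
    return signaled
```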
Regards
MauMau