On 19 October 2015 21:37, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:
>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
><alvherre@2ndquadrant.com> wrote:
>> Andres Freund wrote:
>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>> > If I recall correctly, he was concerned about killing the backends
>>> > running transactions which could be saved. I have a sympathy with
>>> > the opinion.
>>>
>>> I still don't. Leaving backends alive after postmaster has died
>>> prevents the auto-restart mechanism from working from there on.
>>> Which means that we'll potentially continue happily after another
>>> backend has PANICed and potentially corrupted shared memory. Which
>>> isn't all that unlikely if postmaster isn't around anymore.
>>
>> I agree. When postmaster terminates without waiting for all backends
>> to go away, things are going horribly wrong -- either a DBA has done
>> something stupid, or the system is misbehaving. As Andres says, if
>> another backend dies at that point, things are even worse -- the dying
>> backend could have been holding a critical lwlock, for instance, or it
>> could have corrupted shared buffers on its way out. It is not
>> sensible to leave the rest of the backends in the system still trying
>> to run just because there is no one there to kill them.
>
>Yep. +1 for changing this.
It seems many people are in favor of this change.
I have made changes so that backends exit on postmaster death (once they have finished their work and are waiting for
a new command).
The changes follow the approach explained in my earlier mail, i.e.:
1. WaitLatchOrSocket, when called from the secure_read and secure_write functions, waits on an additional event,
WL_POSTMASTER_DEATH.
2. There is a possibility that the command is read without waiting on the latch. This case is handled by checking
the postmaster status after the command has been read (i.e. after ReadCommand).
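
For readers without the patch at hand, the two points above can be sketched roughly as below. This is not the
patch itself, only a standalone illustration under some assumptions: it uses plain poll() instead of the latch
code, it assumes the parent's death shows up as EOF/hangup on the read end of a pipe whose write end only the
parent holds, and the names EV_SOCKET_READABLE, EV_POSTMASTER_DEATH, wait_socket_or_parent_death and
parent_is_alive are all made up for the example.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical event flags, loosely modeled on the WL_* names in this mail. */
#define EV_SOCKET_READABLE  (1 << 0)
#define EV_POSTMASTER_DEATH (1 << 1)

/*
 * Point 1: wait for client input, but also wake up if the parent is gone.
 * death_pipe_fd is assumed to be the read end of a pipe whose write end is
 * held only by the parent, so it reports hangup/EOF once the parent exits.
 * Returns a bitmask of EV_* flags, 0 on timeout, or -1 on error.
 */
static int
wait_socket_or_parent_death(int client_fd, int death_pipe_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int           rc;
    int           result = 0;

    fds[0].fd = client_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = death_pipe_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    /* Retry if a signal interrupts the wait. */
    do
    {
        rc = poll(fds, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;

    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        result |= EV_SOCKET_READABLE;
    if (fds[1].revents & (POLLIN | POLLHUP | POLLERR))
        result |= EV_POSTMASTER_DEATH;

    return result;
}

/*
 * Point 2: a command may already be buffered, so the wait above is never
 * entered. A non-blocking check after reading the command covers that case.
 * Returns 1 if the parent still appears to be alive, 0 otherwise.
 */
static int
parent_is_alive(int death_pipe_fd)
{
    int events = wait_socket_or_parent_death(-1, death_pipe_fd, 0);

    /* Treat a poll() failure as "not alive" to stay on the safe side. */
    return events >= 0 && (events & EV_POSTMASTER_DEATH) == 0;
}
```

In this sketch a caller would exit when the returned mask contains EV_POSTMASTER_DEATH, and would call
parent_is_alive() right after a command has been read without waiting. Passing -1 as client_fd relies on
poll() ignoring negative descriptors.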
Attached is the patch.
Thanks and Regards,
Kumar Rajeev Rastogi