Thread: Dangling Client Backend Process

Dangling Client Backend Process

From
Rajeev rastogi
Date:
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:13.0pt">I observed one strange behavior today
thatif postmaster process gets crashed/killed, then it kill all background processes but not the client backend
process.</span><pclass="MsoNormal"><span style="font-size:13.0pt">Moreover it is also allowed to execute query on the
connectedclient session without any other background process.</span><p class="MsoNormal"><span
style="font-size:13.0pt">ButIf I try to execute some command (like checkpoint) from the client session which requires
anybackground task to perform, it fails because it cannot find the corresponding background process (like checkpoint
process).</span><pclass="MsoNormal"><span style="font-size:13.0pt"> </span><p class="MsoNormal"><span
style="font-size:13.0pt">Iam not sure if this is already known behavior but I found it to be little awkward. This may
leadto some unknown behavior in user application.</span><p class="MsoNormal"><span style="font-size:13.0pt"> </span><p
class="MsoNormal"><spanstyle="font-size:13.0pt">Currently All background process keeps checking if Postmaster is Alive
whilethey wait for any event but for client backend process there is no such mechanism.</span><p
class="MsoNormal"><spanstyle="font-size:13.0pt"> </span><p class="MsoNormal"><span style="font-size:13.0pt">One way to
handlethis issue will be to check whether postmaster is alive after every command read but it will add extra cost for
eachquery execution.</span><p class="MsoNormal"><span style="font-size:13.0pt"> </span><p class="MsoNormal"><span
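(For illustration only, a minimal sketch of that kind of per-command check, hypothetically placed in the backend's main command loop; ReadCommand(), PostmasterIsAlive() and proc_exit() are existing backend routines, but the placement shown here is just a sketch, not patch code:)

    /*
     * Sketch: after reading each frontend command (firstchar/input_message
     * as in the backend's main loop), bail out if the postmaster has gone
     * away.
     */
    firstchar = ReadCommand(&input_message);

    if (!PostmasterIsAlive())
        proc_exit(1);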
style="font-size:13.0pt">Anycomments?        </span><p class="MsoNormal"><span style="font-size:13.0pt"> </span><p
class="MsoNormal"><i><spanstyle="font-size:13.0pt;color:black">Thanks and Regards,</span></i><p
class="MsoNormal"><i><spanstyle="font-size:13.0pt">Kumar Rajeev Rastogi<span style="color:black">
</span></span></i><spanstyle="font-size:13.0pt"></span></div> 

Re: Dangling Client Backend Process

From
Amit Kapila
Date:
On Sat, Oct 10, 2015 at 3:42 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:

I observed one strange behavior today: if the postmaster process gets crashed/killed, it kills all background processes but not the client backend process.

Moreover, it is still possible to execute queries on the connected client session without any other background process.

But if I try to execute some command (like checkpoint) from the client session which requires a background task to perform it, it fails because it cannot find the corresponding background process (like the checkpoint process).

 

I am not sure if this is already known behavior, but I found it a little awkward. This may lead to some unknown behavior in user applications.

 


This is a known behaviour and there was some discussion on this
topic [1] previously as well.  I think that thread didn't reach a conclusion,
but a couple of other reasons were discussed in that thread as well for
having the behaviour you are proposing here.

 

Currently all background processes keep checking whether the postmaster is alive while they wait for any event, but for client backend processes there is no such mechanism.

 

One way to handle this issue would be to check whether the postmaster is alive after every command read, but that would add extra cost to each query execution.


I don't think that is a good idea: if there is no command execution,
the backend will still stay as it is, and doing such a check on each command
doesn't sound like a good idea even though the overhead might not be
big.  There are some other ideas discussed in that thread [2] to achieve
this behaviour, but I think we need to find a portable way to achieve it.


[1] - http://www.postgresql.org/message-id/26217.1371851061@sss.pgh.pa.us

Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
<div class="WordSection1"><p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">On</span><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">10October 2015 20:45, Amit Kapila Wrote:</span><p
class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span
style="font-size:13.0pt">>>I observed one strange behavior today that if postmaster process gets crashed/killed,
thenit kill all background processes but not the client backend process.</span><p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal">> This is a
knownbehaviour and there was some discussion on this<p class="MsoNormal">> topic [1] previously as well.  I think
thatthread didn't reach to conclusion,<p class="MsoNormal">> but there were couple of other reasons discussed in
thatthread as well to<p class="MsoNormal">> have the behaviour as you are proposing here.<p class="MsoNormal"> <p
class="MsoNormal"><spanstyle="font-size:13.0pt">Oops..I did not know about this. I shall check the older thread to get
otheropinions.</span><p class="MsoNormal"> <p class="MsoNormal"><span style="font-size:13.0pt">>> One way to
handlethis issue will be to check whether postmaster is alive after every command read but it will add extra cost for
eachquery execution.</span><p class="MsoNormal"><span style="font-size:13.0pt"> </span><p class="MsoNormal">> I
don'tthink that is a good idea as if there is no command execution<p class="MsoNormal">> it will still stay as it is
anddoing such operations on each command<p class="MsoNormal">> doesn't sound to be good idea even though overhead
mightnot be<p class="MsoNormal">> big.  There are some other ideas discussed in that thread [2] to achieve<p
class="MsoNormal">>this behaviour, but I think we need to find a portable way to achieve it.<p
class="MsoNormal"><spanstyle="font-size:13.0pt"> </span><p class="MsoNormal"><span style="font-size:13.0pt">Yes, you
areright that process will not be closed till a new command comes but I think it does not harm functionality in anyway
exceptthat the process and its acquired resources</span><p class="MsoNormal"><span style="font-size:13.0pt">does not
getfreed. Also use-case of application will be very less where their client process stays idle for very long
time.</span><pclass="MsoNormal"><span style="font-size:13.0pt">But at the same time I agree this is not the best
solution,we should look for more appropriate/better one. </span><p class="MsoNormal"><span style="font-size:13.0pt">Now
asit is confirmed to be valid issue, I will spend some time on this to find if there is something more appropriate
solution.</span><pclass="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><i><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">Thanksand Regards,</span></i><p
class="MsoNormal"><i><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kumar Rajeev
Rastogi</span></i><i><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span></i></div> 

Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
<div class="WordSection1"><p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">On</span><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">12thOctober 2015 20:45, Rajeev Rastogi Wrote:</span><p
class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span
style="font-size:13.0pt">>>>I observed one strange behavior today that if postmaster process gets
crashed/killed,then it kill all background processes but not the client backend process.</span><p
class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><p
class="MsoNormal">>>This is a known behaviour and there was some discussion on this<p class="MsoNormal">>>
topic[1] previously as well.<p class="MsoNormal"> <p class="MsoNormal"><span style="font-size:13.0pt">> Now as it is
confirmedto be valid issue, I will spend some time on this to find if there is something more appropriate
solution.</span><pclass="MsoNormal"><span style="font-size:13.0pt"> </span><p class="MsoNormal"><span
style="font-size:13.0pt">Ichecked the latest code and found Heikki has already added code for secure_read using the
latchmechanism (using WaitLatchOrSocket). It currently waits for two events i.e. WL_LATCH_SET and WL_SOCKET_READABLE.
</span><pclass="MsoNormal"><span style="font-size:13.0pt">            Commit id:
80788a431e9bff06314a054109fdea66ac538199</span><pclass="MsoNormal"><span style="font-size:13.0pt"> </span><p
class="MsoNormal"><spanstyle="font-size:13.0pt">If we add the event WL_POSTMASTER_DEATH also, client backend process
handlingwill become same as other backend process. So postmaster death can be detected in the same way.</span><p
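(For illustration only, a minimal sketch of where WL_POSTMASTER_DEATH might be added, assuming the wait in secure_read looks roughly like the committed latch code referenced above; the exact placement and exit path are just a sketch, not patch code:)

    /*
     * Sketch: also wake on postmaster death while waiting for the socket,
     * and bail out if that is what happened.  WaitLatchOrSocket, MyLatch
     * and the WL_* flags are existing facilities.
     */
    w = WaitLatchOrSocket(MyLatch,
                          WL_LATCH_SET | WL_SOCKET_READABLE | WL_POSTMASTER_DEATH,
                          port->sock, 0);

    if (w & WL_POSTMASTER_DEATH)
        proc_exit(1);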
class="MsoNormal"><spanstyle="font-size:13.0pt"> </span><p class="MsoNormal"><span style="font-size:13.0pt">But I am
notsure if WL_POSTMASTER_DEATH event was not added intentionally for some reason. Please confirm.</span><p
class="MsoNormal"><spanstyle="font-size:13.0pt">Also is it OK to add this even handling in generic path of
Libpq?</span><pclass="MsoNormal"><span style="font-size:13.0pt"> </span><p class="MsoNormal"><span
style="font-size:13.0pt">Pleaselet me know if I am missing something?</span><p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><i><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">Thanksand Regards,</span></i><p
class="MsoNormal"><i><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Kumar Rajeev
Rastogi</span></i><i><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span></i></div> 

Re: Dangling Client Backend Process

From
Amit Kapila
Date:
On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:

On 12th October 2015 20:45, Rajeev Rastogi Wrote:

 

>>> I observed one strange behavior today that if postmaster process gets crashed/killed, then it kill all background processes but not the client backend process.

 

>> This is a known behaviour and there was some discussion on this

>> topic [1] previously as well.

 

> Now as it is confirmed to be valid issue, I will spend some time on this to find if there is something more appropriate solution.

 

I checked the latest code and found Heikki has already added code for secure_read using the latch mechanism (using WaitLatchOrSocket). It currently waits for two events i.e. WL_LATCH_SET and WL_SOCKET_READABLE.

            Commit id: 80788a431e9bff06314a054109fdea66ac538199

 

If we add the event WL_POSTMASTER_DEATH as well, client backend process handling will become the same as for other backend processes, so postmaster death can be detected in the same way.

 

But I am not sure whether the WL_POSTMASTER_DEATH event was left out intentionally for some reason. Please confirm.

Also, is it OK to add this event handling in the generic path of libpq?

 

Please let me know if I am missing something?


I feel this is worth investigating, for example in which cases libpq is
used with non-blocking sockets, because for such cases the above idea
will not work.

Here, I think the bigger point is that Tom was not in favour of
this proposal (making backends exit on postmaster death) at that time;
I am not sure whether he has changed his mind.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Dangling Client Backend Process

From
Kyotaro HORIGUCHI
Date:
At Wed, 14 Oct 2015 11:08:37 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1L8SGWymhXF+yDpxiyA2ARCiEgQ88XsLhEvJba3Fh_F=Q@mail.gmail.com>
> On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi <rajeev.rastogi@huawei.com>
> wrote:
> > If we add the event WL_POSTMASTER_DEATH also, client backend process
> > handling will become same as other backend process. So postmaster death can
> > be detected in the same way.
> >
> > But I am not sure if WL_POSTMASTER_DEATH event was not added intentionally
> > for some reason. Please confirm.
> > 
> > Also is it OK to add this even handling in generic path of Libpq?
> >
> > Please let me know if I am missing something?
> >
> >
> I feel this is worth investigation, example for what kind of cases libpq is
> used for non-blocking sockets, because for such cases above idea
> will not work.

The blocking mode of a port is changed using
socket_set_nonblocking(). I found two places where the function is
called with true: pq_getbyte_if_available() and
socket_flush_if_writable(). They seem to be used only in
walsender *so far*.

> Here, I think the bigger point is that, Tom was not in favour of
> this proposal (making backends exit on postmaster death ) at that time,
> not sure whether he has changed his mind.

If I recall correctly, he was concerned about killing backends
running transactions which could otherwise be saved. I have some sympathy with
that opinion, but I also think it is reasonable to kill all backends
immediately so that a new postmaster can run...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Dangling Client Backend Process

From
Andres Freund
Date:
On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
> If I recall correctly, he concerned about killing the backends
> running transactions which could be saved. I have a sympathy with
> the opinion.

I still don't. Leaving backends alive after the postmaster has died prevents
the auto-restart mechanism from working from there on.  Which means
that we'll potentially continue happily after another backend has
PANICed and potentially corrupted shared memory. Which isn't all that
unlikely if postmaster isn't around anymore.

Andres



Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
On 14 October 2015 14:03, Kyotaro HORIGUCHI:

>Subject: Re: [HACKERS] Dangling Client Backend Process
>
>At Wed, 14 Oct 2015 11:08:37 +0530, Amit Kapila <amit.kapila16@gmail.com>
>wrote in
><CAA4eK1L8SGWymhXF+yDpxiyA2ARCiEgQ88XsLhEvJba3Fh_F=Q@mail.gmail.com>
>> On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi
>> <rajeev.rastogi@huawei.com>
>> wrote:
>> > If we add the event WL_POSTMASTER_DEATH also, client backend process
>> > handling will become same as other backend process. So postmaster
>> > death can be detected in the same way.
>> >
>> > But I am not sure if WL_POSTMASTER_DEATH event was not added
>> > intentionally for some reason. Please confirm.
>> >
>> > Also is it OK to add this even handling in generic path of Libpq?
>> >
>> > Please let me know if I am missing something?
>> >
>> >
>> I feel this is worth investigation, example for what kind of cases
>> libpq is used for non-blocking sockets, because for such cases above
>> idea will not work.
>
>Blocking mode of a port is changed using socket_set_nonblocking(). I
>found two points that the function is called with true.
>pq_getbyte_if_available() and socket_flush_if_writable(). They seems to
>be used only in walsender *so far*.

Yes, and in that case I assume the proposed solution will work in all cases.

>> Here, I think the bigger point is that, Tom was not in favour of this
>> proposal (making backends exit on postmaster death ) at that time, not
>> sure whether he has changed his mind.

>If I recall correctly, he concerned about killing the backends running
>transactions which could be saved. I have a sympathy with the opinion.
>But also think it reasonable to kill all backends immediately so that
>new postmaster can run...

Yes, it will be really helpful to know the earlier reasons for "not making backends exit on postmaster death".
Please let me know if there is any thread which I can refer to for the same.

IMHO there could be the below major issues if we don't kill client backend processes on postmaster death:
1. The postmaster cannot be restarted again, as pointed out by Kyotaro and Andres.
2. If from an existing client session we try to do some operation which has a dependency on a background process, then that operation will either fail or may lead to undefined behavior.
3. Also, unintentionally, all operations done by the application will not be checkpointed and also will not be flushed by the bgworker.
4. Replication of WAL to the standby will stop, and if a synchronous standby is configured then commands from the master will hang.

Thanks and Regards,
Kumar Rajeev Rastogi






Re: Dangling Client Backend Process

From
Amit Kapila
Date:
On Fri, Oct 16, 2015 at 3:34 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:

Yes, it will be really helpful to know the earlier reasons for "not making backends exit on postmaster death".
Please let me know if there is any thread which I can refer to for the same.

IMHO there could be the below major issues if we don't kill client backend processes on postmaster death:
1. The postmaster cannot be restarted again, as pointed out by Kyotaro and Andres.
2. If from an existing client session we try to do some operation which has a dependency on a background process, then that operation will either fail or may lead to undefined behavior.
3. Also, unintentionally, all operations done by the application will not be checkpointed and also will not be flushed by the bgworker.
4. Replication of WAL to the standby will stop, and if a synchronous standby is configured then commands from the master will hang.


What exactly do we want to do as part of this proposal?  As far as I
can see, we have the below two options on postmaster death:

a. Exit all the active backends in whichever state they are in.
b. Exit backends after they finish their current work and are
waiting for a new command.

I think what we are discussing here is option (b).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Dangling Client Backend Process

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
> > If I recall correctly, he concerned about killing the backends
> > running transactions which could be saved. I have a sympathy with
> > the opinion.
> 
> I still don't. Leaving backends alive after postmaster has died prevents
> the auto-restart mechanism to from working from there on.  Which means
> that we'll potentially continue happily after another backend has
> PANICed and potentially corrupted shared memory. Which isn't all that
> unlikely if postmaster isn't around anymore.

I agree.  When postmaster terminates without waiting for all backends to
go away, things are going horribly wrong -- either a DBA has done
something stupid, or the system is misbehaving.  As Andres says, if
another backend dies at that point, things are even worse -- the dying
backend could have been holding a critical lwlock, for instance, or it
could have corrupted shared buffers on its way out.  It is not sensible
to leave the rest of the backends in the system still trying to run just
because there is no one there to kill them.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>> > If I recall correctly, he concerned about killing the backends
>> > running transactions which could be saved. I have a sympathy with
>> > the opinion.
>>
>> I still don't. Leaving backends alive after postmaster has died prevents
>> the auto-restart mechanism to from working from there on.  Which means
>> that we'll potentially continue happily after another backend has
>> PANICed and potentially corrupted shared memory. Which isn't all that
>> unlikely if postmaster isn't around anymore.
>
> I agree.  When postmaster terminates without waiting for all backends to
> go away, things are going horribly wrong -- either a DBA has done
> something stupid, or the system is misbehaving.  As Andres says, if
> another backend dies at that point, things are even worse -- the dying
> backend could have been holding a critical lwlock, for instance, or it
> could have corrupted shared buffers on its way out.  It is not sensible
> to leave the rest of the backends in the system still trying to run just
> because there is no one there to kill them.

Yep.  +1 for changing this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
On  19 October 2015 21:37, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:

>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
><alvherre@2ndquadrant.com> wrote:
>> Andres Freund wrote:
>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>> > If I recall correctly, he concerned about killing the backends
>>> > running transactions which could be saved. I have a sympathy with
>>> > the opinion.
>>>
>>> I still don't. Leaving backends alive after postmaster has died
>>> prevents the auto-restart mechanism to from working from there on.
>>> Which means that we'll potentially continue happily after another
>>> backend has PANICed and potentially corrupted shared memory. Which
>>> isn't all that unlikely if postmaster isn't around anymore.
>>
>> I agree.  When postmaster terminates without waiting for all backends
>> to go away, things are going horribly wrong -- either a DBA has done
>> something stupid, or the system is misbehaving.  As Andres says, if
>> another backend dies at that point, things are even worse -- the dying
>> backend could have been holding a critical lwlock, for instance, or it
>> could have corrupted shared buffers on its way out.  It is not
>> sensible to leave the rest of the backends in the system still trying
>> to run just because there is no one there to kill them.
>
>Yep.  +1 for changing this.

Seems many people are in favor of this change.
I have made changes to handle backend exit on postmaster death (after they have finished their work and are waiting for a new command).

Changes are as per the approach explained in my earlier mail, i.e.:
1. WaitLatchOrSocket, called from the secure_read and secure_write functions, will wait on an additional event, WL_POSTMASTER_DEATH.
2. There is a possibility that the command is read without waiting on the latch. This case is handled by checking the postmaster status after the command is read (i.e. after ReadCommand).

Attached is the patch.

Thanks and Regards,
Kumar Rajeev Rastogi




Attachment

Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Tue, Oct 20, 2015 at 12:48 AM, Rajeev rastogi
<rajeev.rastogi@huawei.com> wrote:
> On  19 October 2015 21:37, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:
>
>>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
>><alvherre@2ndquadrant.com> wrote:
>>> Andres Freund wrote:
>>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>>> > If I recall correctly, he concerned about killing the backends
>>>> > running transactions which could be saved. I have a sympathy with
>>>> > the opinion.
>>>>
>>>> I still don't. Leaving backends alive after postmaster has died
>>>> prevents the auto-restart mechanism to from working from there on.
>>>> Which means that we'll potentially continue happily after another
>>>> backend has PANICed and potentially corrupted shared memory. Which
>>>> isn't all that unlikely if postmaster isn't around anymore.
>>>
>>> I agree.  When postmaster terminates without waiting for all backends
>>> to go away, things are going horribly wrong -- either a DBA has done
>>> something stupid, or the system is misbehaving.  As Andres says, if
>>> another backend dies at that point, things are even worse -- the dying
>>> backend could have been holding a critical lwlock, for instance, or it
>>> could have corrupted shared buffers on its way out.  It is not
>>> sensible to leave the rest of the backends in the system still trying
>>> to run just because there is no one there to kill them.
>>
>>Yep.  +1 for changing this.
>
> Seems many people are in favor of this change.
> I have made changes to handle backend exit on postmaster death (after they finished their work and waiting for new command).
> Changes are as per approach explained in my earlier mail i.e.
> 1. WaitLatchOrSocket called from secure_read and secure_write function will wait on an additional event as WL_POSTMASTER_DEATH.
> 2. There is a possibility that the command is read without waiting on latch. This case is handled by checking postmaster status after command read (i.e. after ReadCommand).
>
> Attached is the patch.

I don't think that proc_exit(1) is the right way to exit here.  It's
not very friendly to exit without at least attempting to give the
client a clue about what has gone wrong.  I suggest something like
this:
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating connection due to postmaster shutdown")));

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dangling Client Backend Process

From
Alvaro Herrera
Date:
Robert Haas wrote:

> I don't think that proc_exit(1) is the right way to exit here.  It's
> not very friendly to exit without at least attempting to give the
> client a clue about what has gone wrong.  I suggest something like
> this:
> 
>             ereport(FATAL,
>                     (errcode(ERRCODE_ADMIN_SHUTDOWN),
>              errmsg("terminating connection due to postmaster shutdown")));

Agreed, but I don't think "shutdown" is the right word to use here --
that makes it sound like it was orderly.  Perhaps "crash"?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I don't think that proc_exit(1) is the right way to exit here.  It's
>> not very friendly to exit without at least attempting to give the
>> client a clue about what has gone wrong.  I suggest something like
>> this:
>>
>>             ereport(FATAL,
>>                     (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>              errmsg("terminating connection due to postmaster shutdown")));
>
> Agreed, but I don't think "shutdown" is the right word to use here --
> that makes it sound like it was orderly.  Perhaps "crash"?

Well, that's a little speculative.  "due to unexpected postmaster exit"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dangling Client Backend Process

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > Agreed, but I don't think "shutdown" is the right word to use here --
> > that makes it sound like it was orderly.  Perhaps "crash"?
> 
> Well, that's a little speculative.  "due to unexpected postmaster exit"?

WFM.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
On 20 October 2015 23:34, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:

>On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
><alvherre@2ndquadrant.com> wrote:
>> Robert Haas wrote:
>>> I don't think that proc_exit(1) is the right way to exit here.  It's
>>> not very friendly to exit without at least attempting to give the
>>> client a clue about what has gone wrong.  I suggest something like
>>> this:
>>>
>>>             ereport(FATAL,
>>>                     (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>>              errmsg("terminating connection due to postmaster
>>> shutdown")));
>>
>> Agreed, but I don't think "shutdown" is the right word to use here --
>> that makes it sound like it was orderly.  Perhaps "crash"?
>
>Well, that's a little speculative.  "due to unexpected postmaster exit"?

Agreed. Attached is the patch with changes.

Thanks and Regards,
Kumar Rajeev Rastogi



Attachment

Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Tue, Oct 20, 2015 at 11:42 PM, Rajeev rastogi
<rajeev.rastogi@huawei.com> wrote:
> Agreed. Attached is the patch with changes.

Well, I'm not buying this extra PostmasterIsAlive() call on every pass
through the main loop.  That seems more expensive than we can really
justify. Checking this when we're already calling WaitLatchOrSocket is
basically free, but the other part is not.

Here's a version with that removed and some changes to the comments.
I still don't think this is quite working right, though, because
here's what happened when I killed the postmaster:

rhaas=# select 1;
 ?column?
----------
        1
(1 row)

rhaas=# \watch
Watch every 2s    Thu Oct 22 16:24:10 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:12 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:14 2015

 ?column?
----------
        1
(1 row)

Watch every 2s    Thu Oct 22 16:24:16 2015

 ?column?
----------
        1
(1 row)

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Note that the error message doesn't actually show up on the client (it
did show up in the log).  I guess that may be inevitable if we're
blocked in secure_write(), but if we're in secure_read() maybe it
should work?  I haven't investigated why it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
On 23 October 2015 01:58, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:

>Well, I'm not buying this extra PostmasterIsAlive() call on every pass
>through the main loop.  That seems more expensive than we can really
>justify. Checking this when we're already calling WaitLatchOrSocket is
>basically free, but the other part is not.

Agree. 

>Here's a version with that removed and some changes to the comments.

Thanks for changing.

>I still don't think this is quite working right, though, because here's
>what happened when I killed the postmaster:
>
>rhaas=# select 1;
> ?column?
>----------
>        1
>(1 row)
>
>rhaas=# \watch
>Watch every 2s    Thu Oct 22 16:24:10 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:12 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:14 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>Watch every 2s    Thu Oct 22 16:24:16 2015
>
> ?column?
>----------
>        1
>(1 row)
>
>server closed the connection unexpectedly
>    This probably means the server terminated abnormally
>    before or while processing the request.
>The connection to the server was lost. Attempting reset: Failed.
>
>Note that the error message doesn't actually show up on the client (it
>did show up in the log).  I guess that may be inevitable if we're
>blocked in secure_write(), but if we're in secure_read() maybe it should
>work?  I haven't investigated why it doesn't.

Actually, in this case the client is not waiting for a response from the server; rather, it is waiting for a new command from the user.

So even though the server has sent the response to the client, the client is not able to receive it.
By the time the client receives the next command to execute, the connection has been terminated from the server side, and hence it comes out with the above message (i.e. "server closed the connection...").

Though I am also in favor of providing some error message to the client, with the current infrastructure I don't think there is any way to pass this error message to the client [this error has happened without involvement of the client side].

Comments?

Thanks and Regards,
Kumar Rajeev Rastogi

Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Tue, Oct 27, 2015 at 6:29 AM, Rajeev rastogi
<rajeev.rastogi@huawei.com> wrote:
> On 23 October 2015 01:58, Robert Haas [mailto:robertmhaas@gmail.com] Wrote:
>>Well, I'm not buying this extra PostmasterIsAlive() call on every pass
>>through the main loop.  That seems more expensive than we can really
>>justify. Checking this when we're already calling WaitLatchOrSocket is
>>basically free, but the other part is not.
>
> Agree.
>
>>Here's a version with that removed and some changes to the comments.
>
> Thanks for changing.
>
>>I still don't think this is quite working right, though, because here's
>>what happened when I killed the postmaster:
>>
>>rhaas=# select 1;
>> ?column?
>>----------
>>        1
>>(1 row)
>>
>>rhaas=# \watch
>>Watch every 2s    Thu Oct 22 16:24:10 2015
>>
>> ?column?
>>----------
>>        1
>>(1 row)
>>
>>Watch every 2s    Thu Oct 22 16:24:12 2015
>>
>> ?column?
>>----------
>>        1
>>(1 row)
>>
>>Watch every 2s    Thu Oct 22 16:24:14 2015
>>
>> ?column?
>>----------
>>        1
>>(1 row)
>>
>>Watch every 2s    Thu Oct 22 16:24:16 2015
>>
>> ?column?
>>----------
>>        1
>>(1 row)
>>
>>server closed the connection unexpectedly
>>    This probably means the server terminated abnormally
>>    before or while processing the request.
>>The connection to the server was lost. Attempting reset: Failed.
>>
>>Note that the error message doesn't actually show up on the client (it
>>did show up in the log).  I guess that may be inevitable if we're
>>blocked in secure_write(), but if we're in secure_read() maybe it should
>>work?  I haven't investigated why it doesn't.
>
> Actually in this case client is not waiting for the response from the server rather it is waiting for new command from user.
> So even though server has sent the response to client, client is not able to receive.
> Once client receives the next command to execute, by the time connection has terminated from server side and hence it comes out with the above message (i.e. server closed the connection...)
>
> Though I am also in favor of providing some error message to client. But with the current infrastructure, I don’t think there is any way to pass this error message to client [This error has happened without involvement of the client side].
>
> Comments?

Hmm.  ProcessInterrupts() signals some FATAL errors while the
connection is idle, and rumor has it that that works: the client
doesn't immediately read the FATAL error, but the next time it sends a
query, it tries to read from the connection and sees the FATAL error
at that time.  I wonder why that's not working here.
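(For context, a rough sketch of the kind of idle-time FATAL that ProcessInterrupts() issues -- paraphrased from memory, not verbatim source:)

    /* roughly how ProcessInterrupts() reports a pending die request */
    if (ProcDiePending)
        ereport(FATAL,
                (errcode(ERRCODE_ADMIN_SHUTDOWN),
                 errmsg("terminating connection due to administrator command")));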

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dangling Client Backend Process

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm.  ProcessInterrupts() signals some FATAL errors while the
> connection is idle, and rumor has it that that works: the client
> doesn't immediately read the FATAL error, but the next time it sends a
> query, it tries to read from the connection and sees the FATAL error
> at that time.  I wonder why that's not working here.

A likely theory is that the kernel is reporting failure to libpq's
send() because the other side of the connection is already gone.
This would be timing-dependent of course.
        regards, tom lane



Re: Dangling Client Backend Process

From
Andres Freund
Date:
On 2015-10-30 09:48:33 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Hmm.  ProcessInterrupts() signals some FATAL errors while the
> > connection is idle, and rumor has it that that works: the client
> > doesn't immediately read the FATAL error, but the next time it sends a
> > query, it tries to read from the connection and sees the FATAL error
> > at that time.  I wonder why that's not working here.
>
> A likely theory is that the kernel is reporting failure to libpq's
> send() because the other side of the connection is already gone.
> This would be timing-dependent of course.

Looking at a strace, psql over a unix socket is actually receiving the
error message:
recvfrom(3, "E\0\0\0lSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 109
but psql does print:
server closed the connection unexpectedly

it happens to work over localhost:
FATAL:  57P01: terminating connection due to unexpected postmaster exit
LOCATION:  secure_read, be-secure.c:170
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

the problem seems to be the loop eating all the remaining input:

void
pqHandleSendFailure(PGconn *conn)
{
    /*
     * Accept any available input data, ignoring errors.  Note that if
     * pqReadData decides the backend has closed the channel, it will close
     * our side of the socket --- that's just what we want here.
     */
    while (pqReadData(conn) > 0)
         /* loop until no more data readable */ ;

after the first pqReadData() there's no remaining input and thus the
second call to pqReadData()'s pqsecure_read reads 0 and this is hit:

    /*
     * OK, we are getting a zero read even though select() says ready. This
     * means the connection has been closed.  Cope.
     */
definitelyEOF:
    printfPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("server closed the connection unexpectedly\n"
                                    "\tThis probably means the server terminated abnormally\n"
                                    "\tbefore or while processing the request.\n"));

adding a parseInput(conn) into the loop yields the expected
FATAL:  57P01: terminating connection due to unexpected postmaster exit
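(A minimal sketch of the change being suggested, based on the pqHandleSendFailure() loop shown above; illustrative only, not a tested patch:)

    /*
     * Sketch: parse each chunk of input as it is read, so that an error
     * ('E') message such as the FATAL above gets reported instead of
     * being silently eaten.
     */
    while (pqReadData(conn) > 0)
        parseInput(conn);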

Is there really any reason not to do that?

Greetings,

Andres Freund



Re: Dangling Client Backend Process

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> adding a parseInput(conn) into the loop yields the expected
> FATAL:  57P01: terminating connection due to unexpected postmaster exit
> Is there really any reason not to do that?

Might work, but it probably needs some study:
(a) is it safe
(b) is this the right place / are there other places
        regards, tom lane



Re: Dangling Client Backend Process

From
Andres Freund
Date:
On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > adding a parseInput(conn) into the loop yields the expected
> > FATAL:  57P01: terminating connection due to unexpected postmaster exit
> > Is there really any reason not to do that?
> 
> Might work, but it probably needs some study:

Yea, definitely. I was just at pgconf.eu's keynote catching up on a
talk. No fully thought through proposals to be expected ;)

> (a) is it safe

I don't immediately see why not.

> (b) is this the right place / are there other places

I think it's ok for the send failure case, in a quick lookthrough I
didn't find anything else for writes - I'm not entirely sure all read
cases are handled though, but it seems less likely to be mishandled.

Greetings,

Andres Freund



Re: Dangling Client Backend Process

From
Rajeev rastogi
Date:
On 30 October 2015 20:33, Andres Freund Wrote:
>On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > adding a parseInput(conn) into the loop yields the expected
>> > FATAL:  57P01: terminating connection due to unexpected postmaster
>> > exit Is there really any reason not to do that?
>>
>> Might work, but it probably needs some study:
>
>Yea, definitely. I was just at pgconf.eu's keynote catching up on a talk.
>No fully thought through proposal's to be expected ;)
>
>> (a) is it safe
>
>I don't immediately see why not.
>
>> (b) is this the right place / are there other places
>
>I think it's ok for the send failure case, in a quick lookthrough I
>didn't find anything else for writes - I'm not entirely sure all read
>cases are handled tho, but it seems less likely to be mishandles.


I also did some analysis as Andres suggested and observed that, since the error message is already sent by the backend to the frontend, the error message is available on the connection channel; it just was not received/processed by the frontend.
Also, pqHandleSendFailure was reading the available data, if any, but not processing it.

So, as Andres suggested above, if we add parseInput to the receiver and parse the available message on the connection channel, we could display the appropriate error.

IMHO the above proposed place to add parseInput seems to be right, as this function already takes care of handling "an async NOTICE message that the backend might have sent just before it died".

Attached is the patch with this change.

Comments?

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment

Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Fri, Oct 30, 2015 at 11:03 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > adding a parseInput(conn) into the loop yields the expected
>> > FATAL:  57P01: terminating connection due to unexpected postmaster exit
>> > Is there really any reason not to do that?
>>
>> Might work, but it probably needs some study:
>
> Yea, definitely. I was just at pgconf.eu's keynote catching up on a
> talk. No fully thought through proposal's to be expected ;)
>
>> (a) is it safe
>
> I don't immediately see why not.
>
>> (b) is this the right place / are there other places
>
> I think it's ok for the send failure case, in a quick lookthrough I
> didn't find anything else for writes - I'm not entirely sure all read
> cases are handled tho, but it seems less likely to be mishandles.

pqHandleSendFailure() has this comment:

 * Primarily, what we want to accomplish here is to process an async
 * NOTICE message that the backend might have sent just before it died.

And also this comment:

     * Accept any available input data, ignoring errors.  Note that if
     * pqReadData decides the backend has closed the channel, it will close
     * our side of the socket --- that's just what we want here.

And finally this comment:

     * Parse any available input messages.  Since we are in PGASYNC_IDLE
     * state, only NOTICE and NOTIFY messages will be eaten.

Now, taken together, these messages suggest two conclusions:

1. The decision to ignore errors here was deliberate.
2. Calling pqParseInput() wouldn't notice errors anyway because the
connection is PGASYNC_IDLE.

With respect to the first conclusion, I think it's fair to argue that,
while this may have seemed like a reasonable idea at the time, it's
probably not what we want any more.  Quite apart from the patch
proposed here, ProcessInterrupts() has assumed for years (certainly
since 9.0, I think) that it's legitimate to signal a FATAL error to an
idle client and assume that the client will read that error as a
response to its next protocol message.  So it seems to me that this
function should be adjusted to notice errors, and not just notice and
notify messages.

The second conclusion does not appear to be correct.  parseInput()
will call pqParseInput3() or pqParseInput2(), either of which will
handle an error as if it were a notice - i.e. by printing it out.

Here's a patch based on that analysis, addressing just that one
function, not any of the other changes talked about on this thread.
Does this make sense?  Would we want to back-patch it, and if so how
far, or just adjust master?  My gut is just master, but I don't know
why this issue wouldn't also affect Hot Standby kills and maybe other
kinds of connection termination situations, so maybe there's an
argument for back-patching.  On the third hand, failing to read the
error message off of a just-terminated connection isn't exactly a
crisis of the first order either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Dangling Client Backend Process

From
Michael Paquier
Date:
On Wed, Nov 4, 2015 at 2:18 AM, Robert Haas wrote:
>
> The second conclusion does not appear to be correct.  parseInput()
> will call pqParseInput3() or pqParseInput2(), either of which will
> handle an error as if it were a notice - i.e. by printing it out.

Right per pqGetErrorNotice3 when the connection is in PGASYNC_IDLE state.

> Here's a patch based on that analysis, addressing just that one
> function, not any of the other changes talked about on this thread.
> Does this make sense?  Would we want to back-patch it, and if so how
> far, or just adjust master?  My gut is just master, but I don't know
> why this issue wouldn't also affect Hot Standby kills and maybe other
> kinds of connection termination situations, so maybe there's an
> argument for back-patching.  On the third hand, failing to read the
> error message off of a just-terminated connection isn't exactly a
> crisis of the first order either.

Looks sane to me. As the connection is in PGASYNC_IDLE state when
crossing the path of pqHandleSendFailure(), we would finish eating up
all the error messages received from the server and print an internal
notice for the rest, with "message type blah received from server while
idle". Based on the lack of complaints regarding libpq on this side, I
would just go for master; for 9.5 it is pretty late in the game, so let
it gather some dust on master before a potential backpatch.
-- 
Michael



Re: Dangling Client Backend Process

From
Robert Haas
Date:
On Wed, Nov 11, 2015 at 1:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 4, 2015 at 2:18 AM, Robert Haas wrote:
>> The second conclusion does not appear to be correct.  parseInput()
>> will call pqParseInput3() or pqParseInput2(), either of which will
>> handle an error as if it were a notice - i.e. by printing it out.
>
> Right per pqGetErrorNotice3 when the connection is in PGASYNC_IDLE state.
>
>> Here's a patch based on that analysis, addressing just that one
>> function, not any of the other changes talked about on this thread.
>> Does this make sense?  Would we want to back-patch it, and if so how
>> far, or just adjust master?  My gut is just master, but I don't know
>> why this issue wouldn't also affect Hot Standby kills and maybe other
>> kinds of connection termination situations, so maybe there's an
>> argument for back-patching.  On the third hand, failing to read the
>> error message off of a just-terminated connection isn't exactly a
>> crisis of the first order either.
>
> Looks sane to me. As the connection is in PGASYNC idle state when
> crossing the path of pqHandleSendFailure() we would finish eating up
> all the error messages received from server and print an internal
> notice for the rest with "message type blah received from server while
> idle. Based on the lack of complaints regarding libpq on this side, I
> would just go for master, as for 9.5 is pretty late in this game to
> put some dust on it before a potential backpatch.

OK, committed just to master then, plus the change which is the
purpose of this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company