Re: Bug in walsender when calling out to do_pg_stop_backup (and others?) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Date
Msg-id 4EDA503A.9010209@enterprisedb.com
Whole thread Raw
In response to Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)  (Greg Jaskiewicz <gj@pointblue.com.pl>)
Responses Detecting loss of client connection (was Re: Bug in walsender when calling out to do_pg_stop_backup (and others?))  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On 19.10.2011 19:41, Greg Jaskiewicz wrote:
> On 19 Oct 2011, at 18:28, Florian Pflug wrote:
>> All the other flags which indicate cancellation reasons are set from signal handers, I believe. We could of course
markas ClientConnectionLostPending as volatile just to be consistent. Not sure whether that's a good idea, or not. It
mightprevent a mistake should we ever add code to detect lost connections asynchronously (i.e., from somewhere else
thanpq_flush). And the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for InterruptPending before
callingProcessInterrupts(), so we only pay the cost of volatile if there's actually an interrupt pending. But I still
thinkit's better to add qualifies such a volatile only when really necessary. A comment about why it *isn't* volatile
isprobably in order, though, so I'll add that in the next version of the patch. 
>>
> Makes sense.
>
> I had to ask, because it sticks out. And indeed there is a possibility that someone will one day will try to use from
signalhandler, etc. 

Let's just mark it as volatile. It's not clear to me that we'll never
set or read the flag while in a signal handler. We probably don't, but
what if ImmediateInterruptOK is 'true', for example, just when we fail
to send a response and try to set the flags. In that case, we have to be
careful that ClientConnectionLost is set before InterruptPending (which
we can only guarantee if both are volatile, I believe), otherwise a
signal might arrive when we've just set InterruptPending, but not yet
ClientConnectionLost. ProcessInterrupts() would clear InterruptPending,
but not see ClientConnectionLost, so we would lose the interrupt.

That's not serious, and the window for that happening would be extremely
small, and I don't think it can actually happen as the code stands,
because we never try to send anything while ImmediateInterruptOK ==
true. But better safe than sorry.


One small change I'd like to make is to treat the loss of connection
more as a new "top-level" event, rather than as a new reason for query
cancellation. A lost connection is more like receiving SIGTERM, really.
Please take a look at the attached patch. I've been testing this by
doing "SELECT pg_sleep(1), a from generate_series(1,1000) a;", and
killing the connection with "killall -9 psql".

One remaining issue I have with this is the wording of the error
message. The closest existing message we have is in old-mode COPY:

ereport(FATAL,
    (errcode(ERRCODE_CONNECTION_FAILURE),
     errmsg("connection lost during COPY to stdout")));

In the patch, I made the new message just "connection lost", but does
anyone else have a better opinion on that? Perhaps it should be
"connection lost while sending response to client". Or "connection lost
during execution of statement", but that might not be accurate if we're
not executing a statement at the moment, but just sending a "sync"
message or similar.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: backup_label during crash recovery: do we know how to solve it?
Next
From: Peter Eisentraut
Date:
Subject: Re: psql line number reporting from stdin