Thread: [patch] helps fe-connect.c handle -EINTR more gracefully

[patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
I'm fresh in the code, but this has solved my issues with PQconnect*
failing when interrupted by signals.  Some of it is sloppy and not to my
liking yet, but I'm still digging through to see if anything else needs
touched.  Comments appreciated.

Honestly, I'm a bit surprised that this issue hasn't been encountered
before.

Summary:
    * changes to connect() sections to handle errno=EINTR.  this solves
libpq PQconnect* family problems if the connect is interrupted by a
signal such as SIGALRM.
    * not all read/recv/write/send calls have been updated

David

--- src/interfaces/libpq/fe-connect.c.orig    Wed Oct 24 17:43:52 2001
+++ src/interfaces/libpq/fe-connect.c    Wed Oct 24 17:43:54 2001
@@ -912,21 +912,35 @@
      * Thus, we have to make arrangements for all eventualities.
      * ----------
      */
+
+    retry_socket:
     if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
     {
-        if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
-        {
-            /*
-             * This is fine - we're in non-blocking mode, and the
-             * connection is in progress.
-             */
-            conn->status = CONNECTION_STARTED;
-        }
-        else
-        {
-            /* Something's gone wrong */
-            connectFailureMessage(conn, SOCK_ERRNO);
-            goto connect_errReturn;
+        switch (SOCK_ERRNO) {
+            case EINTR:
+                /*
+                 * interrupted by signal, keep trying
+                 */
+                 goto retry_socket;
+                 break;
+
+            case 0:
+            case EINPROGRESS:
+            case EWOULDBLOCK:
+                /*
+                 * This is fine - we're in non-blocking mode, and the
+                 * connection is in progress.
+                 */
+                conn->status = CONNECTION_STARTED;
+                break;
+
+            default:
+                /*
+                 * Something's gone wrong
+                 */
+                connectFailureMessage(conn, SOCK_ERRNO);
+                goto connect_errReturn;
+                break;
         }
     }
     else
@@ -2132,8 +2146,13 @@
                "PQrequestCancel() -- socket() failed: ");
         goto cancel_errReturn;
     }
-    if (connect(tmpsock, &conn->raddr.sa, conn->raddr_len) < 0)
+    while (connect(tmpsock, &conn->raddr.sa, conn->raddr_len) < 0)
     {
+        /*
+         * interrupted by a signal
+         */
+        if(errno==EINTR)
+            continue;
         strcpy(conn->errorMessage.data,
                "PQrequestCancel() -- connect() failed: ");
         goto cancel_errReturn;

Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Brent Verner
Date:
On 25 Oct 2001 at 17:08 (-0400), David Ford wrote:
| I'm fresh in the code, but this has solved my issues with PQconnect*
| failing when interrupted by signals.  Some of it is sloppy and not to my
| liking yet, but I'm still digging through to see if anything else needs
| touched.  Comments appreciated.

Disclaimer: I may be wrong as hell ;-), but...


I'm not sure this is correct.  I've tried to /make/ a SIGALRM cause
connect to errno==EINTR, but I can't cause this condition.  I suspect
you have another signal being raised that is causing your symptom.
FTR, the textbook definition[1] of EINTR error for connect is:

  The attempt to establish a connection was interrupted by delivery
  of a signal that was caught; the connection will be established
  asynchronously.

Please check the attached prog to see if it is representative of your
code relating to the connect error you're seeing.  If it is, please
run it and see if you can get it to cause the EINTR error from connect.
If you can't I'm more certain that you have a problem elsewhere.

cheers.
  brent

1. http://www.opengroup.org/onlinepubs/7908799/xns/connect.html

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Attachment

Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> I'm not sure this is correct.  I've tried to /make/ a SIGALRM cause
> connect to errno==EINTR, but I can't cause this condition.

It wouldn't surprise me in the least if this behavior is
platform-dependent.  It may well be that David's kernel will allow
connect() to be interrupted by SIGALRM while yours won't.  (Which
reminds me that neither of you specified what platforms you were
testing on.  For shame.)  Or maybe the difference depends on whether
you are trying to connect to a local or remote server.

Unless someone can point out a situation where retrying connect()
after EINTR is actively bad, my inclination is to accept the patch.
It seems like a net improvement in robustness to me, with no evident
downside other than a line or two more code.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
>
>
>
>It wouldn't surprise me in the least if this behavior is
>platform-dependent.  It may well be that David's kernel will allow
>connect() to be interrupted by SIGALRM while yours won't.  (Which
>reminds me that neither of you specified what platforms you were
>testing on.  For shame.)  Or maybe the difference depends on whether
>you are trying to connect to a local or remote server.
>
>Unless someone can point out a situation where retrying connect()
>after EINTR is actively bad, my inclination is to accept the patch.
>It seems like a net improvement in robustness to me, with no evident
>downside other than a line or two more code.
>

I didn't specify my OS because this sort of a thing is standard *nix etc 
design (well, m$ excluded of course).

I use Linux.  Every *nix that I know of can have system calls be 
interrupted.

Please wait a day before applying the patch, I want to make it a bit 
more clean/readable and make sure I covered everything in fe-connect.c, 
I found that the SSL functions are traversed even if ssl is turned off 
in the config file and I have to handle that too.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
Many signals may be the cause of -EINTR.  It depends on what the signal 
is as to how it's normally handled.  sigalarm is the most common due to 
it being a timer event.

Generate a timer that expires as fast as possible (not too fast to 
prevent code execution), and you should see things left and right return 
with -EINTR.

I'm very much aware of why SIGALRM is happening, I generate it and I 
catch it.  As per my original message on this thread, my program does 
maintenance on a scheduled basis.  The period of that maintenance is 
many times per second.

Sooo... :)

Now let's get on with the story.

Libpq doesn't deal with system calls being interrupted in the slightest. None of the read/write or socket calls handle
anyerrors.  Even benign 
 
returns i.e. EINTR are treated as fatal errors and returned.  Not to 
malign, but there is no reason not to continue on and handle EINTR.

David
p.s. you cant use sleep() or alarm() functions and have a timer event as 
well.  The only POSIX compliant function that doesn't trample signal 
timer events is nanosleep().

Brent Verner wrote:

On 25 Oct 2001 at 17:08 (-0400), David Ford wrote:
| I'm fresh in the code, but this has solved my issues with PQconnect* 
| failing when interrupted by signals.  Some of it is sloppy and not to my 
| liking yet, but I'm still digging through to see if anything else needs 
| touched.  Comments appreciated.

Disclaimer: I may be wrong as hell ;-), but...


I'm not sure this is correct.  I've tried to /make/ a SIGALRM cause
connect to errno==EINTR, but I can't cause this condition.  I suspect
you have another signal being raised that is causing your symptom.
FTR, the textbook definition[1] of EINTR error for connect is:
 The attempt to establish a connection was interrupted by delivery  of a signal that was caught; the connection will be
established asynchronously.
 

Please check the attached prog to see if it is representative of your
code relating to the connect error you're seeing.  If it is, please
run it and see if you can get it to cause the EINTR error from connect.
If you can't I'm more certain that you have a problem elsewhere.

cheers. brent

1. http://www.opengroup.org/onlinepubs/7908799/xns/connect.html


------------------------------------------------------------------------
[snipped]



Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Brent Verner
Date:
On 26 Oct 2001 at 00:05 (-0400), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > I'm not sure this is correct.  I've tried to /make/ a SIGALRM cause
| > connect to errno==EINTR, but I can't cause this condition.
| 
| It wouldn't surprise me in the least if this behavior is
| platform-dependent.  It may well be that David's kernel will allow
| connect() to be interrupted by SIGALRM while yours won't.  (Which
| reminds me that neither of you specified what platforms you were
| testing on.  For shame.)  Or maybe the difference depends on whether
| you are trying to connect to a local or remote server.

sorry, I tested the attached prog on linux(2.2/2.4) and freebsd(4.4R)
to both local and remote(slow) servers.

| Unless someone can point out a situation where retrying connect()
| after EINTR is actively bad, my inclination is to accept the patch.
| It seems like a net improvement in robustness to me, with no evident
| downside other than a line or two more code.
 I've found numerous examples where connect() is retried after EINTR,
infact it appears to be fairly common.

cheers, brent

-- 
"Develop your talent, man, and leave the world something. Records are 
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> | Unless someone can point out a situation where retrying connect()
> | after EINTR is actively bad, my inclination is to accept the patch.

>   I've found numerous examples where connect() is retried after EINTR,
> infact it appears to be fairly common.

Perhaps it does work that way on your system, but that's not the point.
On a machine that behaves that way, we'll never see EINTR returned by
connect(), and so our reaction to it is unimportant.  The question is
what we should do if we *do* get EINTR from connect().  AFAICS, the
appropriate response is to retry.  We already do retry after EINTR in
libpq's recv, send, select, etc calls --- perhaps connect got overlooked
because it's usually only done at program startup.

After further thought, though, it's unclear to me why this solves
David's problem.  If he's got a repeating SIGALRM on a cycle short
enough to interrupt a connect(), seems like it'd just fail again
on the next try.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
David Ford <david@blue-labs.org> writes:
> Please wait a day before applying the patch, I want to make it a bit 
> more clean/readable and make sure I covered everything in fe-connect.c, 

BTW, reading the HPUX man page for connect I find the following relevant
error codes:
         [EALREADY]               Nonblocking I/O is enabled with                                  O_NONBLOCK,
O_NDELAY,or FIOSNBIO, and a                                  previous connection attempt has not yet
             completed.
 
         [EINPROGRESS]            Nonblocking I/O is enabled using                                  O_NONBLOCK,
O_NDELAY,or FIOSNBIO, and                                  the connection cannot be completed
      immediately.  This is not a failure.                                  Make the connect() call again a few
                        seconds later.  Alternatively, wait for                                  completion by calling
select()and                                  selecting for write.
 
         [EINTR]                  The connect was interrupted by a signal                                  before the
connectsequence was                                  complete.  The building of the
connectionstill takes place, even                                  though the user is not blocked on the
                 connect() call.
 
         [EISCONN]                The socket is already connected.

This does not actually *say* that the appropriate behavior after EINTR
is to retry, but reading between the lines one might infer that it will
work like the nonblocking case, wherein a retry of connect tries to link
to the existing connection attempt, not start a new one.

What's more important is that a retry will expose the possibility of
getting EALREADY or EISCONN.  EALREADY certainly must be treated as
success the same as EINPROGRESS (if it exists on a given platform ---
better #ifdef it I think).  Not so sure about EISCONN; does that imply
"you moron, this socket's been open forever", or does it get returned on
the first iteration that doesn't return EALREADY?
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
>
>
>After further thought, though, it's unclear to me why this solves
>David's problem.  If he's got a repeating SIGALRM on a cycle short
>enough to interrupt a connect(), seems like it'd just fail again
>on the next try.
>

Ok, a few things.  The connect() call is just an interface to the 
kernel.  Sometimes a connect() to a socket may take a long time, even up 
to two minutes (or depending on your kernel's timeout), so it isn't 
unfeasible that the call can be interrupted.  Next, the userland 
connect() is interrupted but the kernel isn't.  The kernel keeps working 
it and eventually completes or aborts the connection attempt.  It then 
sets the data structures and values so the next time userland comes 
alive it's ready for it.  The connect() call doesn't restart at the 
beginning, it continues where it left off.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
>
>
>This does not actually *say* that the appropriate behavior after EINTR
>is to retry, but reading between the lines one might infer that it will
>work like the nonblocking case, wherein a retry of connect tries to link
>to the existing connection attempt, not start a new one.
>
>What's more important is that a retry will expose the possibility of
>getting EALREADY or EISCONN.  EALREADY certainly must be treated as
>success the same as EINPROGRESS (if it exists on a given platform ---
>better #ifdef it I think).  Not so sure about EISCONN; does that imply
>"you moron, this socket's been open forever", or does it get returned on
>the first iteration that doesn't return EALREADY?
>

Not to worry.  EINPROGRESS, etc, are normally used for select() and 
return the current state of the connection attempt.  connect() in 
blocking mode will normally only return when it is interrupted or 
complete.  In non-blocking mode, it will return immediately with errno 
set to EALREADY, in which case you should spin on the socket connection 
attempt and wait until it returns a good status, i.e. return value=0, or 
if you happen to make a mistake and drop that iteration, EISCONN.

Those of you who are interested in socket operations would probably 
enjoy reading TCP/IP Illustrated, I believe the most recent version is 
#3, sorry I don't have the ISBN handy but most libraries should have v2.

As to your last concern Tom, the cycle should be: return=-1 [repeats 
until connection fails or succeeds], return=0 on success, or -1 on 
failure w/ errno set to appropriate fault, and afterwards, either return 
-1 with errno=EISCONN (already connected with prior success), or 
appropriate fault.

The important part is that if the socket successfully connects, it will 
return a 0.  That should be the last iteration of your spin on that 
connect call.  If you have a mistake in your code and you keep spinning 
on a successfully connected socket, yes you will get EISCONN, i.e. 
"hello..the socket is still connected" :)

Thus a simplified loop should look like this:

do {   int r;
   r=connect(...);   if(!r)    /* we're connected, r=0.  exit the loop */       break;     switch(errno) {    /*
connectreturns 0 or -1, so this must be -1 */       case EINTR:    /* we were interrupted by a signal */
continue;      default:           error_log(logfd, "Error connecting, kernel said: %s", 
 
strerror(errno));           r=-2;    /* jump out of the loop */           break;       }
   } while(r==-1):

This will continue trying the connect call as long as it hasn't returned 
a failure message and the errno is EINTR.  All other -1 return values 
will exit the loop.  As soon as it has connected properly it will also 
exit the loop.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Peter Eisentraut
Date:
David Ford writes:

> Libpq doesn't deal with system calls being interrupted in the slightest.
>  None of the read/write or socket calls handle any errors.  Even benign
> returns i.e. EINTR are treated as fatal errors and returned.  Not to
> malign, but there is no reason not to continue on and handle EINTR.

Libpq certainly does deal with system calls being interrupted:  It does
not allow them to be interrupted.  Take a look into the file pqsignal.c to
see why.

If your alarm timer interrupts system calls then that's because you have
installed your signal handler to allow that.  In my mind, a reasonable
behaviour in that case would be to let the PQconnect or equivalent fail
and provide the errno to the application.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
David Ford <david@blue-labs.org> writes:
> Thus a simplified loop should look like this:

No, it should *not* look like that.  The fe-connect.c code is designed
to move on as soon as it's convinced that the kernel has accepted the
connection request.  We use a non-blocking connect() call and later
wait for connection complete by probing the select() status.  Looping
on the connect() itself would be a busy-wait, which would be antisocial.

Certainly we can move on as soon as we see EINPROGRESS or EALREADY.
What I'm wondering at the moment is whether we could take EINTR to be
also an indication that the kernel has accepted the connection request,
and proceed forward to the select(), rather than trying the connect()
again.

Although Brent didn't say so very clearly, I think his real concern is
"why are you getting EINTR from a non-blocking connect, and what does
that mean anyway?"  The HPUX man page certainly makes it sound like
this is not distinct from the EINPROGRESS return, and that there's
no need to retry the connect() per se.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Libpq certainly does deal with system calls being interrupted:  It does
> not allow them to be interrupted.  Take a look into the file pqsignal.c to
> see why.

???  Are you momentarily confusing backend and frontend libpq?

AFAICT the client-side libpq doesn't (and shouldn't) touch signal
handling at all, except for a couple of places in the print routines
that temporarily block SIGPIPE.

Since we deal happily with EINTR for most of the frontend socket calls,
I don't see a reason not to cope with it for connect() too.  I am
somewhat concerned about what exactly it means for a non-blocking
connect, however.  Maybe it doesn't mean anything, and we could treat
it the same as EINPROGRESS.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
"The **SA_RESTART**  flag is always set by the underlying system in 
POSIX mode so that interrupted system calls will fail with return value 
of -1 and the *EINTR* error in /errno/ instead of getting restarted." libpq's pqsignal.c doesn't turn off SA_RESTART
forSIGALRM.  Further, 
 
pqsignal.c only handles SIGPIPE and not to mention that other parts of 
libpq do handle EINTR properly.

PQconnect* family does not handle EINTR.  It does not handle the 
possible and perfectly legitimate interruption of a system call. Globally trying to disable system calls from being
interruptedis a Bad 
 
Thing. Having a timer event is common, having a timer event in a daemon 
is often required.  Timers allow for good housekeeping and playing nice 
with the rest of the system.

Your reasonable behavior in the case of EINTR means repeatable and 
mysterious failure.  There isn't a clean way to re-enter PQconnect* 
while maintaining state in the case of signal interruption and no 
guarantee the function won't be interrupted again.

Basically if you have a timer event in your software and you use pgsql, 
then the following will happen.

a) if the timer event always happens outside the PQconnect* call is 
completed your code will function
b) if the timer event always fires during the PQconnect* call, your code 
will never function
c) if your timer event sometimes fires during the PQconnect* call, your 
code will sometimes function

There are no ifs, ands, or buts about it, if a timer fires inside 
PQconnect* as it is now, there is no way to continue. With a suitablly 
long timer period, you can try the PQconnect* call again and if the 
connect succeeds before the timer fires again you're fine.  If not, you 
must repeatedly try.

That said, there are two ways about it.  a) handle it cleanly inside 
PQconnect* like it should be done, or b) have the programmer parse the 
error string for "Interrupted system call" and re-enter PQconnect.  a) 
is clean, short, and simple.  b) wastes a lot of CPU to attempt to 
accomplish the task.  a) is guaranteed and b) is not guaranteed.

David

Peter Eisentraut wrote:

David Ford writes:

>Libpq doesn't deal with system calls being interrupted in the slightest.
> None of the read/write or socket calls handle any errors.  Even benign
>returns i.e. EINTR are treated as fatal errors and returned.  Not to
>malign, but there is no reason not to continue on and handle EINTR.
>

Libpq certainly does deal with system calls being interrupted:  It does
not allow them to be interrupted.  Take a look into the file pqsignal.c to
see why.

If your alarm timer interrupts system calls then that's because you have
installed your signal handler to allow that.  In my mind, a reasonable
behaviour in that case would be to let the PQconnect or equivalent fail
and provide the errno to the application.




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
>
>
>
>No, it should *not* look like that.  The fe-connect.c code is designed
>to move on as soon as it's convinced that the kernel has accepted the
>connection request.  We use a non-blocking connect() call and later
>wait for connection complete by probing the select() status.  Looping
>on the connect() itself would be a busy-wait, which would be antisocial.
>

The fe-connect.c code moves on regardless of the completion of the 
connect() if it has been interrupted.

To simplify, in a program without SIGALRM events, PQconnect* won't be 
interrupted.  The connect() call will complete properly.

In a program with SIGALRM events, the call is interrupted inside 
connect().  If SA_RESTART was disabled for connect() in POSIX semantics, 
the  program would automatically jump right back into the connect() 
call.  However by default POSIX code enables SA_RESTART which for 
SIGALRM means -don't- automatically restart the system call.  This means 
the programmer needs to check for -1/errno=EINTR and jump back into 
connect() himself.  There isn't a concern for busy wait/anti social code 
behavior, your program was in the middle of connect() when it was 
interrupted, you're simply jumping back to where you left off.

It doesn't matter if it is a blocking connect or non-blocking connect, 
handling EINTR must be done if SIGALRM events are employed.  A fast 
enough event timer with a non-blocking connect will also be susceptible 
to EINTR.

EINTR is distinctly different from EINPROGRESS.  If they were the same 
then there would be a problem.  EINTR should be handled by jumping back 
into the connect() call, it is re-entrant and designed for this.

Regardless, you don't wait for the connection to complete, the code 
following the connect() call returns failure for every -1 result from 
connect() unless it is EINPROGRESS or EWOULDBLOCK.  select() is -not- 
used in fe-connect.c.  It is possible with the current code for the 
connection to fail in non-blocking mode.  Reason: you call connect() in 
non-blocking mode, break out of the section on EINPROGRESS, and continue 
assuming that the connection will be successful.
      EINPROGRESS             The  socket is non-blocking and the connection can             not be completed
immediately. It  is  possible  to             select(2)  or  poll(2)  for completion by selecting             the
socket for  writing.  After  select  indicates             writability, use getsockopt(2) to read the SO_ERROR
  option at level  SOL_SOCKET  to  determine  whether             connect  completed  successfully (SO_ERROR is zero)
         or unsuccessfully (SO_ERROR is  one  of  the  usual             error  codes listed here, explaining the
reasonfor             the failure).
 

The socket is not checked any further after the connect().  The code 
should not continue on into the SSL handling until you're sure that the 
socket is ready for operation.

The reason why I am getting EINTR from a non-blocking connect is because 
my event timer happens to fire in the middle of the connect() call. Just because you set the socket to FIONBIO doesn't
meanthat connect() 
 
can't be interrupted.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
David Ford <david@blue-labs.org> writes:
> [ much ]

I think you are missing the point.  I am not saying that we shouldn't
deal with EINTR; rather I am raising what I think is a legitimate
question: *what* is the most appropriate response?  My reading of
HP's gloss suggests that we could treat EINTR the same as EINPROGRESS,
ie, consider the connect() to have succeeded and move on to the
wait-for-connection-complete-or-failure-using-select() phase.
If that works I would prefer it to repeating the connect() call,
primarily because it avoids any possibility of introducing an
infinite loop.

For PQrequestCancel we clearly do need to retry the connect(), since
that use of connect() isn't nonblocking.  But I'm not convinced that
we should do so in the main-line connection code.

> It is possible with the current code for the 
> connection to fail in non-blocking mode.  Reason: you call connect() in 
> non-blocking mode, break out of the section on EINPROGRESS, and continue 
> assuming that the connection will be successful.

No, we don't.  If you think that, then you haven't studied the code
sufficiently to be submitting patches for it.  What we actually do
is exactly what is recommended by the manpage you're quoting at me.
It's just split across multiple routines.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Actually, now that I look at this another time, there's an interesting
question to ask: have you compiled with USE_SSL?

The USE_SSL case definitely is broken, since it invokes the connect()
in blocking mode, but fails to retry on EINTR, which it clearly should
do in that mode.  (What's even worse is that you cannot suppress
the problem by setting allow_ssl_try false.  If you compiled with SSL
support enabled then you lose anyway.)

I think we can fix this by moving the SSL startup code to a saner place,
namely in PQconnectPoll just before it's about to send the startup
packet.  There's no reason why we shouldn't *always* do the connect()
in nonblock mode.  We could switch the socket back to blocking mode
while invoking the SSL negotiation phase (which'd be skipped if not
allow_ssl_try, so that a library compiled with USE_SSL isn't ipso
facto broken for people who want non-SSL nonblocking connect).

If you are testing with USE_SSL then that explains why you are seeing
EINTR failures.  If not, then we still have to ask whether EINTR really
needs to be handled differently from EINPROGRESS.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
>
>
>
>I think you are missing the point.  I am not saying that we shouldn't
>deal with EINTR; rather I am raising what I think is a legitimate
>question: *what* is the most appropriate response?  My reading of
>HP's gloss suggests that we could treat EINTR the same as EINPROGRESS,
>ie, consider the connect() to have succeeded and move on to the
>wait-for-connection-complete-or-failure-using-select() phase.
>If that works I would prefer it to repeating the connect() call,
>primarily because it avoids any possibility of introducing an
>infinite loop.
>

You wouldn't get an infinite loop, you'd get Exx indicating the 
operation was in progress.  Yes,  you could spin on a select() waiting 
for the end result.  What I normally do is this:

connect()

while(select()) {   switch () {       case EINTR:           break;       case EINPROGRESS:           nanosleep();
   break;       case ETIMEDOUT:       default:           /* handle timeout and other error conditions nicely for the 
 
user */           break;   }
}

With EINTR, it's fine to immediately start working again because your 
code was interrupted from outside this scope.  We don't know where in 
connect() we were interrupted, blocking or non-blocking.  With 
EINPROGRESS I sleep for a while to be nice to the system.  Here we know 
that things are moving along like they should be and we are in a proper 
sleepable period.

That isn't to imply that things will break if we sleep from EINTR.  Only 
that connect() exited due to an interruption, not due to planning.

>
>No, we don't.  If you think that, then you haven't studied the code
>sufficiently to be submitting patches for it.  What we actually do
>is exactly what is recommended by the manpage you're quoting at me.
>It's just split across multiple routines.
>
I traced several calls and they run through a few functions which end 
up in pqFlush.  These code paths haven't checked the socket to see if it 
is ready for RW operation yet.  pqFlush calls send() [ignoring SSL].

Only after a lot of code has been traversed is pqWait run in which the 
socket is checked for RW and EINTR.  My point that I was bringing up 
with Peter was that it's much more nice on the system to wait for the 
socket to become usable before going through all that code.  In the 
previous email I suggested that with a sufficiently fast timer event, 
you'd never get back through the PQconnect* before being interrupted 
again and that's why I advocate putting the EINTR as close to the 
connect() as possible.  Tying this together is why it is possible to 
fail, a good amount of code is traversed before you get back to dealing 
with the socket.  Anywhere inbetween this signal events can happen again.

That's what provoked this original patch.  Unless I shut off my timer or 
changed my timer to happen in the long distant future, I would never 
have a successful connection established.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
David Ford <david@blue-labs.org> writes:
>  I traced several calls and they run through a few functions which end 
> up in pqFlush.  These code paths haven't checked the socket to see if it 
> is ready for RW operation yet.  pqFlush calls send() [ignoring SSL].

Where?  AFAICS (ignoring the USE_SSL breakage), connectDBStart will
return immediately after calling connect(), and the next thing
that's done is pqWait from connectDBComplete.  If there's a path that
does what you claim, that's a bug ... but I don't see it.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Peter Eisentraut
Date:
Tom Lane writes:

> AFAICT the client-side libpq doesn't (and shouldn't) touch signal
> handling at all, except for a couple of places in the print routines
> that temporarily block SIGPIPE.

Which was my point.

> Since we deal happily with EINTR for most of the frontend socket calls,
> I don't see a reason not to cope with it for connect() too.  I am
> somewhat concerned about what exactly it means for a non-blocking
> connect, however.  Maybe it doesn't mean anything, and we could treat
> it the same as EINPROGRESS.

I feel that if the user installed his signal handlers to interrupt system
calls then he probably had a reason for it, possibly because of the timing
aspects of his application.  Thus, it shouldn't be libpq's task to
override that decision.  If the user doesn't want system calls to be
interrupted, then he should install the signal handlers in the proper way.
If he doesn't know how to do that, he needs to educate himself, that's
all.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
Peter Eisentraut wrote:

>>AFAICT the client-side libpq doesn't (and shouldn't) touch signal
>>handling at all, except for a couple of places in the print routines
>>that temporarily block SIGPIPE.
>>
>
>Which was my point.
>

My patch doesn't affect signal handling, my patch affects the response 
of connect() after it is interrupted by a signal.

>>Since we deal happily with EINTR for most of the frontend socket calls,
>>I don't see a reason not to cope with it for connect() too.  I am
>>somewhat concerned about what exactly it means for a non-blocking
>>connect, however.  Maybe it doesn't mean anything, and we could treat
>>it the same as EINPROGRESS.
>>
>
>I feel that if the user installed his signal handlers to interrupt system
>calls then he probably had a reason for it, possibly because of the timing
>aspects of his application.  Thus, it shouldn't be libpq's task to
>override that decision.  If the user doesn't want system calls to be
>interrupted, then he should install the signal handlers in the proper way.
>If he doesn't know how to do that, he needs to educate himself, that's
>all.
>

Let me ask you how you would handle SIGALRM without interrupting 
syscalls?  Further, how would you guarantee your SIGALRM handler would 
execute within your granular limits?

Yes, the user has a timing aspect.  Libpq isn't overriding it and my 
patch doesn't change that.  My patch adds the EINTR handling.  Currently 
there is no way for libpq to continue processing a connect call if that 
syscall is interrupted and the user is using POSIX defaults.  Please 
refer to the POSIX specification, I povided a quote in a previous message.

POSIX default signal handling sets SA_RESTART which -disables- system 
call restart.

At present, the PQconnect* function treats EINTR as a fatal error.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I feel that if the user installed his signal handlers to interrupt system
> calls then he probably had a reason for it, possibly because of the timing
> aspects of his application.  Thus, it shouldn't be libpq's task to
> override that decision.

How are we "overriding" it?  A more correct description would be that
we are "coping" with it.  We already do so when a send or recv is
interrupted, so I don't see why there's a problem with extending that
policy to connect calls.

What I think you are arguing is that marking signal handlers SA_RESTART
is a sufficient answer for this issue, but I don't really agree, on
two grounds: (a) not everyone has POSIX signals, (b) SA_RESTART is a
blunt instrument because it's per-signal.  A user might well want
SIGALRM to interrupt some operations, but that doesn't mean he wants it
to cause failures inside subroutine libraries.  Look at the backend:
we make SIGALRM non-SA_RESTART, which means we need to retry after
EINTR in most places.  We do it because we want certain specific waits
to be interruptible, not because we want a global policy of "fail if
interrupted".  (Now that I look at it, I wonder whether SIGINT,
SIGTERM, SIGQUIT shouldn't be non-SA_RESTART as well, but that's a
different discussion.)

My quibble with David has been about whether the fix is correct in
detail, not about whether its purpose is correct.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
David Ford
Date:
Tom Lane wrote:

>My quibble with David has been about whether the fix is correct in
>detail, not about whether its purpose is correct.
>

My preference is to spin until ! EINTR as soon as possible after the 
socket operation.  I'm going to seek some advice from kernel people 
regarding the inner workings of sockets.  I suspect that we'll be able 
to get away with handling it like as is done with EINPROGRESS, but 
something is nagging me at the back of my head.

In regards to the other signals, I think there is another subtle 
problem.  Remember a month ago when I had a huge database that I had to 
upgrade, I had no disk space to export it to and when the old version of 
psql ran out of memory it crashed?  The backend continued to push query 
data out the closed pipe until the backend was forcibly closed or the 
query completed.  Naturally this  caused considerable spammage on the 
console.

So I suspect some tuning needs to be done in respect to SIGPIPE w/ the 
backend.

David




Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
David Ford <david@blue-labs.org> writes:
> problem.  Remember a month ago when I had a huge database that I had to 
> upgrade, I had no disk space to export it to and when the old version of 
> psql ran out of memory it crashed?  The backend continued to push query 
> data out the closed pipe until the backend was forcibly closed or the 
> query completed.  Naturally this  caused considerable spammage on the 
> console.

This has been discussed before.  Don't bother proposing that backends
should use the default handling of SIGPIPE, because that won't be
accepted.  A safe limited solution would be to keep backend libpq from
emitting multiple consecutive "broken pipe" reports to stderr.
A better-sounding solution that might have unforeseen side effects
is to set QueryCancel as soon as we see a nonrecoverable-looking
send() error.  This is on the TODO list but no one's gotten round to
doing anything about it yet.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Bruce Momjian
Date:
> David Ford <david@blue-labs.org> writes:
> > problem.  Remember a month ago when I had a huge database that I had to 
> > upgrade, I had no disk space to export it to and when the old version of 
> > psql ran out of memory it crashed?  The backend continued to push query 
> > data out the closed pipe until the backend was forcibly closed or the 
> > query completed.  Naturally this  caused considerable spammage on the 
> > console.
> 
> This has been discussed before.  Don't bother proposing that backends
> should use the default handling of SIGPIPE, because that won't be
> accepted.  A safe limited solution would be to keep backend libpq from
> emitting multiple consecutive "broken pipe" reports to stderr.
> A better-sounding solution that might have unforeseen side effects
> is to set QueryCancel as soon as we see a nonrecoverable-looking
> send() error.  This is on the TODO list but no one's gotten round to
> doing anything about it yet.

Guys, can we come to a resolution this so I can mark it as completed?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Guys, can we come to a resolution this so I can mark it as completed?

The QueryCancel idea is a neat hack but I don't have enough confidence
in it to throw it in during beta.  I'll add a little bit of code to
pq_flush to suppress duplicate error messages; that should be safe
enough and get rid of the worst aspects of the problem.
        regards, tom lane


Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Guys, can we come to a resolution this so I can mark it as completed?
> 
> The QueryCancel idea is a neat hack but I don't have enough confidence
> in it to throw it in during beta.  I'll add a little bit of code to
> pq_flush to suppress duplicate error messages; that should be safe
> enough and get rid of the worst aspects of the problem.

So I will mark it as done.  If there is something for the TODO list
here, please let me know.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026