Thread: Escaping from blocked send() reprised.

Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, I have received inquiries related to blocked communication
several times for these weeks with different symptoms. Then I
found this message from archive,

http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html

> Subject: Escaping a blocked sendto() syscall without causing a restart

Mr. Tom Lane gave a comment replying it,

> Offhand it looks to me like most signals would kick the backend off the
> send() call ... but it would loop right back and try again.  See
> internal_flush() in pqcomm.c.  (If you're using SSL, this diagnosis
> may or may not apply.)
> 
> We can't do anything except repeat the send attempt if the client
> connection is to be kept in a sane state.
(snipped)
>  And I'm not at all sure if we could get it to work in SSL mode...

That's true for timeouts that should continue the connection,
say, statement_timeout, but focusing on intentional backend
termination, I think it does no harm to break it up abruptly,
even if it was on SSL. On the other hand it seems still
preferable to keep a connection when not blocked. The following
expression would detects such a blocking state at just before
next send(2) after the previous try exited by signals.

(ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)

Finally, pg_terminate_backend() works even when send is blocked
for both SSL and non-SSL connections after 1 second delay with
this patch (break_socket_blocking_on_termination_v1.patch).

Nevetheless, of course statement_timeout cannot become effective
by this method since it breaks the consistency in the client
protocol. It needs change in client protocol to have "out of
band" mechanism or something, maybe.

Any suggestions?



Attached patches are:
 - break_socket_blocking_on_termination_v1.patch : The patch to   break blocked state of send(2) for
pg_terminate_backend().
 - socket_block_test.patch : debug printing and changing send   buffer of libpq for reproducing the blocked situation.



Some point of discussion follows,

==== Discussion about the appropriateness of looking into    ProcDiePending there and calling CHECK_FOR_INTERRUPTS()
seeingit.
 

I have somewhat uneasiness of these things, but what we can at
most seems to be replacing ProcDiePending here with some another
variable, say ImmediatelyExitFromBlockedState, and somehow go
upstairs through normal return path. Additional Try-Catch seems
can do that but it looks no benefit for the added complexity..



==== Discussion on breaking up connetion especially for SSL

Breaking an SSL connection up in my_sock_write() cause the
following message on client side if it still lives and resumes to
receive from the connection, this seems to show that the client
handles the event properly.

| SSL SYSCALL error: EOF detected
| The connection to the server was lost. Attempting reset: Succeeded.



==== Discussion on reliability of select(2)

This method is not a perfect solution, since the select(2)
sometimes gives a wrong oracle about wheather the follwing
send(2) will be blocked.

Even so, as far as I see, select(2) just after exiting from
blocked send(2) by signal seems always says 'write will be
blocked', so what this patch does seems to save most cases except
when the any amount of socket buffer is vacated just before the
following select. The second chance to exit from blocked send(2)
won't come after this(, before one more pg_terminate_backend() ?).

Removing the select(2) from the condition (that is,
CHECK_FOR_INTERRUPTS() is called always ProcDiePending is true)
prevents such a possibility, in exchange for killing 'really
live' connection but IMHO it's no problem on intentional server
termination.

More reliable measure for this would be non-blocking IO but it
changes more of the code.



====  Reproducing the situation.

Another possible question would be about the possibility of such
blocking, or how to reproduce the situation. I found that send(2)
on CentOS6.5 somehow sends successfully, for most cases, the
remaining data at the retry after exiting by signal during being
blocked with buffer full, in spite of no change in environment.

So reproducing the stucked situation is rather difficult on the
server as is. But such situation would be reproduced quite easily
with some cheat, that is, enlarging PQ send buffer, say by ten
times.

Applying the attached testing patch (socket_block_test.patch),
the following steps will make the stucked situation.
1. Do a select which returns big result and enter Ctrl-Z just   after invoking.
  cl> $ psql -h localhost postgres  cl> postgres=# select 1 from generate_series(0, 9999999);  cl> ^Z  cl> [4]+
Stopped                psql -h localhost postgres
 
2. Watch the server to stuck.
 The server starts to print lines like following to console after a while, then stops. The number enclosed by the
squarebrackets is PID of the server.
 
  sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
3. Do pg_terminate_backend().
  cl> $ psql postgres -c "select pg_terminate_backend(8809)"
  The server will stuck like follows, PID=8811 is the another  session made by the command just above.
  sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0  sv> #### [8811] (bare) rest = 0 / 327 bytes,
ProcDiePending= 0  sv> #### [8809] (bare) rest = 500 / 81920 bytes, ProcDiePending = 1  sv> #### [8811] (bare) rest = 0
/78 bytes, ProcDiePending = 0
 
 The server 8809 is blocked during sending the remaining 500 bytes and won't come back forever except for SIGKILL, or
possibledata reading on the client (fg does it).
 
  cl> $ fg
  sv> #### [8809] (bare) rest = 0 / 500 bytes, ProcDiePending = 1  sv> FATAL:  terminating connection due to
administratorcommand  sv> STATEMENT:  select 1 from generate_series(0, 9999999);  sv> #### [8809] (bare) rest = 0 / 116
bytes,ProcDiePending = 0  sv> #### [8883] (bare) rest = 0 / 327 bytes, ProcDiePending = 0
 

 If you don't see the situation to occur, changing the value of select clause (by its length, not by value:) would be
effective,or entering Ctrl-Z after some debug output also would be effective.
 
 For SSL connections, the debug output looks like the following,
  sv> #### [20064] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0  sv> #### [20064] (SSL) rest = 0 / 16413 bytes,
ProcDiePending= 0  sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0  sv> #### [20064] (SSL) rest = 0 /
16413bytes, ProcDiePending = 0  sv> #### [20064] (SSL) rest = 980 / 16413 bytes, ProcDiePending = 1  sv> #### [20064]
(SSL)rest = 0 / 980 bytes, ProcDiePending = 1  sv> #### [20064] (SSL) rest = 1029 / 16413 bytes, ProcDiePending = 1
 
 "bare" here in turn means the status of SSL_write and "SSL" means the status of the underlying 'bare' socket of SSL
connection.(Sorry for the confising labelings..)
 
 The (bare) line above is not corresponding to the following bunch of (SSL) lines, but its precedents. At the fifth
line,send(2) exits by signal issued by pg_teminate_backend() then retry (somehow) successfully at sixth line but SSL
layergave another 16413 bytes and only 1029 bytes of that is sent by the first try and the server stucked at the second
tryfor the seventh line. The control doesn't come back to secure_write() during this sequence.
 


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 59204cf..f01c140 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -76,6 +76,7 @@#include "libpq/libpq.h"#include "tcop/tcopprot.h"
+#include "miscadmin.h"#include "utils/memutils.h"
@@ -323,6 +324,41 @@ rloop:}/*
+ * Although we basically should try to send all data staying in our send
+ * buffer, we also should consider the possibility that hanging of clients or
+ * network cutoff has compelled send(2) to be blokced. We need to be allowed
+ * to exit from send() if such blocking states last for a while during process
+ * termination. Returns true if send blocking is detected.
+ *
+ * The worse side of this would be that extra-slow receiver (specifically
+ * under one packet per second) might fail to recive the result to the end,
+ * but pg_terminate_backend() is such a thing.
+ */
+static bool
+is_write_blocked(int fd)
+{
+#ifndef WIN32
+    fd_set wfds;
+    struct timeval tv;
+    int ret;
+    int save_errno = errno;
+
+    FD_ZERO(&wfds);
+    FD_SET(fd, &wfds);
+    tv.tv_sec = 1;
+    tv.tv_usec = 0;
+    ret = select(fd + 1, NULL, &wfds, NULL, &tv);
+
+    /* The error of select here is safely ignorable. */
+    errno = save_errno;
+
+    return ret <= 0 ? true : false;
+#else
+    return false;
+#endif
+}
+
+/* *    Write data to a secure connection. */ssize_t
@@ -457,6 +493,14 @@ wloop:#endif        n = send(port->sock, ptr, len, 0);
+    /*
+     * Check for interrupt if the socket was blocked during process is
+     * being terminated.
+     */
+    if (ProcDiePending && is_write_blocked(port->sock))
+        CHECK_FOR_INTERRUPTS();
+    
+        return n;}
@@ -515,6 +559,14 @@ my_sock_write(BIO *h, const char *buf, int size)    res = send(h->num, buf, size, 0);
BIO_clear_retry_flags(h);
+
+    /*
+     * Check for interrupt if the socket was blocked during process is being
+     * terminated.
+     */
+    if (ProcDiePending && is_write_blocked(h->num))
+        CHECK_FOR_INTERRUPTS();
+    if (res <= 0)    {        if (errno == EINTR)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 59204cf..98ee41e 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -75,6 +75,7 @@#endif   /* USE_SSL */#include "libpq/libpq.h"
+#include "miscadmin.h"#include "tcop/tcopprot.h"#include "utils/memutils.h"
@@ -457,6 +458,9 @@ wloop:#endif        n = send(port->sock, ptr, len, 0);
+    fprintf(stderr, "#### [%d] (bare) rest = %d / %d bytes, ProcDiePending = %d\n",
+            getpid(), len - n, len, ProcDiePending);
+    return n;}
@@ -514,6 +518,8 @@ my_sock_write(BIO *h, const char *buf, int size)    int            res = 0;    res = send(h->num,
buf,size, 0);
 
+    fprintf(stderr, "#### [%d] (SSL) rest = %d / %d bytes, ProcDiePending = %d\n",
+            getpid(), size - res, size, ProcDiePending);    BIO_clear_retry_flags(h);    if (res <= 0)    {
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..259b6ac 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -114,7 +114,7 @@ static List *sock_paths = NIL; * enlarged by pq_putmessage_noblock() if the message doesn't fit
otherwise.*/
 
-#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE (8192 * 10)#define PQ_RECV_BUFFER_SIZE 8192static char *PqSendBuffer;

Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, I have received inquiries related to blocked communication
> several times for these weeks with different symptoms. Then I
> found this message from archive,
>
> http://postgresql.1045698.n5.nabble.com/Escaping-a-blocked-sendto-syscall-without-causing-a-restart-td5740855.html
>
>> Subject: Escaping a blocked sendto() syscall without causing a restart
>
> Mr. Tom Lane gave a comment replying it,
>
>> Offhand it looks to me like most signals would kick the backend off the
>> send() call ... but it would loop right back and try again.  See
>> internal_flush() in pqcomm.c.  (If you're using SSL, this diagnosis
>> may or may not apply.)
>>
>> We can't do anything except repeat the send attempt if the client
>> connection is to be kept in a sane state.
> (snipped)
>>  And I'm not at all sure if we could get it to work in SSL mode...
>
> That's true for timeouts that should continue the connection,
> say, statement_timeout, but focusing on intentional backend
> termination, I think it does no harm to break it up abruptly,
> even if it was on SSL. On the other hand it seems still
> preferable to keep a connection when not blocked. The following
> expression would detects such a blocking state at just before
> next send(2) after the previous try exited by signals.
>
> (ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)
>
> Finally, pg_terminate_backend() works even when send is blocked
> for both SSL and non-SSL connections after 1 second delay with
> this patch (break_socket_blocking_on_termination_v1.patch).
>
> Nevetheless, of course statement_timeout cannot become effective
> by this method since it breaks the consistency in the client
> protocol. It needs change in client protocol to have "out of
> band" mechanism or something, maybe.
>
> Any suggestions?

You should probably add your patch here, so it doesn't get forgotten about:

https://commitfest.postgresql.org/action/commitfest_view/open

We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away.  A couple of general
thoughts on this topic:

1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case.  I'm not sure what, if anything, we can do about that.

2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time.  But I'm unsure what "a
reasonable period of time" means.  This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, The replies follow are mainly as a memo for myself so
please don't be bothered to answer until the time comes.

At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
> You should probably add your patch here, so it doesn't get forgotten about:
> 
> https://commitfest.postgresql.org/action/commitfest_view/open

Ok, I'll put this there. 

> We're focused on reviewing patches for the current CommitFest, so your
> patch might not get attention right away.  A couple of general
> thoughts on this topic:

Thank you for suggestions. I'll consider on them.

> 1. I think it's the case that there are platforms around where a
> signal won't cause send() to return EINTR.... and I'd be entirely
> unsurprised if SSL_write() doesn't necessarily return EINTR in that
> case.  I'm not sure what, if anything, we can do about that.

man 2 send on FreeBSD has not description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.


> 2. I think it would be reasonable to try to kill off the connection
> without notifying the client if we're unable to send the data to the
> client in a reasonable period of time.  But I'm unsure what "a
> reasonable period of time" means.  This patch would basically do it
> after no delay at all, which seems like it might be too aggressive.
> However, I'm not sure.

I think there's no such a reasonable time. The behavior might
should be determined from another point.. On alternative would be
let pg_terminate_backend() have a parameter instructing force
shutodwn (how to propagate it?), or make a forced shutdown on
duplicate invocation of pg_terminate_backend().

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Mon, Jun 30, 2014 at 11:26 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> 2. I think it would be reasonable to try to kill off the connection
>> without notifying the client if we're unable to send the data to the
>> client in a reasonable period of time.  But I'm unsure what "a
>> reasonable period of time" means.  This patch would basically do it
>> after no delay at all, which seems like it might be too aggressive.
>> However, I'm not sure.
>
> I think there's no such a reasonable time. The behavior might
> should be determined from another point.. On alternative would be
> let pg_terminate_backend() have a parameter instructing force
> shutodwn (how to propagate it?), or make a forced shutdown on
> duplicate invocation of pg_terminate_backend().

Well, I think that when people call pg_terminate_backend() just once,
they expect it to kill the target backend.  I think people will
tolerate a short delay, like a few seconds; after all, there's no
guarantee, even today, that the backend will hit a
CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds.  But
they are not going to want to have to take a second action to kill the
backend - killing it once should be sufficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Martijn van Oosterhout
Date:
On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
> > 1. I think it's the case that there are platforms around where a
> > signal won't cause send() to return EINTR.... and I'd be entirely
> > unsurprised if SSL_write() doesn't necessarily return EINTR in that
> > case.  I'm not sure what, if anything, we can do about that.
>
> man 2 send on FreeBSD has not description about EINTR.. And even
> on linux, send won't return EINTR for most cases, at least I
> haven't seen that. So send()=-1,EINTR seems to me as only an
> equivalent of send() = 0. I have no idea about what the
> implementer thought the difference is.

Whether send() returns EINTR or not depends on whether the signal has
been marked restartable or not. This is configurable per signal, see
sigaction(). If the signal is marked to restart, the kernel returns
ERESTARTHAND (IIRC) and the libc will redo the call internally.

Default BSD does not return EINTR normally, but supports sigaction().

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello,

At Tue, 1 Jul 2014 21:21:38 +0200, Martijn van Oosterhout <kleptog@svana.org> wrote in
<20140701192138.GB20140@svana.org>
> On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
> > > 1. I think it's the case that there are platforms around where a
> > > signal won't cause send() to return EINTR.... and I'd be entirely
> > > unsurprised if SSL_write() doesn't necessarily return EINTR in that
> > > case.  I'm not sure what, if anything, we can do about that.
> > 
> > man 2 send on FreeBSD has not description about EINTR.. And even
> > on linux, send won't return EINTR for most cases, at least I
> > haven't seen that. So send()=-1,EINTR seems to me as only an
> > equivalent of send() = 0. I have no idea about what the
> > implementer thought the difference is.
> 
> Whether send() returns EINTR or not depends on whether the signal has
> been marked restartable or not. This is configurable per signal, see
> sigaction(). If the signal is marked to restart, the kernel returns
> ERESTARTHAND (IIRC) and the libc will redo the call internally.

Wow, thank you for detailed information. I'll study that and take
it into future discussion.

> Default BSD does not return EINTR normally, but supports sigaction().

I guess it is for easiness-to-keep-compatibility, seems
reasonable.


have a nice day,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, thank you for keeping this discussion moving.

> > I think there's no such a reasonable time. The behavior might
> > should be determined from another point.. On alternative would be
> > let pg_terminate_backend() have a parameter instructing force
> > shutodwn (how to propagate it?), or make a forced shutdown on
> > duplicate invocation of pg_terminate_backend().
> 
> Well, I think that when people call pg_terminate_backend() just once,
> they expect it to kill the target backend.  I think people will
> tolerate a short delay, like a few seconds; after all, there's no
> guarantee, even today, that the backend will hit a
> CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds.

Sure.

> But they are not going to want to have to take a second action
> to kill the backend - killing it once should be sufficient.

Hmm, it sounds persuasive. Well, do you think they tolerate
-force option? (Even though its technical practicality is not
clear)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
>> 1. I think it's the case that there are platforms around where a
>> signal won't cause send() to return EINTR.... and I'd be entirely
>> unsurprised if SSL_write() doesn't necessarily return EINTR in that
>> case.  I'm not sure what, if anything, we can do about that.

We use a custom "write" routine with SSL_write, where we call send() 
ourselves, so that's not a problem as long as we put the check in the 
right place (in secure_raw_write(), after my recent SSL refactoring - 
the patch needs to be rebased).

> man 2 send on FreeBSD has not description about EINTR.. And even
> on linux, send won't return EINTR for most cases, at least I
> haven't seen that. So send()=-1,EINTR seems to me as only an
> equivalent of send() = 0. I have no idea about what the
> implementer thought the difference is.

As the patch stands, there's a race condition: if the SIGTERM arrives 
*before* the send() call, the send() won't return EINTR anyway. So 
there's a chance that you still block. Calling pq_terminate_backend() 
again will dislodge it (assuming send() returns with EINTR on signal), 
but I don't think we want to define the behavior as "usually, 
pq_terminate_backend() will kill a backend that's blocked on sending to 
the client, but sometimes you have to call it twice (or more!) to really 
kill it".

A more robust way is to set ImmediateInterruptOK before calling send(). 
That wouldn't let you send data that can be sent without blocking 
though. For that, you could put the socket to non-blocking mode, and 
sleep with select(), also waiting for the process' latch at the same 
time (die() sets the latch, so that will wake up the select() if a 
termination request arrives).

Is it actually safe to process the die-interrupt where send() is called? 
ProcessInterrupts() does "ereport(FATAL, ...)", which will attempt to 
send a message to the client. If that happens in the middle of 
constructing some other message, that will violate the protocol.

>> 2. I think it would be reasonable to try to kill off the connection
>> without notifying the client if we're unable to send the data to the
>> client in a reasonable period of time.  But I'm unsure what "a
>> reasonable period of time" means.  This patch would basically do it
>> after no delay at all, which seems like it might be too aggressive.
>> However, I'm not sure.
>
> I think there's no such a reasonable time.

I agree it's pretty hard to define any reasonable timeout here. I think 
it would be fine to just cut the connection; even if you don't block 
while sending, you'll probably reach a CHECK_FOR_INTERRUPT() somewhere 
higher in the stack and kill the connection almost as abruptly anyway. 
(you can't violate the protocol, however)

- Heikki




Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Sorry, I was absorbed by other tasks..

Thank you for reviewing thiis.

> On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
> > At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas
> > <robertmhaas@gmail.com> wrote in
> > <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
> >> 1. I think it's the case that there are platforms around where a
> >> signal won't cause send() to return EINTR.... and I'd be entirely
> >> unsurprised if SSL_write() doesn't necessarily return EINTR in that
> >> case.  I'm not sure what, if anything, we can do about that.
> 
> We use a custom "write" routine with SSL_write, where we call send()
> ourselves, so that's not a problem as long as we put the check in the
> right place (in secure_raw_write(), after my recent SSL refactoring -
> the patch needs to be rebased).
> 
> > man 2 send on FreeBSD has not description about EINTR.. And even
> > on linux, send won't return EINTR for most cases, at least I
> > haven't seen that. So send()=-1,EINTR seems to me as only an
> > equivalent of send() = 0. I have no idea about what the
> > implementer thought the difference is.
> 
> As the patch stands, there's a race condition: if the SIGTERM arrives
> *before* the send() call, the send() won't return EINTR anyway. So
> there's a chance that you still block. Calling pq_terminate_backend()
> again will dislodge it (assuming send() returns with EINTR on signal),

Yes, that window would'nt be extinguished without introducing
something more. EINTR is set only when nothing sent by the
call. So AFAIS the chance of getting EINTR is far small than
expectation.

> but I don't think we want to define the behavior as "usually,
> pq_terminate_backend() will kill a backend that's blocked on sending
> to the client, but sometimes you have to call it twice (or more!) to
> really kill it".

I agree that it is desirable behavior, if any measure to avoid
that. But I think it's better than doing kill -9 engulfing all
innocent backends.

> A more robust way is to set ImmediateInterruptOK before calling
> send(). That wouldn't let you send data that can be sent without
> blocking though. For that, you could put the socket to non-blocking
> mode, and sleep with select(), also waiting for the process' latch at
> the same time (die() sets the latch, so that will wake up the select()
> if a termination request arrives).

I condiered it but select() frequently (rather in most cases when
send() blocks by send buffer exhaustion) fails to predict that
following send() will be blocked. (If my memory is correct.)  So
the final problem would be blocked send()...

> Is it actually safe to process the die-interrupt where send() is
> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
> attempt to send a message to the client. If that happens in the middle
> of constructing some other message, that will violate the protocol.

So I strongly agree to you if select() works as the impression
when reading the man document.

> >> 2. I think it would be reasonable to try to kill off the connection
> >> without notifying the client if we're unable to send the data to the
> >> client in a reasonable period of time.  But I'm unsure what "a
> >> reasonable period of time" means.  This patch would basically do it
> >> after no delay at all, which seems like it might be too aggressive.
> >> However, I'm not sure.
> >
> > I think there's no such a reasonable time.
> 
> I agree it's pretty hard to define any reasonable timeout here. I
> think it would be fine to just cut the connection; even if you don't
> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
> somewhere higher in the stack and kill the connection almost as
> abruptly anyway. (you can't violate the protocol, however)

Yes, closing the blocked connection seems one of the most smarter
way, checking the occurred interrupt could avoid protocol
violation. But the problem for that is that there seems no means
to close sockets elsewhere the blocking handle. dup(2)'ed handle
cannot release the resource by only itself.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 08/26/2014 09:17 AM, Kyotaro HORIGUCHI wrote:
>> but I don't think we want to define the behavior as "usually,
>> pq_terminate_backend() will kill a backend that's blocked on sending
>> to the client, but sometimes you have to call it twice (or more!) to
>> really kill it".
>
> I agree that it is desirable behavior, if any measure to avoid
> that. But I think it's better than doing kill -9 engulfing all
> innocent backends.
>
>> A more robust way is to set ImmediateInterruptOK before calling
>> send(). That wouldn't let you send data that can be sent without
>> blocking though. For that, you could put the socket to non-blocking
>> mode, and sleep with select(), also waiting for the process' latch at
>> the same time (die() sets the latch, so that will wake up the select()
>> if a termination request arrives).
>
> I condiered it but select() frequently (rather in most cases when
> send() blocks by send buffer exhaustion) fails to predict that
> following send() will be blocked. (If my memory is correct.)  So
> the final problem would be blocked send()...

My point was to put the socket in non-blocking mode, so that send() will 
return immediately with EAGAIN instead of blocking, if the send buffer 
is full. See WalSndWriteData for how that would work, it does something 
similar.

>> Is it actually safe to process the die-interrupt where send() is
>> called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
>> attempt to send a message to the client. If that happens in the middle
>> of constructing some other message, that will violate the protocol.
>
> So I strongly agree to you if select() works as the impression
> when reading the man document.

Not sure what you mean, but the above is a fatal problem with the patch 
right now, regardless of how you do the sleeping.

>>>> 2. I think it would be reasonable to try to kill off the connection
>>>> without notifying the client if we're unable to send the data to the
>>>> client in a reasonable period of time.  But I'm unsure what "a
>>>> reasonable period of time" means.  This patch would basically do it
>>>> after no delay at all, which seems like it might be too aggressive.
>>>> However, I'm not sure.
>>>
>>> I think there's no such a reasonable time.
>>
>> I agree it's pretty hard to define any reasonable timeout here. I
>> think it would be fine to just cut the connection; even if you don't
>> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
>> somewhere higher in the stack and kill the connection almost as
>> abruptly anyway. (you can't violate the protocol, however)
>
> Yes, closing the blocked connection seems one of the most smarter
> way, checking the occurred interrupt could avoid protocol
> violation. But the problem for that is that there seems no means
> to close sockets elsewhere the blocking handle. dup(2)'ed handle
> cannot release the resource by only itself.

I didn't understand that, surely you can just close() the socket? There 
is no dup(2) involved. And we don't necessarily need to close the 
socket, we just need to avoid writing to it when we're already in the 
middle of sending a message.

I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single 
pg_terminate_backend() is enough to always kill the connection.

- Heikki




Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > I condiered it but select() frequently (rather in most cases when
> > send() blocks by send buffer exhaustion) fails to predict that
> > following send() will be blocked. (If my memory is correct.)  So
> > the final problem would be blocked send()...
> 
> My point was to put the socket in non-blocking mode, so that send()
> will return immediately with EAGAIN instead of blocking, if the send
> buffer is full. See WalSndWriteData for how that would work, it does
> something similar.

I confused it with what I did during writing this patch. select()
- blocking send(). Sorry for confusing the discussion. I
understand correctly what you mean and It sounds reasonable.

> >> I agree it's pretty hard to define any reasonable timeout here. I
> >> think it would be fine to just cut the connection; even if you don't
> >> block while sending, you'll probably reach a CHECK_FOR_INTERRUPT()
> >> somewhere higher in the stack and kill the connection almost as
> >> abruptly anyway. (you can't violate the protocol, however)
> >
> > Yes, closing the blocked connection seems one of the most smarter
> > way, checking the occurred interrupt could avoid protocol
> > violation. But the problem for that is that there seems no means
> > to close sockets elsewhere the blocking handle. dup(2)'ed handle
> > cannot release the resource by only itself.
> 
> I didn't understand that, surely you can just close() the socket?
> There is no dup(2) involved. And we don't necessarily need to close
> the socket, we just need to avoid writing to it when we're already in
> the middle of sending a message.

My assumption there was select() and *blocking* send(). So the
sentence cited is out of point from the first.

> I'm marking this as Waiting on Author in the commitfest app, because:
> 1. the protocol violation needs to be avoided one way or another, and
> 2. the behavior needs to be consistent so that a single
> pg_terminate_backend() is enough to always kill the connection.

Thank you for the suggestion. I think I can go forward with that
and will come up with new patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, sorry for the dazed reply in the previous mail.

I made revised patch for this issue.

Attached patches are following,

- 0001_Revise_socket_emulation_for_win32_backend.patch
 Revises socket emulation on win32 backend so that each socket can have its own blocking mode state.

- 0002_Allow_backend_termination_during_write_blocking.patch
 The patch to solve the issue. This patch depends on the 0001_ patch.

==========

> > I'm marking this as Waiting on Author in the commitfest app, because:
> > 1. the protocol violation needs to be avoided one way or another, and
> > 2. the behavior needs to be consistent so that a single
> > pg_terminate_backend() is enough to always kill the connection.

- Preventing protocol violation.
 To prevent protocol violation, secure_write sets ClientConnectionLost when SIGTERM detected, then internal_flush() and
ProcessInterrupts()follow the instruction.
 

- Single pg_terminate_backend surely kills the backend.
 secure_raw_write() uses non-blocking socket and a loop of select() with timeout to surely detects received
signal(SIGTERM).
 To avoid frequent switching of blocking mode, the bare socket for Port is put to non-blocking mode from the first in
StreamConnection()and blocking mode is controlled only by Port->noblock in secure_raw_read/write().
 
 To make the code mentioned above (Patch 0002) tidy, rewrite the socket emulation code for win32 backends so that each
socketcan have its own non-blocking state. (patch 0001)
 

Some concern about this patch,

- This patch allows the number of non-blocking socket to be below 64 (FD_SETSIZE) on win32 backend but it seems to be
sufficient.

- This patch introduced redundant socket emulation for win32 backend but win32 bare socket for Port is already
nonblockingas described so it donsn't seem to be a serious problem on performance. Addition to it, since I don't know
thereason why win32/socket.c provides the blocking-mode socket emulation, I decided to preserve win32/socket.c to have
blockingsocket emulation. Possibly it can be removed.
 

Any suggestions?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..c92851e 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -795,10 +795,6 @@ pq_set_nonblocking(bool nonblocking)    if (MyProcPort->noblock == nonblocking)        return;
-#ifdef WIN32
-    pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-    /*     * Use COMMERROR on failure, because ERROR would try to send the error to     * the client, which might
requirechanging the mode again, leading to
 
@@ -816,7 +812,7 @@ pq_set_nonblocking(bool nonblocking)            ereport(COMMERROR,
(errmsg("couldnot set socket to blocking mode: %m")));    }
 
-#endif
+    MyProcPort->noblock = nonblocking;}
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index c981169..f0ff3e7 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -21,11 +21,8 @@ * non-blocking mode in order to be able to deliver signals, we must * specify this in a separate
flagif we actually need non-blocking * operation.
 
- *
- * This flag changes the behaviour *globally* for all socket operations,
- * so it should only be set for very short periods of time. */
-int            pgwin32_noblock = 0;
+static fd_set        nonblockset;#undef socket#undef accept
@@ -33,6 +30,7 @@ int            pgwin32_noblock = 0;#undef select#undef recv#undef send
+#undef closesocket/* * Blocking socket functions implemented so they listen on both
@@ -40,6 +38,34 @@ int            pgwin32_noblock = 0; *//*
+ * Set blocking mode for each socket
+ */
+void
+pgwin32_set_socket_nonblock(SOCKET s, int nonblock)
+{
+    if (nonblock)
+        FD_SET(s, &nonblockset);
+    else
+        FD_CLR(s, &nonblockset);
+
+    /*
+     * fd_set cannot have more than FD_SETSIZE entries. It's not likey to come
+     * close to this limit but if it goes above the limit, non blocking state
+     * of some existing sockets will be discarded.
+     */
+    if (nonblockset.fd_count >= FD_SETSIZE)
+        elog(FATAL, "Too many sockets requested to be nonblocking mode.");
+}
+
+void
+pgwin32_nonblockset_init()
+{
+    FD_ZERO(&nonblockset);
+}
+
+#define socket_is_nonblocking(s) FD_ISSET((s), &nonblockset)
+
+/* * Convert the last socket error code into errno */static void
@@ -256,6 +282,10 @@ pgwin32_socket(int af, int type, int protocol)        TranslateSocketError();        return
INVALID_SOCKET;   }
 
+
+    /* newly cerated socket should be in blocking mode  */
+    pgwin32_set_socket_nonblock(s, false);
+    errno = 0;    return s;
@@ -334,7 +364,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)        return -1;    }
-    if (pgwin32_noblock)
+    if (socket_is_nonblocking(s))    {        /*         * No data received, and we are in "emulated non-blocking
mode",so
 
@@ -420,7 +450,7 @@ pgwin32_send(SOCKET s, const void *buf, int len, int flags)            return -1;        }
-        if (pgwin32_noblock)
+        if (socket_is_nonblocking(s))        {            /*             * No data sent, and we are in "emulated
non-blockingmode", so
 
@@ -645,6 +675,15 @@ pgwin32_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, c/*
+ * Unused entry in nonblockset needs to be removed when closing socket.
+ */
+int pgwin32_closesocket(SOCKET s)
+{
+    pgwin32_set_socket_nonblock(s, false);
+    return closesocket(s);
+}
+
+/* * Return win32 error string, since strerror can't * handle winsock codes */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c7f41a5..72e0576 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3255,22 +3255,10 @@ PgstatCollectorMain(int argc, char *argv[])            /*             * Try to receive and
processa message.  This will not block,             * since the socket is set to non-blocking mode.
 
-             *
-             * XXX On Windows, we have to force pgwin32_recv to cooperate,
-             * despite the previous use of pg_set_noblock() on the socket.
-             * This is extremely broken and should be fixed someday.             */
-#ifdef WIN32
-            pgwin32_noblock = 1;
-#endif
-            len = recv(pgStatSock, (char *) &msg,                       sizeof(PgStat_Msg), 0);
-#ifdef WIN32
-            pgwin32_noblock = 0;
-#endif
-            if (len < 0)            {                if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b190cf5..5d32de6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,6 +896,10 @@ PostmasterMain(int argc, char *argv[])     */    InitializeMaxBackends();
+#ifdef WIN32    
+    pgwin32_nonblockset_init();
+#endif
+    /*     * Establish input sockets.     */
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index 550c3ec..b0df45e 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -368,6 +368,7 @@ void        pg_queue_signal(int signum);#define select(n, r, w, e, timeout) pgwin32_select(n, r, w,
e,timeout)#define recv(s, buf, len, flags) pgwin32_recv(s, buf, len, flags)#define send(s, buf, len, flags)
pgwin32_send(s,buf, len, flags)
 
+#define closesocket(s) pgwin32_closesocket(s)SOCKET        pgwin32_socket(int af, int type, int protocol);SOCKET
pgwin32_accept(SOCKET s, struct sockaddr * addr, int *addrlen);
 
@@ -375,11 +376,12 @@ int            pgwin32_connect(SOCKET s, const struct sockaddr * name, int namelen);int
pgwin32_select(int nfds, fd_set *readfs, fd_set *writefds, fd_set *exceptfds, const struct timeval * timeout);int
    pgwin32_recv(SOCKET s, char *buf, int len, int flags);int            pgwin32_send(SOCKET s, const void *buf, int
len,int flags);
 
+int            pgwin32_closesocket(SOCKET s);const char *pgwin32_socket_strerror(int err);int
pgwin32_waitforsinglesocket(SOCKETs, int what, int timeout);
 
-
-extern int    pgwin32_noblock;
+void        pgwin32_set_socket_nonblock(SOCKET s, int nonblock);
+void        pgwin32_nonblockset_init();/* in backend/port/win32/security.c */extern int    pgwin32_is_admin(void);
diff --git a/src/port/noblock.c b/src/port/noblock.c
index 1da0339..d6cc6a2 100644
--- a/src/port/noblock.c
+++ b/src/port/noblock.c
@@ -25,9 +25,18 @@ pg_set_noblock(pgsocket sock)#else    unsigned long ioctlsocket_ret = 1;
+#ifndef FRONTEND
+    /*
+     * sockets on non-frontend processes on win32 is wrapped and blocking mode
+     * is controlled there. See socket.c for the details.
+     */
+    pgwin32_set_socket_nonblock(sock, true);
+    return 1;
+#else    /* Returns non-0 on failure, while fcntl() returns -1 on failure */    return (ioctlsocket(sock, FIONBIO,
&ioctlsocket_ret)== 0);
 
-#endif
+#endif /* FRONTEND */
+#endif /* !WIN32   */}
@@ -41,10 +50,16 @@ pg_set_block(pgsocket sock)    if (flags < 0 || fcntl(sock, F_SETFL, (long) (flags & ~O_NONBLOCK)))
      return false;    return true;
 
-#else
+#else /* !WIN32   */    unsigned long ioctlsocket_ret = 0;
+#ifndef FRONTEND
+    /*  See pg_set_noblock */
+    pgwin32_set_socket_nonblock(sock, false);
+    return 1;
+#else    /* Returns non-0 on failure, while fcntl() returns -1 on failure */    return (ioctlsocket(sock, FIONBIO,
&ioctlsocket_ret)== 0);
 
-#endif
+#endif /* FRONTEND */
+#endif /* !WIN32   */}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..fbb4c47 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -34,7 +34,7 @@#include "libpq/libpq.h"#include "tcop/tcopprot.h"#include "utils/memutils.h"
-
+#include "miscadmin.h"char       *ssl_cert_file;char       *ssl_key_file;
@@ -140,6 +140,10 @@ secure_read(Port *port, void *ptr, size_t len)    return n;}
+/*
+ *  Read data from socket.
+ *  This emulates blocking behavior using non-blocking sockets.
+ */ssize_tsecure_raw_read(Port *port, void *ptr, size_t len){
@@ -147,8 +151,34 @@ secure_raw_read(Port *port, void *ptr, size_t len)    prepare_for_client_read();
-    n = recv(port->sock, ptr, len, 0);
+    if (port->noblock)
+        n = recv(port->sock, ptr, len, 0);
+    else
+    {
+        do
+        {
+            fd_set rfds;
+
+            FD_ZERO(&rfds);
+            FD_SET(port->sock, &rfds);
+            /*
+             * In contrast to secure_raw_write, this section runs with
+             * ImmediateInterruptOK = true so we can wait forever in
+             * select.
+             */
+            n = select(port->sock + 1, &rfds, NULL, NULL, NULL);
+            if (n < 0) break;
+
+            n = recv(port->sock, ptr, len, 0);
+
+            /*
+             * We should have something to read here so EAGAIN/EWOULDBLOCK is
+             * likey not to be seen. But we check them here not to return
+             * these error numbers for blocking sockets for the caller.
+             */
+        } while (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
+    }    client_read_ended();    return n;
@@ -178,5 +208,77 @@ secure_write(Port *port, void *ptr, size_t len)ssize_tsecure_raw_write(Port *port, const void
*ptr,size_t len){
 
-    return send(port->sock, ptr, len, 0);
+    int ret = 0;
+
+    /*
+     * Port socket is always in non-blocking mode. See StreamConnection for
+     * the details.
+     */
+    ret = send(port->sock, ptr, len, 0);
+
+    /* We can return here regardless of blocking mode in the most cases */
+    if (port->noblock || ret > 0 || len == 0)
+        return ret;
+
+    /* Here, we shold block waiting for the room in send buffer. */
+    while(ret < 1 && !ProcDiePending)
+    {
+        fd_set wfds;
+        struct timeval tv;
+        int i = 0;
+
+        FD_ZERO(&wfds);
+        tv.tv_usec = 0;
+
+        /*
+         * We may get terminate signal (SIGTERM) during write blocking. If we
+         * check ProcDiePending then wait by select indefinitely, SIGTERM
+         * comes after the check and before the select will be pending and we
+         * should wait the second SIGTERM. So we periodically wake up to check
+         * ProcDiePending in order to catch the signal surely.  The timeout
+         * for the select is the maximum delay of handling the signal. 1
+         * seconds groundlessly seems to be appropreate.
+         */
+        do
+        {
+            FD_SET(port->sock, &wfds);
+            tv.tv_sec = 1;
+            tv.tv_usec = 0;
+
+            ret = select(port->sock + 1, NULL, &wfds, NULL, &tv);
+        } while (!ProcDiePending && ret == 0);
+
+        if (ProcDiePending || ret < 0)
+            break;
+
+        ret = send(port->sock, ptr, len, 0);
+        if (ProcDiePending)
+            break;
+        if (ret < 0)
+        {
+            if (errno != EAGAIN && errno != EWOULDBLOCK)
+                break;
+
+            /*
+             * This loop might run a busy loop if send(2) returned EAGAIN or
+             * EWOULDBLOCK after select(2) returned normally. Sleep expressly
+             * to avoid the busy loop.
+             */
+            pg_usleep(200000L); /* 200 ms */
+            ret = 0;
+        }
+    }
+
+    if (ProcDiePending)
+    {
+        /*
+         * Allow to terminate this backend. ClientConnectionLost prevents any
+         * more bytes including error messages from being sent to
+         * client. errno is set in order to teach ssl layer not to retry.
+         */
+        ClientConnectionLost = 1;
+        errno = ECONNRESET;
+    }
+    
+    return ret;}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index c92851e..8387d6a 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -718,6 +718,17 @@ StreamConnection(pgsocket server_fd, Port *port)        (void)
pq_setkeepalivescount(tcp_keepalives_count,port);    }
 
+    /*
+     * Put this socket to non-blocking mode. Blocking behavior is emulated in
+     * secure_write() and secure_read().
+     * Use COMMERROR on failure, because ERROR would try to send the error to
+     * the client, which might require changing the mode again, leading to
+     * infinite recursion.
+     */
+    if (!pg_set_noblock(port->sock))
+        ereport(COMMERROR,
+                (errmsg("could not set socket to nonblocking mode: %m")));
+    return STATUS_OK;}
@@ -792,27 +803,6 @@ TouchSocketFiles(void)static voidpq_set_nonblocking(bool nonblocking){
-    if (MyProcPort->noblock == nonblocking)
-        return;
-
-    /*
-     * Use COMMERROR on failure, because ERROR would try to send the error to
-     * the client, which might require changing the mode again, leading to
-     * infinite recursion.
-     */
-    if (nonblocking)
-    {
-        if (!pg_set_noblock(MyProcPort->sock))
-            ereport(COMMERROR,
-                    (errmsg("could not set socket to nonblocking mode: %m")));
-    }
-    else
-    {
-        if (!pg_set_block(MyProcPort->sock))
-            ereport(COMMERROR,
-                    (errmsg("could not set socket to blocking mode: %m")));
-    }
-    MyProcPort->noblock = nonblocking;}
@@ -1249,34 +1239,38 @@ internal_flush(void)        if (r <= 0)        {
-            if (errno == EINTR)
-                continue;        /* Ok if we were interrupted */
-
-            /*
-             * Ok if no data writable without blocking, and the socket is in
-             * non-blocking mode.
-             */
-            if (errno == EAGAIN ||
-                errno == EWOULDBLOCK)
+            if (!ClientConnectionLost)            {
+                if (errno == EINTR)
+                    continue;        /* Ok if we were interrupted */
+
+                /*
+                 * Ok if no data writable without blocking, and the socket is in
+                 * non-blocking mode.
+                 */
+                if (errno == EAGAIN ||
+                    errno == EWOULDBLOCK)
+                {                return 0;
-            }
-
-            /*
-             * Careful: an ereport() that tries to write to the client would
-             * cause recursion to here, leading to stack overflow and core
-             * dump!  This message must go *only* to the postmaster log.
-             *
-             * If a client disconnects while we're in the midst of output, we
-             * might write quite a bit of data before we get to a safe query
-             * abort point.  So, suppress duplicate log messages.
-             */
-            if (errno != last_reported_send_errno)
-            {
-                last_reported_send_errno = errno;
-                ereport(COMMERROR,
-                        (errcode_for_socket_access(),
-                         errmsg("could not send data to client: %m")));
+                }
+
+                /*
+                 * Careful: an ereport() that tries to write to the client
+                 * would cause recursion to here, leading to stack overflow
+                 * and core dump!  This message must go *only* to the
+                 * postmaster log.
+                 *
+                 * If a client disconnects while we're in the midst of output,
+                 * we might write quite a bit of data before we get to a safe
+                 * query abort point.  So, suppress duplicate log messages.
+                 */
+                if (errno != last_reported_send_errno)
+                {
+                    last_reported_send_errno = errno;
+                    ereport(COMMERROR,
+                            (errcode_for_socket_access(),
+                             errmsg("could not send data to client: %m")));
+                }            }            /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5d32de6..d979191 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5789,6 +5789,13 @@ read_inheritable_socket(SOCKET *dest, InheritableSocket *src)        *dest = s;        /*
+         * We didn't inherit emulated blocking mode but port socket should be
+         * always in nonblocking mode. pg_set_noblock() on win32 backend won't
+         * return error.
+         */
+        pg_set_noblock(s);
+
+        /*         * To make sure we don't get two references to the same socket, close         * the original one.
(Thiswould happen when inheritance actually         * works..
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..1d252e7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2840,8 +2840,16 @@ ProcessInterrupts(void)        ImmediateInterruptOK = false;    /* not idle anymore */
DisableNotifyInterrupt();       DisableCatchupInterrupt();
 
-        /* As in quickdie, don't risk sending to client during auth */
-        if (ClientAuthInProgress && whereToSendOutput == DestRemote)
+        /*
+         *  As in quickdie, don't risk sending to client during auth. In
+         *  addition to that, don't try to send any more to client if current
+         *  connection is marked as ClientConnectionLost. It will lead to
+         *  protocol violation if the truth is that the connection is living
+         *  and amid sending data. Such case will occur if this backend was
+         *  terminated during waiting for query result to be sent.
+         */
+        if ((ClientAuthInProgress && whereToSendOutput == DestRemote) ||
+            ClientConnectionLost)            whereToSendOutput = DestNone;        if (IsAutoVacuumWorkerProcess())
      ereport(FATAL, 

Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
>    To make the code mentioned above (Patch 0002) tidy, rewrite the
>    socket emulation code for win32 backends so that each socket
>    can have its own non-blocking state. (patch 0001)

The first patch that makes non-blocking sockets behave more sanely on
Windows seems like a good idea, independently of the second patch. I'm
looking at the first patch now, I'll make a separate post about the
second patch.

> Some concern about this patch,
>
> - This patch allows the number of non-blocking socket to be below
>    64 (FD_SETSIZE) on win32 backend but it seems to be sufficient.

Yeah, that's plenty.

> - This patch introduced redundant socket emulation for win32
>    backend but win32 bare socket for Port is already nonblocking
>    as described so it donsn't seem to be a serious problem on
>    performance. Addition to it, since I don't know the reason why
>    win32/socket.c provides the blocking-mode socket emulation, I
>    decided to preserve win32/socket.c to have blocking socket
>    emulation. Possibly it can be removed.

On Windows, the backend has an emulation layer for POSIX signals, which
uses threads and Windows events. The reason win32/socket.c always uses
non-blocking mode internally is that it needs to wait for the socket to
become readable/writeable, and for the signal-emulation event, at the
same time. So no, we can't remove it.

The approach taken in the first patch seems sensible. I changed it to
not use FD_SET, though. A custom array seems better, that way we don't
need the pgwin32_nonblockset_init() call, we can just use initialize the
variable. It's a little bit more code, but it's well-contained in
win32/socket.c. Please take a look, to double-check that I didn't screw up.

- Heikki


Attachment

Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> - Preventing protocol violation.
>
>    To prevent protocol violation, secure_write sets
>    ClientConnectionLost when SIGTERM detected, then
>    internal_flush() and ProcessInterrupts() follow the
>    instruction.

Oh, hang on. Now that I look at pqcomm.c more closely, it already has a 
mechanism to avoid writing a message in the middle of another message. 
See pq_putmessage and PqCommBusy. Can we rely on that?

> - Single pg_terminate_backend surely kills the backend.
>
>    secure_raw_write() uses non-blocking socket and a loop of
>    select() with timeout to surely detects received
>    signal(SIGTERM).

I was going to suggest using WaitLatchOrSocket instead of sleeping in 1 
second increment, but I see that WaitLatchOrSocket() doesn't currently 
support waiting for a socket to become writeable, without also waiting 
for it to become readable. I wonder how difficult it would be to lift 
that restriction.

I also wonder if it would be simpler to keep the socket in blocking mode 
after all, and just close() in the signal handler if PqCommBusy == true. 
If the signal arrives while sleeping in send(), the effect would be the 
same as with your patch. If the signal arrives while sending, but not 
sleeping, you would not get the chance to send the already-buffered data 
to the client. But maybe that's OK, whether or not you're blocked is not 
very deterministic anyway.

- Heikki




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-02 21:46:29 +0300, Heikki Linnakangas wrote:
> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> second increment, but I see that WaitLatchOrSocket() doesn't currently
> support waiting for a socket to become writeable, without also waiting for
> it to become readable. I wonder how difficult it would be to lift that
> restriction.

It's imo not that difficult. I've a prototype patch for that
somewhere. I tested my poll() implementation and it worked, but didn't
yet get to select().

> I also wonder if it would be simpler to keep the socket in blocking mode
> after all, and just close() in the signal handler if PqCommBusy == true. If
> the signal arrives while sleeping in send(), the effect would be the same as
> with your patch. If the signal arrives while sending, but not sleeping, you
> would not get the chance to send the already-buffered data to the client.
> But maybe that's OK, whether or not you're blocked is not very deterministic
> anyway.

I've actually been working on a patch to make the whole interaction with
the client using sockets. The reason I started so is that we lots of far
to complex stuff in signal handlers, and using a latch would allow us to
instead interrupt send/recv. While still heavily WIP the reduction in
odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.

I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-02 21:01:44 +0200, Andres Freund wrote:
> I've actually been working on a patch to make the whole interaction with
> the client using sockets. The reason I started so is that we lots of far
> to complex stuff in signal handlers, and using a latch would allow us to
> instead interrupt send/recv. While still heavily WIP the reduction in
> odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.

Actually, the even more important reason is that that would allow us to
throw errors/fatals sanely in interrupts because we wouldn't possibly
jump through openssl anymore...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1 
> second increment, but I see that WaitLatchOrSocket() doesn't currently 
> support waiting for a socket to become writeable, without also waiting 
> for it to become readable. I wonder how difficult it would be to lift 
> that restriction.

My recollection is that there was a reason for that, but I don't recall
details any more.
        regards, tom lane



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > I was going to suggest using WaitLatchOrSocket instead of sleeping in 1 
> > second increment, but I see that WaitLatchOrSocket() doesn't currently 
> > support waiting for a socket to become writeable, without also waiting 
> > for it to become readable. I wonder how difficult it would be to lift 
> > that restriction.
> 
> My recollection is that there was a reason for that, but I don't recall
> details any more.

http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf

In my prototype I've changed the API that errors set both
READABLE/WRITABLE. Seems to work....

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/03/2014 12:23 AM, Andres Freund wrote:
> On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
>>> second increment, but I see that WaitLatchOrSocket() doesn't currently
>>> support waiting for a socket to become writeable, without also waiting
>>> for it to become readable. I wonder how difficult it would be to lift
>>> that restriction.
>>
>> My recollection is that there was a reason for that, but I don't recall
>> details any more.
>
> http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
>
> In my prototype I've changed the API that errors set both
> READABLE/WRITABLE. Seems to work....

Andres, would you mind posting the WIP patch you have? That could be a 
better foundation for this patch.

- Heikki




Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I'm slightly worried about the added overhead due to the latch code. In
> my implementation I only use latches after a nonblocking read, but
> still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
> that can be made problematic.

I think that's not the word you're looking for.  Or if it is, then -
it's already problematic.  At some point I hacked up a very crude
prototype that made LWLocks use latches to sleep instead of
semaphores.  It was slow.

AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls.  But my
impression was that those platforms were somewhat obscure.  Could we
have a separate latch implementation for platforms where we know that
system calls will get interrupted by signals?  Alternatively, should
we consider reimplementing latches using semaphores?  I assume having
the signal handler up the semaphore would allow the attempt to down
the semaphore to succeed on return from the handler, so it would
accomplish the same thing as the self-pipe trick.

Basically, it doesn't feel like a good thing that we've got two sets
of primitives for making a backend wait that (1) don't really know
about each other and (2) use different operating system primitives.
Presumably one of the two systems is better; let's figure out which
one it is, use that one all the time, and get rid of the other one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/04/2014 03:49 PM, Robert Haas wrote:
> On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I'm slightly worried about the added overhead due to the latch code. In
>> my implementation I only use latches after a nonblocking read, but
>> still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
>> that can be made problematic.
>
> I think that's not the word you're looking for.  Or if it is, then -
> it's already problematic.  At some point I hacked up a very crude
> prototype that made LWLocks use latches to sleep instead of
> semaphores.  It was slow.

Hmm. Perhaps we should call drainSelfPipe() only after poll/select 
returns saying that there is something in the self-pipe. That would be a 
win assuming it's more common for the self-pipe to be empty.

> AIUI, the only reason why we need the self-pipe thing is because on
> some platforms signals don't interrupt system calls.

That's not the only reason. It also eliminates the race condition that 
someone might set the latch after we've checked that it's not set, but 
before calling poll/select. The same reason that ppoll and pselect exist.

> But my
> impression was that those platforms were somewhat obscure.  Could we
> have a separate latch implementation for platforms where we know that
> system calls will get interrupted by signals?

... and have ppoll or pselect. Yeah, seems reasonable, assuming that 
ppoll/pselect is faster.

> Alternatively, should
> we consider reimplementing latches using semaphores?  I assume having
> the signal handler up the semaphore would allow the attempt to down
> the semaphore to succeed on return from the handler, so it would
> accomplish the same thing as the self-pipe trick.

I don't think there's a function to wait for a file descriptor or 
semaphore at the same time.

- Heikki



Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 9:05 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm. Perhaps we should call drainSelfPipe() only after poll/select returns
> saying that there is something in the self-pipe. That would be a win
> assuming it's more common for the self-pipe to be empty.

Couldn't hurt.

>> But my
>> impression was that those platforms were somewhat obscure.  Could we
>> have a separate latch implementation for platforms where we know that
>> system calls will get interrupted by signals?
>
> ... and have ppoll or pselect. Yeah, seems reasonable, assuming that
> ppoll/pselect is faster.

Hrm.  So we'd have to block SIGUSR1, check the flag, then use
pselect() to temporarily unblock SIGUSR1 and wait, then on return
again unblock SIGUSR1?  Doesn't seem very appealing.  I think changing
the signal mask is fast on Linux, but quite slow on at least some
other UNIX-like platforms.  And I've heard that pselect() isn't always
truly atomic, so we might run into platform-specific bugs, too.  I
wonder if there's a better way e.g. using memory barriers.

WaitLatch: check is_set.  if yes then done.  otherwise, set signal_me.
memory barrier.  recheck is_set.  if not set then wait using
poll/select. memory barrier.  clear signal_me.
SetLatch: check is_set.  if yes then done.  otherwise, set is_set.
memory barrier.  check signal_me.  if set, then send SIGUSR1.

>> Alternatively, should
>> we consider reimplementing latches using semaphores?  I assume having
>> the signal handler up the semaphore would allow the attempt to down
>> the semaphore to succeed on return from the handler, so it would
>> accomplish the same thing as the self-pipe trick.
>
> I don't think there's a function to wait for a file descriptor or semaphore
> at the same time.

Oh, good point.  So that's out, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/04/2014 04:37 PM, Robert Haas wrote:
> Hrm.  So we'd have to block SIGUSR1, check the flag, then use
> pselect() to temporarily unblock SIGUSR1 and wait, then on return
> again unblock SIGUSR1?  Doesn't seem very appealing.  I think changing
> the signal mask is fast on Linux, but quite slow on at least some
> other UNIX-like platforms.  And I've heard that pselect() isn't always
> truly atomic, so we might run into platform-specific bugs, too.  I
> wonder if there's a better way e.g. using memory barriers.
>
> WaitLatch: check is_set.  if yes then done.  otherwise, set signal_me.
> memory barrier.  recheck is_set.  if not set then wait using
> poll/select. memory barrier.  clear signal_me.
> SetLatch: check is_set.  if yes then done.  otherwise, set is_set.
> memory barrier.  check signal_me.  if set, then send SIGUSR1.

Doesn't work. No matter what you do, the process running WaitLatch might 
receive the signal immediately before it calls poll/select. The signal 
handler will run, and the poll/select call will then go to sleep. There 
is no way to do this without support from the kernel, that is why 
ppoll/pselect exist.

- Heikki




Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 9:53 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 09/04/2014 04:37 PM, Robert Haas wrote:
>> Hrm.  So we'd have to block SIGUSR1, check the flag, then use
>> pselect() to temporarily unblock SIGUSR1 and wait, then on return
>> again unblock SIGUSR1?  Doesn't seem very appealing.  I think changing
>> the signal mask is fast on Linux, but quite slow on at least some
>> other UNIX-like platforms.  And I've heard that pselect() isn't always
>> truly atomic, so we might run into platform-specific bugs, too.  I
>> wonder if there's a better way e.g. using memory barriers.
>>
>> WaitLatch: check is_set.  if yes then done.  otherwise, set signal_me.
>> memory barrier.  recheck is_set.  if not set then wait using
>> poll/select. memory barrier.  clear signal_me.
>> SetLatch: check is_set.  if yes then done.  otherwise, set is_set.
>> memory barrier.  check signal_me.  if set, then send SIGUSR1.
>
> Doesn't work. No matter what you do, the process running WaitLatch might
> receive the signal immediately before it calls poll/select. The signal
> handler will run, and the poll/select call will then go to sleep. There is
> no way to do this without support from the kernel, that is why ppoll/pselect
> exist.

Eesh, I was confused there: ignore me.  I was trying to optimize away
the signal handling but assuming we still had the self-pipe byte.  But
of course in that case we don't need to change anything at all.

I'm going to go get some more caffeine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > - This patch introduced redundant socket emulation for win32
> >    backend but win32 bare socket for Port is already nonblocking
> >    as described so it donsn't seem to be a serious problem on
> >    performance. Addition to it, since I don't know the reason why
> >    win32/socket.c provides the blocking-mode socket emulation, I
> >    decided to preserve win32/socket.c to have blocking socket
> >    emulation. Possibly it can be removed.
> 
> On Windows, the backend has an emulation layer for POSIX signals,
> which uses threads and Windows events. The reason win32/socket.c
> always uses non-blocking mode internally is that it needs to wait for
> the socket to become readable/writeable, and for the signal-emulation
> event, at the same time. So no, we can't remove it.

I see, thank you.

> The approach taken in the first patch seems sensible. I changed it to
> not use FD_SET, though. A custom array seems better, that way we don't
> need the pgwin32_nonblockset_init() call, we can just use initialize
> the variable. It's a little bit more code, but it's well-contained in
> win32/socket.c. Please take a look, to double-check that I didn't
> screw up.

Thank you. I felt a bit qualm to abusing fd_set. A bit more code
is not a problem.

I had close look on your patch.

Both 'nonblocking' and 'noblock' are appears in function names,
pgwin32_set_socket_block/noblock/is_nonblocking(). I prefer
nonblocking/blocking pair but I'm satisfied they are in uniform
style anyway. (Though I also didn't so ;p)

pgwin32_set_socket_block() leaves garbage in
nonblocking_sockets[] but it's no problem practically. You also
removed blocking'ize(?) code but I agree that it is correct
because fds of nonclosed socket won't be reused anyway.

pg_set_block/noblock() made me laugh. Yes you're correct. Sorry
for the bronken (but workable) code.

After all, the patch looks pretty good.
I'll continue to fit the another patch onto this.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, attached is the new-and-far-simple version of this
patch. It no longer depends on win32 nonblocking patch since the
socket is in blocking mode again.

> On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> > - Preventing protocol violation.
> >
> >    To prevent protocol violation, secure_write sets
> >    ClientConnectionLost when SIGTERM detected, then
> >    internal_flush() and ProcessInterrupts() follow the
> >    instruction.
> 
> Oh, hang on. Now that I look at pqcomm.c more closely, it already has
> a mechanism to avoid writing a message in the middle of another
> message. See pq_putmessage and PqCommBusy. Can we rely on that?

Hmm, it gracefully returns up to ExecProcNode() and PqCommBusy is
turned off on the way at pq_putmessage() under current
implement. So PqCommBusy is already false before it runs into
ProcessInterrupts().

Allowing ImmediateInterruptOK on signalled during send(), setting
whereToSendOutput to DestNone if PqCommBusy is true will do. We
can also distinguish read and write by looking
DoingCommandRead. The ImmediateInterruptOK section can be defined
enclosing by prepare_for_client_read/client_read_end.

> > - Single pg_terminate_backend surely kills the backend.
> >
> >    secure_raw_write() uses non-blocking socket and a loop of
> >    select() with timeout to surely detects received
> >    signal(SIGTERM).
> 
> I was going to suggest using WaitLatchOrSocket instead of sleeping in
> 1 second increment, but I see that WaitLatchOrSocket() doesn't
> currently support waiting for a socket to become writeable, without
> also waiting for it to become readable. I wonder how difficult it
> would be to lift that restriction.

It seems quite difficult hearing the following discussion.

> I also wonder if it would be simpler to keep the socket in blocking
> mode after all, and just close() in the signal handler if PqCommBusy
> == true. If the signal arrives while sleeping in send(), the effect
> would be the same as with your patch. If the signal arrives while
> sending, but not sleeping, you would not get the chance to send the
> already-buffered data to the client. But maybe that's OK, whether or
> not you're blocked is not very deterministic anyway.

Hmm. We're back round to the my first patch, with immediately
close the socket, and became irrelevant to win32 layer
patch. Anyway, it sounds reasonable.

Attached patch is a quick hack patch, but it seems working as
expected at a glance.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Sorry, It tha patch contains a silly bug. Please find the
attatched one.

> Hello, attached is the new-and-far-simple version of this
> patch. It no longer depends on win32 nonblocking patch since the
> socket is in blocking mode again.
> 
> > On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> > > - Preventing protocol violation.
> > >
> > >    To prevent protocol violation, secure_write sets
> > >    ClientConnectionLost when SIGTERM detected, then
> > >    internal_flush() and ProcessInterrupts() follow the
> > >    instruction.
> > 
> > Oh, hang on. Now that I look at pqcomm.c more closely, it already has
> > a mechanism to avoid writing a message in the middle of another
> > message. See pq_putmessage and PqCommBusy. Can we rely on that?
> 
> Hmm, it gracefully returns up to ExecProcNode() and PqCommBusy is
> turned off on the way at pq_putmessage() under current
> implement. So PqCommBusy is already false before it runs into
> ProcessInterrupts().
> 
> Allowing ImmediateInterruptOK on signalled during send(), setting
> whereToSendOutput to DestNone if PqCommBusy is true will do. We
> can also distinguish read and write by looking
> DoingCommandRead. The ImmediateInterruptOK section can be defined
> enclosing by prepare_for_client_read/client_read_end.
> 
> > > - Single pg_terminate_backend surely kills the backend.
> > >
> > >    secure_raw_write() uses non-blocking socket and a loop of
> > >    select() with timeout to surely detects received
> > >    signal(SIGTERM).
> > 
> > I was going to suggest using WaitLatchOrSocket instead of sleeping in
> > 1 second increment, but I see that WaitLatchOrSocket() doesn't
> > currently support waiting for a socket to become writeable, without
> > also waiting for it to become readable. I wonder how difficult it
> > would be to lift that restriction.
> 
> It seems quite difficult hearing the following discussion.
> 
> > I also wonder if it would be simpler to keep the socket in blocking
> > mode after all, and just close() in the signal handler if PqCommBusy
> > == true. If the signal arrives while sleeping in send(), the effect
> > would be the same as with your patch. If the signal arrives while
> > sending, but not sleeping, you would not get the chance to send the
> > already-buffered data to the client. But maybe that's OK, whether or
> > not you're blocked is not very deterministic anyway.
> 
> Hmm. We're back round to the my first patch, with immediately
> close the socket, and became irrelevant to win32 layer
> patch. Anyway, it sounds reasonable.
> 
> Attached patch is a quick hack patch, but it seems working as
> expected at a glance.

Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hi, I added and edited some comments.

> Sorry, It tha patch contains a silly bug. Please find the
> attatched one.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hmm. Sorry, I misunderstood the specification.

> You approach that coloring tokens seems right, but you have
> broken the parse logic by adding your code.
> 
> Other than the mistakes others pointed, I found that
> 
> - non-SQL-ident like tokens are ignored by their token style,
>   quoted or not, so the following line works.
> 
> | "local" All aLL trust
> 
> I suppose this is not what you intended. This is because you have
> igonred the attribute of a token when comparing it as
> non-SQL-ident tokens.
> 
> 
> - '+' at the head of the sequence '+"' is treated as the first
>   character of the *quoted* string. e.g. +"hoge" is tokenized as
>   "+hoge":special_quoted.

I found this is what intended. This should be documented as
comments.

|2) users and user-groups only requires special handling and behavior as follows
|    Normal user :
|      A. unquoted ( USER ) will be treated as user ( downcase ).
|      B. quoted  ( "USeR" )  will be treated as USeR (case-sensitive).
|      C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and
case-sensitiveas string is quoted.
 

This seems confising with the B below. This seems should be
rearranged.

|   User Group :
|      A. unquoted ( +USERGROUP ) will be treated as +usergruop ( downcase ).
|      B. plus quoted ( +"UserGROUP"  ) will be treated as +UserGROUP (case-sensitive).



> This is why you simply continued processing for '+"' without
> discarding and skipping the '+', and not setting in_quote so the
> following parser code works as it is not intended. You should
> understand what the original code does and insert or modify
> logics not braeking the assumptions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
Wrong thread...

On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
> Hmm. Sorry, I misunderstood the specification.
>
>> You approach that coloring tokens seems right, but you have
>> broken the parse logic by adding your code.
>>
>> Other than the mistakes others pointed, I found that
>>
>> - non-SQL-ident like tokens are ignored by their token style,
>>    quoted or not, so the following line works.
>>
>> | "local" All aLL trust
>>
>> I suppose this is not what you intended. This is because you have
>> igonred the attribute of a token when comparing it as
>> non-SQL-ident tokens.
>>
>>
>> - '+' at the head of the sequence '+"' is treated as the first
>>    character of the *quoted* string. e.g. +"hoge" is tokenized as
>>    "+hoge":special_quoted.
>
> I found this is what intended. This should be documented as
> comments.
>
> |2) users and user-groups only requires special handling and behavior as follows
> |    Normal user :
> |      A. unquoted ( USER ) will be treated as user ( downcase ).
> |      B. quoted  ( "USeR" )  will be treated as USeR (case-sensitive).
> |      C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and
case-sensitiveas string is quoted.
 
>
> This seems confising with the B below. This seems should be
> rearranged.
>
> |   User Group :
> |      A. unquoted ( +USERGROUP ) will be treated as +usergruop ( downcase ).
> |      B. plus quoted ( +"UserGROUP"  ) will be treated as +UserGROUP (case-sensitive).
>
>
>
>> This is why you simply continued processing for '+"' without
>> discarding and skipping the '+', and not setting in_quote so the
>> following parser code works as it is not intended. You should
>> understand what the original code does and insert or modify
>> logics not braeking the assumptions.
>
> regards,
>


-- 
- Heikki



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Sorry for the mistake...

At Wed, 10 Sep 2014 18:53:03 +0300, Heikki Linnakangas <hlinnakangas@vmware.com> wrote in <541073DF.70902@vmware.com>
> Wrong thread...
> 
> On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
> > Hmm. Sorry, I misunderstood the specification.
> >
> >> You approach that coloring tokens seems right, but you have
> >> broken the parse logic by adding your code.
> >>
> >> Other than the mistakes others pointed, I found that
> >>
> >> - non-SQL-ident like tokens are ignored by their token style,
> >>    quoted or not, so the following line works.
> >>
> >> | "local" All aLL trust

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/09/2014 01:31 PM, Kyotaro HORIGUCHI wrote:
> Hi, I added and edited some comments.
>
>> Sorry, It tha patch contains a silly bug. Please find the
>> attatched one.

I must say this scares the heck out of me. The current code goes through
some trouble to not throw an error while in a recv() send(). For
example, you removed the DoingCommandRead check from
prepare_for_client_read(). There's an comment in postgres.c that says this:

>         /*
>          * (2) Allow asynchronous signals to be executed immediately if they
>          * come in while we are waiting for client input. (This must be
>          * conditional since we don't want, say, reads on behalf of COPY FROM
>          * STDIN doing the same thing.)
>          */
>         DoingCommandRead = true;

With the patch, we do allow asynchronous signals to be processed while
blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
comfortable just changing it. (the comment is now wrong, of course)

This patch also enables processing query cancel signals while blocked,
not just SIGTERM. That's not good; we might be in the middle of sending
a message, and we cannot just error out of that or we might violate the
fe/be protocol. That's OK with a SIGTERM as you're terminating the
connection anyway, and we have the PqCommBusy safeguard in place that
prevents us from sending broken messages to the client, but that's not
good enough if we wanted to keep the backend alive, as we won't be able
to send anything to the client anymore.

BTW, we've been talking about blocking in send(), but this patch also
let's a recv() in e.g. COPY FROM STDIN to be interrupted. That's
probably a good thing; surely you have exactly the same issues with that
as with send(). But I didn't realize we had a problem with that too.


In summary, this patch is not ready as it is, but I think we can fix it.
The key question is: is it safe to handle SIGTERM in the signal handler,
calling the exit-handlers and exiting the backend, when blocked in a
recv() or send()? It's a change in the pqcomm.c API; most pqcomm.c
functions have not thrown errors or processed interrupts before. But
looking at the callers, I think it's safe, and there isn't actually any
comments explicitly saying that pqcomm.c will never throw errors.

I propose the attached patch. It adds a new flag ImmediateDieOK, which
is a weaker form of ImmediateInterruptOK that only allows handling a
pending die-signal in the signal handler.

Robert, others, do you see a problem with this?

Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a
callback called by OpenSSL. But even with current master branch, that's
exactly what we do. In secure_raw_read(), we set ImmediateInterruptOK =
true, which means that any incoming signal will be handled directly in
the signal handler, which can mean elog(ERROR). Should we be worried?
OpenSSL might get confused if control never returns to the SSL_read() or
SSL_write() function that called secure_raw_read().

- Heikki


Attachment

Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> On 09/03/2014 12:23 AM, Andres Freund wrote:
> >On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> >>Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> >>>I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> >>>second increment, but I see that WaitLatchOrSocket() doesn't currently
> >>>support waiting for a socket to become writeable, without also waiting
> >>>for it to become readable. I wonder how difficult it would be to lift
> >>>that restriction.
> >>
> >>My recollection is that there was a reason for that, but I don't recall
> >>details any more.
> >
> >http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
> >
> >In my prototype I've changed the API that errors set both
> >READABLE/WRITABLE. Seems to work....
> 
> Andres, would you mind posting the WIP patch you have? That could be a
> better foundation for this patch.

Sorry, I missed this message and only cought up when reading your CF
status mail. I've attached three patches:

0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've     tested the poll() and select()
implementationson linux and     blindly patched up windows.
 
0002: Put the socket the backend uses to communicate with the client     into nonblocking mode as soon as latches are
readyand use latches     to wait. This probably doesn't work correctly without 0003, but     seems easier to review
separately.
0003: Don't do sinval catchup and notify processing in signal     handlers. It's quite cool that it worked that well so
far,but it     requires some complicated code and is rather fragile. 0002 allows     to move that out of signal
handlersand just use a latch     there. This seems remarkably simpler:      4 files changed, 69 insertions(+), 229
deletions(-)

These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.

With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.

There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar.  Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.

Btw, be-secure.c is really not a good name anymore...

What do you think?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
> On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> > On 09/03/2014 12:23 AM, Andres Freund wrote:
> > >On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
> > >>Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > >>>I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
> > >>>second increment, but I see that WaitLatchOrSocket() doesn't currently
> > >>>support waiting for a socket to become writeable, without also waiting
> > >>>for it to become readable. I wonder how difficult it would be to lift
> > >>>that restriction.
> > >>
> > >>My recollection is that there was a reason for that, but I don't recall
> > >>details any more.
> > >
> > >http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
> > >
> > >In my prototype I've changed the API that errors set both
> > >READABLE/WRITABLE. Seems to work....
> >
> > Andres, would you mind posting the WIP patch you have? That could be a
> > better foundation for this patch.
>
> Sorry, I missed this message and only cought up when reading your CF
> status mail. I've attached three patches:
>
> 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
>       tested the poll() and select() implementations on linux and
>       blindly patched up windows.
> 0002: Put the socket the backend uses to communicate with the client
>       into nonblocking mode as soon as latches are ready and use latches
>       to wait. This probably doesn't work correctly without 0003, but
>       seems easier to review separately.
> 0003: Don't do sinval catchup and notify processing in signal
>       handlers. It's quite cool that it worked that well so far, but it
>       requires some complicated code and is rather fragile. 0002 allows
>       to move that out of signal handlers and just use a latch
>       there. This seems remarkably simpler:
>        4 files changed, 69 insertions(+), 229 deletions(-)
>
> These aren't ready for commit, especially not 0003, but I think they are
> quite a good foundation for getting rid of the blocking in send(). I
> haven't added any interrupt processing after interrupted writes, but
> marked the relevant places with XXXs.
>
> With regard to 0002, I dislike the current need to do interrupt
> processing both in be-secure.c and be-secure-openssl.c. I guess we could
> solve that by returning something like EINTR from the ssl routines when
> they need further reads/writes and do all the processing in one place in
> be-secure.c.
>
> There's also some cleanup in 0002/0003 needed:
> prepare_for_client_read()/client_read_ended() aren't needed in that form
> anymore and should probably rather be something like
> CHECK_FOR_READ_INTERRUPT() or similar.  Similarly the
> EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> pretty ugly.
>
> Btw, be-secure.c is really not a good name anymore...
>
> What do you think?

I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
     interrupt handling. Oh, and it compiles without 0003.
0003 Sinval/notify processing got simplified further. There really isn't
     any need for DisableNotifyInterrupt/DisableCatchupInterrupt
     anymore. Also begin_client_read/client_read_ended don't make much
     sense anymore. Instead introduce ProcessClientReadInterrupt (which
     wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
     set. All of that happens via the latch mechanism, nothing happens
     inside signal handlers. So I do think it's quite an improvement
     over what's been discussed in this thread.
     But it (and the other approaches) do noticeably increase the
     likelihood of clients not getting the error message if the client
     isn't actually dead. The likelihood of write() being blocked
     *temporarily* due to normal bandwidth constraints is quite high
     when you consider COPY FROM and similar. Right now we'll wait till
     we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
> On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
> > On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
> > Sorry, I missed this message and only cought up when reading your CF
> > status mail. I've attached three patches:
> > 
> > 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
> >       tested the poll() and select() implementations on linux and
> >       blindly patched up windows.
> > 0002: Put the socket the backend uses to communicate with the client
> >       into nonblocking mode as soon as latches are ready and use latches
> >       to wait. This probably doesn't work correctly without 0003, but
> >       seems easier to review separately.
> > 0003: Don't do sinval catchup and notify processing in signal
> >       handlers. It's quite cool that it worked that well so far, but it
> >       requires some complicated code and is rather fragile. 0002 allows
> >       to move that out of signal handlers and just use a latch
> >       there. This seems remarkably simpler:
> >        4 files changed, 69 insertions(+), 229 deletions(-)
> > 
> > These aren't ready for commit, especially not 0003, but I think they are
> > quite a good foundation for getting rid of the blocking in send(). I
> > haven't added any interrupt processing after interrupted writes, but
> > marked the relevant places with XXXs.
> > 
> > With regard to 0002, I dislike the current need to do interrupt
> > processing both in be-secure.c and be-secure-openssl.c. I guess we could
> > solve that by returning something like EINTR from the ssl routines when
> > they need further reads/writes and do all the processing in one place in
> > be-secure.c.
> > 
> > There's also some cleanup in 0002/0003 needed:
> > prepare_for_client_read()/client_read_ended() aren't needed in that form
> > anymore and should probably rather be something like
> > CHECK_FOR_READ_INTERRUPT() or similar.  Similarly the
> > EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> > pretty ugly.
> > 
> > Btw, be-secure.c is really not a good name anymore...
> > 
> > What do you think?
> 
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
>      interrupt handling. Oh, and it compiles without 0003.
> 0003 Sinval/notify processing got simplified further. There really isn't
>      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>      anymore. Also begin_client_read/client_read_ended don't make much
>      sense anymore. Instead introduce ProcessClientReadInterrupt (which
>      wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>      set. All of that happens via the latch mechanism, nothing happens
>      inside signal handlers. So I do think it's quite an improvement
>      over what's been discussed in this thread.
>      But it (and the other approaches) do noticeably increase the
>      likelihood of clients not getting the error message if the client
>      isn't actually dead. The likelihood of write() being blocked
>      *temporarily* due to normal bandwidth constraints is quite high
>      when you consider COPY FROM and similar. Right now we'll wait till
>      we can put the error message into the socket afaics.
> 
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

Kyatoro, could you check whether you can achieve what you want using
0004?

It's imo pretty clear that a fair amount of base work needs to be done
and there's been a fair amount of progress made this fest. I think this
can now be marked returned with feedback.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Thank you for reviewing. I'll look close to the patch tomorrow.

> I must say this scares the heck out of me. The current code goes
> through some trouble to not throw an error while in a recv()
> send(). For example, you removed the DoingCommandRead check from
> prepare_for_client_read(). There's an comment in postgres.c that says
> this:
> 
> >         /*
> >          * (2) Allow asynchronous signals to be executed immediately
> >          * if they
> >          * come in while we are waiting for client input. (This must
> >          * be
> >          * conditional since we don't want, say, reads on behalf of
> >          * COPY FROM
> >          * STDIN doing the same thing.)
> >          */
> >         DoingCommandRead = true;

Hmm. Sorry. That's my fault that I skipped over the issues about
"COPY FROM STDIN".

> With the patch, we do allow asynchronous signals to be processed while
> blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
> comfortable just changing it. (the comment is now wrong, of course)

I don't see actual problem but I agree that the behavior should
not be chenged.

> This patch also enables processing query cancel signals while blocked,
> not just SIGTERM. That's not good; we might be in the middle of
> sending a message, and we cannot just error out of that or we might
> violate the fe/be protocol. That's OK with a SIGTERM as you're
> terminating the connection anyway, and we have the PqCommBusy
> safeguard in place that prevents us from sending broken messages to
> the client, but that's not good enough if we wanted to keep the
> backend alive, as we won't be able to send anything to the client
> anymore.

Ok, since what I want is escaping from blocked send() only by
SIGTERM, it needs another mechanism from current
prepare_for_client_read().

> BTW, we've been talking about blocking in send(), but this patch also
> let's a recv() in e.g. COPY FROM STDIN to be interrupted. That's
> probably a good thing; surely you have exactly the same issues with
> that as with send(). But I didn't realize we had a problem with that
> too.

I see. (But it is mere a side-effect of my carelessness, as you know:)

> In summary, this patch is not ready as it is, but I think we can fix
> it. The key question is: is it safe to handle SIGTERM in the signal
> handler, calling the exit-handlers and exiting the backend, when
> blocked in a recv() or send()? It's a change in the pqcomm.c API; most
> pqcomm.c functions have not thrown errors or processed interrupts
> before. But looking at the callers, I think it's safe, and there isn't
> actually any comments explicitly saying that pqcomm.c will never throw
> errors.
> 
> I propose the attached patch. It adds a new flag ImmediateDieOK, which
> is a weaker form of ImmediateInterruptOK that only allows handling a
> pending die-signal in the signal handler.
> 
> Robert, others, do you see a problem with this?

The patch seems excluding all problems menthioned in the message,
I have no objection to it.

> Over IM, Robert pointed out that it's not safe to jump out of a signal
> handler with siglongjmp, when we're inside library calls, like in a
> callback called by OpenSSL. But even with current master branch,
> that's exactly what we do. In secure_raw_read(), we set
> ImmediateInterruptOK = true, which means that any incoming signal will
> be handled directly in the signal handler, which can mean
> elog(ERROR). Should we be worried? OpenSSL might get confused if
> control never returns to the SSL_read() or SSL_write() function that
> called secure_raw_read().

IMHO, it will soon die even if OpenSSL is confused. It seems a
bit brute that sudden cutoff occurs even when the socket is *not*
blocked, but the backend will soon die and frontend will
immediately get ECONNRESET (..hmm it is not seen in manpages of
recv/read(2)) and should safely exit from OpenSSL.

I cannot run this patch right now, but it seems to be no problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Wow, thank you for the patch.

> > > 0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
> > >       tested the poll() and select() implementations on linux and
> > >       blindly patched up windows.
> > > 0002: Put the socket the backend uses to communicate with the client
> > >       into nonblocking mode as soon as latches are ready and use latches
> > >       to wait. This probably doesn't work correctly without 0003, but
> > >       seems easier to review separately.
> > > 0003: Don't do sinval catchup and notify processing in signal
> > >       handlers. It's quite cool that it worked that well so far, but it
> > >       requires some complicated code and is rather fragile. 0002 allows
> > >       to move that out of signal handlers and just use a latch
> > >       there. This seems remarkably simpler:
> > >        4 files changed, 69 insertions(+), 229 deletions(-)
> > > 
> > > These aren't ready for commit, especially not 0003, but I think they are
> > > quite a good foundation for getting rid of the blocking in send(). I
> > > haven't added any interrupt processing after interrupted writes, but
> > > marked the relevant places with XXXs.
> > > 
> > > With regard to 0002, I dislike the current need to do interrupt
> > > processing both in be-secure.c and be-secure-openssl.c. I guess we could
> > > solve that by returning something like EINTR from the ssl routines when
> > > they need further reads/writes and do all the processing in one place in
> > > be-secure.c.
> > > 
> > > There's also some cleanup in 0002/0003 needed:
> > > prepare_for_client_read()/client_read_ended() aren't needed in that form
> > > anymore and should probably rather be something like
> > > CHECK_FOR_READ_INTERRUPT() or similar.  Similarly the
> > > EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
> > > pretty ugly.
> > > 
> > > Btw, be-secure.c is really not a good name anymore...
> > > 
> > > What do you think?
> > 
> > I've invested some more time in this:
> > 0002 now makes sense on its own and doesn't change anything around the
> >      interrupt handling. Oh, and it compiles without 0003.
> > 0003 Sinval/notify processing got simplified further. There really isn't
> >      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> >      anymore. Also begin_client_read/client_read_ended don't make much
> >      sense anymore. Instead introduce ProcessClientReadInterrupt (which
> >      wants a better name).
> > There's also a very WIP
> > 0004 Allows secure_read/write be interrupted when ProcDiePending is
> >      set. All of that happens via the latch mechanism, nothing happens
> >      inside signal handlers. So I do think it's quite an improvement
> >      over what's been discussed in this thread.
> >      But it (and the other approaches) do noticeably increase the
> >      likelihood of clients not getting the error message if the client
> >      isn't actually dead. The likelihood of write() being blocked
> >      *temporarily* due to normal bandwidth constraints is quite high
> >      when you consider COPY FROM and similar. Right now we'll wait till
> >      we can put the error message into the socket afaics.
> > 
> > 1-3 need some serious comment work, but I think the approach is
> > basically sound. I'm much, much less sure about allowing send() to be
> > interrupted.
> 
> Kyatoro, could you check whether you can achieve what you want using
> 0004?
> 
> It's imo pretty clear that a fair amount of base work needs to be done
> and there's been a fair amount of progress made this fest. I think this
> can now be marked returned with feedback.

Myself is satisfied by Heikki's solution, and it seems ready for
commit. But I agree with the temporarily blocked state is seen
often and it breaks even non-blocked socket. If we want to/should
avoid breaking *temporarily or not* blocked socket even for
SIGTERM, this mechanism should be used.

Which way should we take?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
By the way,

> Sorry, I missed this message and only cought up when reading your CF
> status mail. I've attached three patches:

Could let me know how to get the CF status mail?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/30/2014 10:05 AM, Kyotaro HORIGUCHI wrote:
> By the way,
>
>> Sorry, I missed this message and only cought up when reading your CF
>> status mail. I've attached three patches:
>
> Could let me know how to get the CF status mail?

I think he meant this email I sent last weekend:

http://www.postgresql.org/message-id/542672D2.3060708@vmware.com

- Heikki




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-26 21:02:16 +0300, Heikki Linnakangas wrote:
> I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> weaker form of ImmediateInterruptOK that only allows handling a pending
> die-signal in the signal handler.
> 
> Robert, others, do you see a problem with this?

Per se I don't have a problem with it. There does exist the problem that
the user doesn't get a error message in more cases though. On the other
hand it's bad if any user can prevent the database from restarting.

> Over IM, Robert pointed out that it's not safe to jump out of a signal
> handler with siglongjmp, when we're inside library calls, like in a callback
> called by OpenSSL. But even with current master branch, that's exactly what
> we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> that any incoming signal will be handled directly in the signal handler,
> which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> if control never returns to the SSL_read() or SSL_write() function that
> called secure_raw_read().

But this is imo prohibitive. Yes, we're doing it for a long while. But
no, that's not ok. It actually prompoted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
> >> Sorry, I missed this message and only cought up when reading your CF
> >> status mail. I've attached three patches:
> >
> > Could let me know how to get the CF status mail?
> 
> I think he meant this email I sent last weekend:
> 
> http://www.postgresql.org/message-id/542672D2.3060708@vmware.com

I see, that's what I also received. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > weaker form of ImmediateInterruptOK that only allows handling a pending
> > die-signal in the signal handler.
> > 
> > Robert, others, do you see a problem with this?
> 
> Per se I don't have a problem with it. There does exist the problem that
> the user doesn't get a error message in more cases though. On the other
> hand it's bad if any user can prevent the database from restarting.
> 
> > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > handler with siglongjmp, when we're inside library calls, like in a callback
> > called by OpenSSL. But even with current master branch, that's exactly what
> > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > that any incoming signal will be handled directly in the signal handler,
> > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > if control never returns to the SSL_read() or SSL_write() function that
> > called secure_raw_read().
> 
> But this is imo prohibitive. Yes, we're doing it for a long while. But
> no, that's not ok. It actually prompoted me into prototyping the latch
> thing (in some other thread). I don't think existing practice justifies
> expanding it further.

I see, in that case, this approach seems basically
applicable. But if I understand correctly, this patch seems not
to return out of the openssl code even when latch was found to be
set in secure_raw_write/read.  I tried setting errno = ECONNRESET
and it went well but seems a bad deed.

secure_raw_write(Port *port, const void *ptr, size_t len)
{ n = send(port->sock, ptr, len, 0);  if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {   w =
WaitLatchOrSocket(&MyProc->procLatch,...
 
   if (w & WL_LATCH_SET)   {ResetLatch(&MyProc->procLatch);       /*       * Force a return, so interrupts can be
processedwhen not       * (possibly) underneath a ssl library.       */       errno = EINTR;       (return n;  // n is
negative)


my_sock_write(BIO *h, const char *buf, int size)
{ res = secure_raw_write(((Port *) h->ptr), buf, size); BIO_clear_retry_flags(h); if (res <= 0) {   if (errno == EINTR
||errno == EWOULDBLOCK || errno == EAGAIN)   {     BIO_set_retry_write(h); 
 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> > > I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
> > > weaker form of ImmediateInterruptOK that only allows handling a pending
> > > die-signal in the signal handler.
> > > 
> > > Robert, others, do you see a problem with this?
> > 
> > Per se I don't have a problem with it. There does exist the problem that
> > the user doesn't get a error message in more cases though. On the other
> > hand it's bad if any user can prevent the database from restarting.
> > 
> > > Over IM, Robert pointed out that it's not safe to jump out of a signal
> > > handler with siglongjmp, when we're inside library calls, like in a callback
> > > called by OpenSSL. But even with current master branch, that's exactly what
> > > we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
> > > that any incoming signal will be handled directly in the signal handler,
> > > which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
> > > if control never returns to the SSL_read() or SSL_write() function that
> > > called secure_raw_read().
> > 
> > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > no, that's not ok. It actually prompoted me into prototyping the latch
> > thing (in some other thread). I don't think existing practice justifies
> > expanding it further.
> 
> I see, in that case, this approach seems basically
> applicable. But if I understand correctly, this patch seems not
> to return out of the openssl code even when latch was found to be
> set in secure_raw_write/read.

Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.

>  I tried setting errno = ECONNRESET
> and it went well but seems a bad deed.

Where and why did you do that?

> secure_raw_write(Port *port, const void *ptr, size_t len)
> {
>   n = send(port->sock, ptr, len, 0);
>   
>   if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
>   {
>     w = WaitLatchOrSocket(&MyProc->procLatch, ...
> 
>     if (w & WL_LATCH_SET)
>     {
>     ResetLatch(&MyProc->procLatch);
>         /*
>         * Force a return, so interrupts can be processed when not
>         * (possibly) underneath a ssl library.
>         */
>         errno = EINTR;
>         (return n;  // n is negative)
> 
> 
> my_sock_write(BIO *h, const char *buf, int size)
> {
>   res = secure_raw_write(((Port *) h->ptr), buf, size);
>   BIO_clear_retry_flags(h);
>   if (res <= 0)
>   {
>     if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
>     {
>       BIO_set_retry_write(h);

Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hi,

> > > But this is imo prohibitive. Yes, we're doing it for a long while. But
> > > no, that's not ok. It actually prompoted me into prototyping the latch
> > > thing (in some other thread). I don't think existing practice justifies
> > > expanding it further.
> > 
> > I see, in that case, this approach seems basically
> > applicable. But if I understand correctly, this patch seems not
> > to return out of the openssl code even when latch was found to be
> > set in secure_raw_write/read.
> 
> Correct. That's why I think it's the way forward. There's several
> problems now where the inability to do real things while reading/writing
> is a problem.
> 
> >  I tried setting errno = ECONNRESET
> > and it went well but seems a bad deed.
> 
> Where and why did you do that?

The patch of this message.

http://www.postgresql.org/message-id/20140828.214704.93968088.horiguchi.kyotaro@lab.ntt.co.jp

The reason for setting errno (instead of a variable for it) is to
trick openssl (or my_socck_write? I've forgot it..) into
recognizing as if the underneath send(2) have returned with any
uncontinueable error so it cannot be any of continueable errnos
(EINTR/EWOULDBLOCK/EAGAIN). Iy my faint memory, only avoiding
BIO_set_retry_write() in my_sock_write() dosn't work as expected
but it might be enough that my_sock_write returns -1 and doesn't
set BIO_set_retry_write().

The reason why ECONNNRESET is any of other errnos possible for
send(2)(*1) doesn't seem to fit the real situation, and the
blocked situation seems similar to resetted connection from the
view that it cannot continue to work due to external condition,
and it is used in be_tls_write() in a similary way.

Come to think of it, setting ECONNRESET is not so evil?


> > secure_raw_write(Port *port, const void *ptr, size_t len)
> > {
> >   n = send(port->sock, ptr, len, 0);
> >   
> >   if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
> >   {
> >     w = WaitLatchOrSocket(&MyProc->procLatch, ...
> > 
> >     if (w & WL_LATCH_SET)
> >     {
> >     ResetLatch(&MyProc->procLatch);
> >         /*
> >         * Force a return, so interrupts can be processed when not
> >         * (possibly) underneath a ssl library.
> >         */
> >         errno = EINTR;
> >         (return n;  // n is negative)
> > 
> > 
> > my_sock_write(BIO *h, const char *buf, int size)
> > {
> >   res = secure_raw_write(((Port *) h->ptr), buf, size);
> >   BIO_clear_retry_flags(h);
> >   if (res <= 0)
> >   {
> >     if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> >     {
> >       BIO_set_retry_write(h);
> 
> Hm, this seems, besides one comment, the code from the last patch in my
> series. Do you have a particular question about it?

I didn't have a particluar qustion about it. This is cited only
in order to show the route to retrying.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 09/28/2014 01:54 AM, Andres Freund wrote:
> I've invested some more time in this:

Thanks!

In 0001, the select() codepath will not return (WL_SOCKET_READABLE | 
WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the 
poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE 
was requested as a wake-event, and likewise for writeable, while the 
poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) 
regardless of the requested wake-events. I'm not sure which is actually 
better - a separate WL_SOCKET_ERROR code might be best - but it's 
inconsistent as it is.

> 0002 now makes sense on its own and doesn't change anything around the
>       interrupt handling. Oh, and it compiles without 0003.

WaitLatchOrSocket() can throw an error, so it's not totally safe to call 
that underneath OpenSSL. Admittedly the cases where it throws an error 
are "shouldn't happen" cases like "poll() failed" or "read() on 
self-pipe failed", but still. Perhaps those errors should be 
reclassified as FATAL; it's not clear you can just roll back and expect 
to continue running if any of them happens.

In secure_raw_write(), you need to save and restore errno, as 
WaitLatchOrSocket will not preserve it. If secure_raw_write() calls 
WaitLatchOrSocket(), and it returns because the latch was set, and we 
fall out of secure_raw_write, we will return -1 but the errno might not 
be set to anything sensible anymore.

> 0003 Sinval/notify processing got simplified further. There really isn't
>       any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>       anymore. Also begin_client_read/client_read_ended don't make much
>       sense anymore. Instead introduce ProcessClientReadInterrupt (which
>       wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>       set. All of that happens via the latch mechanism, nothing happens
>       inside signal handlers. So I do think it's quite an improvement
>       over what's been discussed in this thread.
>       But it (and the other approaches) do noticeably increase the
>       likelihood of clients not getting the error message if the client
>       isn't actually dead. The likelihood of write() being blocked
>       *temporarily* due to normal bandwidth constraints is quite high
>       when you consider COPY FROM and similar. Right now we'll wait till
>       we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

Yeah, 1-3 seem sane. 4 also looks OK to me at a quick glance. It 
basically enables handling the "die" interrupt immediately, if we're 
blocked in a read or write. It won't be handled in the signal handler, 
but within the secure_read/write call anyway.

- Heikki




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
Hi,

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> On 09/28/2014 01:54 AM, Andres Freund wrote:
> >I've invested some more time in this:
> 
> Thanks!
> 
> In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
> WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
> poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was
> requested as a wake-event, and likewise for writeable, while the poll()
> codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of
> the requested wake-events. I'm not sure which is actually better - a
> separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it
> is.

Hm. Right. I think we should only report the requested state. We can't
really discern wether it's a hangup, error or actually readable/writable
with select() - it just returns the socket as readable/writable as soon
as it doesn't block anymore. Where not blocking includes the connection
having gone bad.

It took me a while to figure out whether that's actually guaranteed by
the spec, but I'm pretty sure it is...

> >0002 now makes sense on its own and doesn't change anything around the
> >      interrupt handling. Oh, and it compiles without 0003.
> 
> WaitLatchOrSocket() can throw an error, so it's not totally safe to call
> that underneath OpenSSL.

Hm. Fair point.

> Admittedly the cases where it throws an error are
> "shouldn't happen" cases like "poll() failed" or "read() on self-pipe
> failed", but still. Perhaps those errors should be reclassified as FATAL;
> it's not clear you can just roll back and expect to continue running if any
> of them happens.

Fine with me.

> In secure_raw_write(), you need to save and restore errno, as
> WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
> WaitLatchOrSocket(), and it returns because the latch was set, and we fall
> out of secure_raw_write, we will return -1 but the errno might not be set to
> anything sensible anymore.

Oops.

> >0003 Sinval/notify processing got simplified further. There really isn't
> >      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> >      anymore. Also begin_client_read/client_read_ended don't make much
> >      sense anymore. Instead introduce ProcessClientReadInterrupt (which
> >      wants a better name).
> >There's also a very WIP
> >0004 Allows secure_read/write be interrupted when ProcDiePending is
> >      set. All of that happens via the latch mechanism, nothing happens
> >      inside signal handlers. So I do think it's quite an improvement
> >      over what's been discussed in this thread.
> >      But it (and the other approaches) do noticeably increase the
> >      likelihood of clients not getting the error message if the client
> >      isn't actually dead. The likelihood of write() being blocked
> >      *temporarily* due to normal bandwidth constraints is quite high
> >      when you consider COPY FROM and similar. Right now we'll wait till
> >      we can put the error message into the socket afaics.
> >
> >1-3 need some serious comment work, but I think the approach is
> >basically sound. I'm much, much less sure about allowing send() to be
> >interrupted.
> 
> Yeah, 1-3 seem sane.

I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...

Tom, do you happen to have time to look at that bit?


There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...

> 4 also looks OK to me at a quick glance. It basically
> enables handling the "die" interrupt immediately, if we're blocked in a read
> or write. It won't be handled in the signal handler, but within the
> secure_read/write call anyway.

What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.

Thanks for looking.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 10/03/2014 05:26 PM, Andres Freund wrote:
> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>>       any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>>       anymore. Also begin_client_read/client_read_ended don't make much
>>>       sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>>       wants a better name).
>>> There's also a very WIP
>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>>       set. All of that happens via the latch mechanism, nothing happens
>>>       inside signal handlers. So I do think it's quite an improvement
>>>       over what's been discussed in this thread.
>>>       But it (and the other approaches) do noticeably increase the
>>>       likelihood of clients not getting the error message if the client
>>>       isn't actually dead. The likelihood of write() being blocked
>>>       *temporarily* due to normal bandwidth constraints is quite high
>>>       when you consider COPY FROM and similar. Right now we'll wait till
>>>       we can put the error message into the socket afaics.
>>>
>>> 1-3 need some serious comment work, but I think the approach is
>>> basically sound. I'm much, much less sure about allowing send() to be
>>> interrupted.
>>
>> Yeah, 1-3 seem sane.
>
> I think 3 also needs a careful look. Have you looked through it? While
> imo much less complex than before, there's some complex interactions in
> the touched code. And we have terrible coverage of both catchup
> interrupts and notify stuff...

I only looked at the .patch, I didn't apply it, so I didn't look at the 
context much. But I don't see any fundamental problem with it. I would 
like to have a closer look before it's committed, though.

> There's also the concern that using a latch for client communication
> increases the number of syscalls for the same work. We should at least
> try to quantify that...

I'm not too concerned about that, since we only do extra syscalls when 
the socket isn't immediately available for reading/writing, i.e. when we 
have to sleep anyway.

>> 4 also looks OK to me at a quick glance. It basically
>> enables handling the "die" interrupt immediately, if we're blocked in a read
>> or write. It won't be handled in the signal handler, but within the
>> secure_read/write call anyway.
>
> What are you thinking about the concern that it'll reduce the likelihood
> of transferring the error message to the client? I tried to reduce that
> by only allowing errors when write() blocks, but that's not an
> infrequent event.

I'm not too concerned about that either. I mean, it's probably true that 
it reduces the likelihood, but I don't particularly care myself. But if 
we care, we could use a timeout there, so that if we receive a SIGTERM 
while blocked on a send(), we wait for a few seconds to see if we can 
send whatever we were sending, before terminating the backend.


What should we do with this patch in the commitfest? Are you planning to 
clean up and commit these patches?

- Heikki



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-10-03 18:29:23 +0300, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
> >On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> >>On 09/28/2014 01:54 AM, Andres Freund wrote:
> >>>0003 Sinval/notify processing got simplified further. There really isn't
> >>>      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
> >>>      anymore. Also begin_client_read/client_read_ended don't make much
> >>>      sense anymore. Instead introduce ProcessClientReadInterrupt (which
> >>>      wants a better name).
> >>>There's also a very WIP
> >>>0004 Allows secure_read/write be interrupted when ProcDiePending is
> >>>      set. All of that happens via the latch mechanism, nothing happens
> >>>      inside signal handlers. So I do think it's quite an improvement
> >>>      over what's been discussed in this thread.
> >>>      But it (and the other approaches) do noticeably increase the
> >>>      likelihood of clients not getting the error message if the client
> >>>      isn't actually dead. The likelihood of write() being blocked
> >>>      *temporarily* due to normal bandwidth constraints is quite high
> >>>      when you consider COPY FROM and similar. Right now we'll wait till
> >>>      we can put the error message into the socket afaics.
> >>>
> >>>1-3 need some serious comment work, but I think the approach is
> >>>basically sound. I'm much, much less sure about allowing send() to be
> >>>interrupted.
> >>
> >>Yeah, 1-3 seem sane.
> >
> >I think 3 also needs a careful look. Have you looked through it? While
> >imo much less complex than before, there's some complex interactions in
> >the touched code. And we have terrible coverage of both catchup
> >interrupts and notify stuff...
> 
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would like
> to have a closer look before it's committed, though.

I'd appreciate that. I don't want to commit it without a careful review
of another committer.

> >There's also the concern that using a latch for client communication
> >increases the number of syscalls for the same work. We should at least
> >try to quantify that...
> 
> I'm not too concerned about that, since we only do extra syscalls when the
> socket isn't immediately available for reading/writing, i.e. when we have to
> sleep anyway.

Well, kernels actually do some nice optimizations for blocking reads -
at least for local sockets. Like switching to the other process
immediately and such.
I'm not super concerned either, but I think we should try to measure
it. And if we're failing, we probably should try to address these
problems - if possible in the latch code.

> >>4 also looks OK to me at a quick glance. It basically
> >>enables handling the "die" interrupt immediately, if we're blocked in a read
> >>or write. It won't be handled in the signal handler, but within the
> >>secure_read/write call anyway.
> >
> >What are you thinking about the concern that it'll reduce the likelihood
> >of transferring the error message to the client? I tried to reduce that
> >by only allowing errors when write() blocks, but that's not an
> >infrequent event.
> 
> I'm not too concerned about that either. I mean, it's probably true that it
> reduces the likelihood, but I don't particularly care myself. But if we
> care, we could use a timeout there, so that if we receive a SIGTERM while
> blocked on a send(), we wait for a few seconds to see if we can send
> whatever we were sending, before terminating the backend.
> 
> 
> What should we do with this patch in the commitfest?

I think feature should be ddeclare as 'returned with feedback' for this
commitfest. I've done so.

I don't see much of a reason waiting with patch 1 till the next
commitfest. It imo looks uncontroversial and doesn't have any far
reaching consequences.

> Are you planning to clean up and commit these patches?

I plan to do so one by one, yes. If you'd like to pick up any of them,
feel free to do (after telling me, to avoid duplicated efforts). I don't
feel proprietary about any of them. But I guess you have other stuff
you'd like to work on too ;)

I'll try to send out a version with the stuff you mentioned earlier in
the next couple days.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello, simplly inhibit set retry flag when ProcDiePending in
my_sock_write seems enough.

But it returns with SSL_ERROR_SYSCALL not SSL_ERROR_WANT_WRITE so
I modified the patch 4 as the attached patch.

Finally, the attached patch works as expected with Andres's patch
1-3.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)    BIO_clear_retry_flags(h);    if (res <= 0)
{
-        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+        if (!ProcDiePending &&
+            (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))        {            BIO_set_retry_write(h);
    }
 
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..ab9e122 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len){    ssize_t        n;
+retry:#ifdef USE_SSL    if (port->ssl_in_use)    {
@@ -210,7 +211,26 @@ secure_write(Port *port, void *ptr, size_t len)        n = secure_raw_write(port, ptr, len);    }
-    /* XXX: We likely will want to process some interrupts here */
+    /*
+     * We only want to process the interrupt here if socket writes are
+     * blocking to increase the chance to get an error message to the
+     * client. If we're not blocked there'll soon be a
+     * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+     * that situation if the client has died.
+     */
+    if (ProcDiePending && !port->noblock && n < 0)
+    {
+        /*
+         * We're dying. It's safe (and sane) to handle that now.
+         */
+        CHECK_FOR_INTERRUPTS();
+    }
+
+    /* retry after processing interrupts */
+    if (n < 0 && errno == EINTR)
+    {
+        goto retry;
+    }    return n;}
@@ -236,10 +256,19 @@ wloop:         * don't do anything while (possibly) inside a ssl library.         */        w =
WaitLatchOrSocket(&MyProc->procLatch,
-                              WL_SOCKET_WRITEABLE,
+                              WL_LATCH_SET | WL_SOCKET_WRITEABLE,                              port->sock, 0);
-        if (w & WL_SOCKET_WRITEABLE)
+        if (w & WL_LATCH_SET)
+        {
+            ResetLatch(&MyProc->procLatch);
+            /*
+             * Force a return, so interrupts can be processed when not
+             * (possibly) underneath a ssl library.
+             */
+            errno = EINTR;
+        }
+        else if (w & WL_SOCKET_WRITEABLE)        {            goto wloop;        }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..61390aa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,6 +520,13 @@ ProcessClientReadInterrupt(void)        errno = save_errno;    }
+    else if (ProcDiePending)
+    {
+        /*
+         * We're dying. It's safe (and sane) to handle that now.
+         */
+        CHECK_FOR_INTERRUPTS();
+    }}

Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-10-09 14:06:35 +0900, Kyotaro HORIGUCHI wrote:
> Hello, simplly inhibit set retry flag when ProcDiePending in
> my_sock_write seems enough.
> 
> But it returns with SSL_ERROR_SYSCALL not SSL_ERROR_WANT_WRITE so
> I modified the patch 4 as the attached patch.

Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
least in my testing, it's not even required because the be_tls_write()
can just check the error properly?

> diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
> index 6fc6903..2288fe2 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
>      BIO_clear_retry_flags(h);
>      if (res <= 0)
>      {
> -        if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
> +        if (!ProcDiePending &&
> +            (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
>          {
>              BIO_set_retry_write(h);
>          }


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hmm.. Sorry for my stupidity.

> Why is that necessary? It seems really rather wrong to make
> BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
> least in my testing, it's not even required because the be_tls_write()
> can just check the error properly?

I mistook the previous conversation as it doesn't work as
expected. I confirmed that it works fine.

After all, it works as I expected. The parameter for
ProcessClientWriteInterrupt() looks somewhat uneasy but the patch
4 looks fine as a whole. Do you have anything to worry about in
the patch?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Heikki Linnakangas
Date:
On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
> On 10/03/2014 05:26 PM, Andres Freund wrote:
>> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
>>> On 09/28/2014 01:54 AM, Andres Freund wrote:
>>>> 0003 Sinval/notify processing got simplified further. There really isn't
>>>>        any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>>>>        anymore. Also begin_client_read/client_read_ended don't make much
>>>>        sense anymore. Instead introduce ProcessClientReadInterrupt (which
>>>>        wants a better name).
>>>> There's also a very WIP
>>>> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>>>>        set. All of that happens via the latch mechanism, nothing happens
>>>>        inside signal handlers. So I do think it's quite an improvement
>>>>        over what's been discussed in this thread.
>>>>        But it (and the other approaches) do noticeably increase the
>>>>        likelihood of clients not getting the error message if the client
>>>>        isn't actually dead. The likelihood of write() being blocked
>>>>        *temporarily* due to normal bandwidth constraints is quite high
>>>>        when you consider COPY FROM and similar. Right now we'll wait till
>>>>        we can put the error message into the socket afaics.
>>>>
>>>> 1-3 need some serious comment work, but I think the approach is
>>>> basically sound. I'm much, much less sure about allowing send() to be
>>>> interrupted.
>>>
>>> Yeah, 1-3 seem sane.
>>
>> I think 3 also needs a careful look. Have you looked through it? While
>> imo much less complex than before, there's some complex interactions in
>> the touched code. And we have terrible coverage of both catchup
>> interrupts and notify stuff...
>
> I only looked at the .patch, I didn't apply it, so I didn't look at the
> context much. But I don't see any fundamental problem with it. I would
> like to have a closer look before it's committed, though.

About patch 3:

Looking closer, this design still looks OK to me. As you said yourself, 
the comments need some work (e.g. the step 5. in the top comment in 
async.c needs updating). And then there are a couple of FIXME and XXX 
comments that need to be addressed.

The comment on PGPROC.procLatch in storage/proc.h says just this:

>     Latch        procLatch;        /* generic latch for process */

This needs a lot more explaining. It's now used by signal handlers to 
interrupt a read or write to the socket; that should be mentioned. What 
else is it used for? (for waking up a backend in synchronous 
replication, at least) What are the rules on when to arm it and when to 
reset it?

Would it be more clear to use a separate, backend-private, latch, for 
the signals? I guess that won't work, though, because sometimes we need 
need to wait for a wakeup from a different process or from a signal at 
the same time (SyncRepWaitForLSN() in particular). Not without adding a 
variant of WaitLatch that can wait on two latches simultaneously, anyway.

The assumption in secure_raw_read that MyProc exists is pretty 
surprising. I understand why it's that way, and there's a comment in 
PostgresMain explaining why the socket cannot be put into non-blocking 
mode earlier, but it's still a bit whacky. Not sure what to do about that.

- Heikki




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-10-30 15:27:13 +0200, Heikki Linnakangas wrote:
> The comment on PGPROC.procLatch in storage/proc.h says just this:
> 
> >    Latch        procLatch;        /* generic latch for process */
> 
> This needs a lot more explaining. It's now used by signal handlers to
> interrupt a read or write to the socket; that should be mentioned. What else
> is it used for? (for waking up a backend in synchronous replication, at
> least) What are the rules on when to arm it and when to reset it?

Hm. I agree it use expaned commentary, but I'm unsure if that much
detail is really warranted. Any such documentation seems to be almost
guaranteed to be out of date. As evidenced that there's none to date...

> Would it be more clear to use a separate, backend-private, latch, for the
> signals? I guess that won't work, though, because sometimes we need need to
> wait for a wakeup from a different process or from a signal at the same time
> (SyncRepWaitForLSN() in particular). Not without adding a variant of
> WaitLatch that can wait on two latches simultaneously, anyway.

I wondered the same, but I don't really see what it'd buy us during
normal running. It seems like it'd just make code more complex without
leading to relevantly fewer wakeups.

> The assumption in secure_raw_read that MyProc exists is pretty surprising. I
> understand why it's that way, and there's a comment in PostgresMain
> explaining why the socket cannot be put into non-blocking mode earlier, but
> it's still a bit whacky. Not sure what to do about that.

It makes me quite unhappy too. I looked what it'd take to make proclatch
available earlier, but it wasn't pretty. I wondered whether we could use
a 'early proc latch' in MyProcLatch that used until we're fully attached
to shared memory. Then, when attaching to shared memory we'd set the old
latch once, and reset MyProcLatch to the shared memory one.

But that's pretty ugly.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Alex Shulgin
Date:
Andres Freund <andres@2ndquadrant.com> writes:
>
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
>      interrupt handling. Oh, and it compiles without 0003.

In this patch, the endif appears to be misplaced in PostgresMain:

+    if (MyProcPort != NULL)
+    {
+#ifdef WIN32
+        pgwin32_noblock = true;
+#else
+        if (!pg_set_noblock(MyProcPort->sock))
+            ereport(COMMERROR,
+                    (errmsg("could not set socket to nonblocking mode: %m")));
+    }
+#endif
+    pqinitmask();

> 0003 Sinval/notify processing got simplified further. There really isn't
>      any need for DisableNotifyInterrupt/DisableCatchupInterrupt
>      anymore. Also begin_client_read/client_read_ended don't make much
>      sense anymore. Instead introduce ProcessClientReadInterrupt (which
>      wants a better name).
> There's also a very WIP
> 0004 Allows secure_read/write be interrupted when ProcDiePending is
>      set. All of that happens via the latch mechanism, nothing happens
>      inside signal handlers. So I do think it's quite an improvement
>      over what's been discussed in this thread.
>      But it (and the other approaches) do noticeably increase the
>      likelihood of clients not getting the error message if the client
>      isn't actually dead. The likelihood of write() being blocked
>      *temporarily* due to normal bandwidth constraints is quite high
>      when you consider COPY FROM and similar. Right now we'll wait till
>      we can put the error message into the socket afaics.
>
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

After re-reading these I don't see the rest of items I wanted to inqury
about anymore, so it just makes more sense now.

One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true.  I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.

While I'm reading on FE/BE protocol someone might want to share his
wisdom on this subject.  My guess: psql blocks on readline/libedit call
and can't effectively poll the server socket before complete input from
user.

--
Alex

[1] http://www.postgresql.org/message-id/1262173040.19367.5015.camel@ebony
 ``AFAIK, NOTICE was suggested because it can be sent at any time,   whereas ERRORs are only associated with
statements.''



Re: Escaping from blocked send() reprised.

From
Michael Paquier
Date:
On Thu, Oct 30, 2014 at 10:27 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
>> [blah]
> About patch 3:
> Looking closer, this design still looks OK to me. As you said yourself, the
> comments need some work (e.g. the step 5. in the top comment in async.c
> needs updating). And then there are a couple of FIXME and XXX comments that
> need to be addressed.
Those patches have not been updated in a while, and I am seeing some
feedback from several people, hence returning as returned with
feedback. Horiguchi-san, feel free to add new entries on the CF app in
2014-12 or move this entry if you feel overwise.
Regards,
-- 
Michael



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Since I don't have clear idea how to promote this, I will remake
and be back with new patch based on Andres' for patches.

> Hmm.. Sorry for my stupidity.
> 
> > Why is that necessary? It seems really rather wrong to make
> > BIO_set_retry_write() dependant on ProcDiePending? Especially as, at
> > least in my testing, it's not even required because the be_tls_write()
> > can just check the error properly?
> 
> I mistook the previous conversation as it doesn't work as
> expected. I confirmed that it works fine.
> 
> After all, it works as I expected. The parameter for
> ProcessClientWriteInterrupt() looks somewhat uneasy but the patch
> 4 looks fine as a whole. Do you have anything to worry about in
> the patch?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
> Since I don't have clear idea how to promote this, I will remake
> and be back with new patch based on Andres' for patches.

Do my patches miss any functionality you want?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Kyotaro HORIGUCHI
Date:
Hello,

> On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
> > Since I don't have clear idea how to promote this, I will remake
> > and be back with new patch based on Andres' for patches.
> 
> Do my patches miss any functionality you want?

The patch satisfies what I want, as I said upthread. What I don't
know is how I can go on with it in this CF topic, in other word
what should I do in order to put it to "ready for committer"?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-10-03 16:26:35 +0200, Andres Freund wrote:
> On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
> > >0002 now makes sense on its own and doesn't change anything around the
> > >      interrupt handling. Oh, and it compiles without 0003.
> > 
> > WaitLatchOrSocket() can throw an error, so it's not totally safe to call
> > that underneath OpenSSL.
> 
> Hm. Fair point.

I think we should fix this by simply prohibiting
WaitLatch/WaitLatchOrSocket from ERRORing out. The easiest, and imo
acceptable, thing is to simply convert the relevant ERRORs to FATAL. I
think that'd be perfectly fine as it seems very unlikely that we
continue sanely afterwards.

It would really be nice if we had a simple way to raise a FATAL that
won't go to the client for situations like this. I'd proposed
elog(FATAL | COMERROR, ...) in the past...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-11-17 18:22:54 +0300, Alex Shulgin wrote:
> 
> Andres Freund <andres@2ndquadrant.com> writes:
> >
> > I've invested some more time in this:
> > 0002 now makes sense on its own and doesn't change anything around the
> >      interrupt handling. Oh, and it compiles without 0003.
> 
> In this patch, the endif appears to be misplaced in PostgresMain:
> 
> +    if (MyProcPort != NULL)
> +    {
> +#ifdef WIN32
> +        pgwin32_noblock = true;
> +#else
> +        if (!pg_set_noblock(MyProcPort->sock))
> +            ereport(COMMERROR,
> +                    (errmsg("could not set socket to nonblocking mode: %m")));
> +    }
> +#endif
> +

Uh. Odd. Anyway, that bit of code is now somewhere else anyway...

> One thing I did try is sending a NOTICE to the client when in
> ProcessInterrupts() and DoingCommandRead is true.  I think[1] it was
> expected to be delivered instantly, but actually the client (psql) only
> displays it after sending the next statement.

Yea, that should be psql specific though. I hope ;)

> While I'm reading on FE/BE protocol someone might want to share his
> wisdom on this subject.  My guess: psql blocks on readline/libedit call
> and can't effectively poll the server socket before complete input from
> user.

I'm not sure if it's actually a "can't". It doesn't at least ;)

Greetings,

Andres Freund



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
> I've invested some more time in this:

And yet round of time spent.

The major change since last round is that I've introduced a local latch
that exists in every process. If InitProcess() is run, that latch is
replaced by the shared process latch and the reverse happens in
ProcKill.  To implement this, I decided to remove some code duplication
around process initialization.

The major reason to do is that this allows us to get rid of the special
case where MyProc isn't available yet during early process startup,
where the socket still had to be in blocking mode. Instead we can now
rely on the latch all the time.

Other than that I've significantly cleaned up and tested the
patchset. Unfortunately that testing has brought a significant number of
SSL bugs to light, but they all turned out to be independent of these
changes. I'll try to detail them in a separate email early next week.

I've done some performance measurements, to verify this doesn't cause a
regression. When testing with 'SELECT 1' as a trivial statement I
couldn't measure any regression at 32 pgbench clients/threads on my 4
core (i7-4800MQ) laptop. At 390 clients I saw about 1.6%. With statement
that actually does something that difference vanished.  Personally I'm
ok with that.

The patches are:

0001-Allow-latches-to-wait-for-socket-writability-without.patch
     Imo pretty close to commit and can be committed independently.

0002-Commonalize-process-startup-code.patch
     The above mentioned deduplication. Needs a review (completely new),
     but it's imo a clear improvement and allows for a fair amount of
     future/further deduplication.

0003-Add-a-default-local-latch-for-use-in-signal-handlers.patch
     Adds the default latch + the logic to switch to shared latch while
     attached. I think it's a good idea, but I'd like some feedback.

0004-Use-a-nonblocking-socket-for-FE-BE-communication-and.patch
     The earlier patch that converts the communication to use latches
     with some added cleanup/improvements. Specifically we don't have to
     rely on MyProc->procLatch anymore due to 0003 which gets rid of
     some ugly code in be-secure.c and postgres.c

     I think this patch might not be safe without 0005 because I can't
     convince myself that it's safe to interrupt latch waits with work
     that actually might also use the same latch (sinval
     interrupts). But it's easier to review this way.

0005-Introduce-and-use-infrastructure-for-interrupt-proce.patch
     Actually move most of sinval.c/async.c's interrupt handling out of
     the signal handlers and use the latches. This is the cleaned up
     version of the earlier commit.

0006-WIP-Process-die-interrupts-while-reading-writing-fro.patch
     Not much has changed, needs at least some comments. I want to get
     the other stuff done first.

There remains one 'FIXME' after all the patches, which is
interactive_getc() won't react to catchup/notify interrupts - which, as
far as I can see, is fine, as there are none.

Comments?

Greetings,

Andres Freund

Attachment

Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2014-09-04 08:49:22 -0400, Robert Haas wrote:
> On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I'm slightly worried about the added overhead due to the latch code. In
> > my implementation I only use latches after a nonblocking read, but
> > still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
> > that can be made problematic.
> 
> I think that's not the word you're looking for.

There's a "less" missing...

> At some point I hacked up a very crude prototype that made LWLocks use
> latches to sleep instead of semaphores.  It was slow.

Interesting. I dimly remembered you mentioning this, that's how I
rediscovered this message.

Do you remember any details?

My guess that's not so much the overhead of the latch itself, but the
lack of the directed wakeup stuff the OS provides for semaphores.


If we could replace all usages of semaphores that set immediate
interrupts to ok, we could quite easily make the deadlock detector
et. al. run outside of signal handlers. That would imo make it more
robust, and easier to understand - right now the correctness of locking
done in the deadlock detector isn't obvious.  With the infrastructure in
place it'd also allow your new parallelism code to run outside of signal
handlers.

Unfortunately currently sempahores can't be unlocked in a signal handler
(as sysv semaphores aren't signal safe)... It'd also be not so nice to
set both a latch and semaphores in every signal handler.


> AIUI, the only reason why we need the self-pipe thing is because on
> some platforms signals don't interrupt system calls.  But my
> impression was that those platforms were somewhat obscure.

To the contrary, I think it's only very obscure platforms where signals
still interrupt syscalls - we set SA_RESTART for pretty much
everything. There's a couple of system calls that ignore SA_RESTART. For
some that's defined in posix, for others it's operating system
specific. E.g. on linux semop(), poll(), select() are defined to always
return EINTR when interrupted.

Anyway, the discussion since cleared up that we need the self byte to
handle a race, anyway.

> Basically, it doesn't feel like a good thing that we've got two sets
> of primitives for making a backend wait that (1) don't really know
> about each other and (2) use different operating system primitives.
> Presumably one of the two systems is better; let's figure out which
> one it is, use that one all the time, and get rid of the other one.

I think the latch interface is clearly better for what we use
sema/latches for as it allows to wait for signals (latch sets), a socket
and timeouts. So let's try to figure out how to make it perform
comparably or better than semaphores.

There's imo only one semaphore user that can't trivially be replaced by
latches: the semaphore spinlock emulation. Both proc.c and and lwlock.c
can be converted quite easily - in the latter case, it might actually
end up saving some code.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Noah Misch
Date:
On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> 0001-Allow-latches-to-wait-for-socket-writability-without.patch
>      Imo pretty close to commit and can be committed independently.

The key open question is whether all platforms of interest can reliably detect
end-of-file when poll()ing or select()ing for write only.  Older GNU/Linux
select() cannot; see attached test program.  We use poll() there anyway, so
the bug in that configuration does not affect PostgreSQL.  Is it a bellwether
of similar bugs in other implementations, bugs that will affect PostgreSQL?

> This previously had explicitly been forbidden in e42a21b9e6c9, as
> there was no use case at that point. We now are looking into making
> FE/BE communication use latches, so it

Truncated sentence.

> +            if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> +            {
> +                /* EOF/error condition */
> +                if (wakeEvents & WL_SOCKET_READABLE)
> +                    result |= WL_SOCKET_READABLE;
> +                if (wakeEvents & WL_SOCKET_WRITEABLE)
> +                    result |= WL_SOCKET_WRITEABLE;
> +            }

With some poll() implementations (e.g. OS X), this can wrongly report
WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR).  I tentatively think
that's acceptable.  libpq does not use shutdown(), and other client interfaces
would do so at their own risk.  Should we worry about hostile clients creating
a denial-of-service by causing a server send() to block unexpectedly?
Probably not; a user able to send arbitrary TCP traffic to the postmaster port
can already achieve that.

> +            if (resEvents.lNetworkEvents & FD_CLOSE)
> +            {
> +                if (wakeEvents & WL_SOCKET_READABLE)
> +                    result |= WL_SOCKET_READABLE;
> +                if (wakeEvents & WL_SOCKET_WRITEABLE)
> +                    result |= WL_SOCKET_WRITEABLE;
> +            }
> +
>          }

Extra blank line.

Attachment

Re: Escaping from blocked send() reprised.

From
Robert Haas
Date:
On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Interesting. I dimly remembered you mentioning this, that's how I
> rediscovered this message.
>
> Do you remember any details?

No, not really.

> My guess that's not so much the overhead of the latch itself, but the
> lack of the directed wakeup stuff the OS provides for semaphores.

That's possible.

> If we could replace all usages of semaphores that set immediate
> interrupts to ok, we could quite easily make the deadlock detector
> et. al. run outside of signal handlers. That would imo make it more
> robust, and easier to understand - right now the correctness of locking
> done in the deadlock detector isn't obvious.  With the infrastructure in
> place it'd also allow your new parallelism code to run outside of signal
> handlers.

Yes, I would be very happy to see ImmediateInterruptOK die in a fire.

> Unfortunately currently sempahores can't be unlocked in a signal handler
> (as sysv semaphores aren't signal safe)... It'd also be not so nice to
> set both a latch and semaphores in every signal handler.

Agreed.

>> AIUI, the only reason why we need the self-pipe thing is because on
>> some platforms signals don't interrupt system calls.  But my
>> impression was that those platforms were somewhat obscure.
>
> To the contrary, I think it's only very obscure platforms where signals
> still interrupt syscalls - we set SA_RESTART for pretty much
> everything. There's a couple of system calls that ignore SA_RESTART. For
> some that's defined in posix, for others it's operating system
> specific. E.g. on linux semop(), poll(), select() are defined to always
> return EINTR when interrupted.

The recent problems with test_shm_mq test failing on anole were caused
by the fact that a signal doesn't abort select() on that platform, but
does reset the timer.  So a steady stream of signals results in never
reaching the timeout.

> Anyway, the discussion since cleared up that we need the self byte to
> handle a race, anyway.

Eh?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> >      Imo pretty close to commit and can be committed independently.
> 
> The key open question is whether all platforms of interest can reliably detect
> end-of-file when poll()ing or select()ing for write only.  Older GNU/Linux
> select() cannot; see attached test program.

Yuck. By my reading that's a violation of posix.

I did test it a bit, and I didn't see problems, but that obviously
doesn't say much about old versions.

Afaics we interestingly don't have any poll-less buildfarm animals that
use unix_latch.c...

> We use poll() there anyway, so the bug in that configuration does not
> affect PostgreSQL.  Is it a bellwether of similar bugs in other
> implementations, bugs that will affect PostgreSQL?

Hm. I can think of two stopgap measures we could add:

1) If we're using select() and WL_SOCKET_WRITEABLE is set without  _READABLE, add a timeout of Min(1s,
Max(passed_timeout,1s)). As the  time spent waiting only for writable normally shouldn't be very long,  that shouldn't
benoticeably bad for power usage.
 
2) Add a SIGPIPE handler that just does a SetLatch(MyLach).

> > This previously had explicitly been forbidden in e42a21b9e6c9, as
> > there was no use case at that point. We now are looking into making
> > FE/BE communication use latches, so it
> 
> Truncated sentence.

Fixed in what I've since pushed (as Heikki basically was ok with the
patch sent a couple months back, modulo some fixes)...

> > +            if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > +            {
> > +                /* EOF/error condition */
> > +                if (wakeEvents & WL_SOCKET_READABLE)
> > +                    result |= WL_SOCKET_READABLE;
> > +                if (wakeEvents & WL_SOCKET_WRITEABLE)
> > +                    result |= WL_SOCKET_WRITEABLE;
> > +            }
> 
> With some poll() implementations (e.g. OS X), this can wrongly report
> WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR).  I tentatively think
> that's acceptable.  libpq does not use shutdown(), and other client interfaces
> would do so at their own risk.  Should we worry about hostile clients creating
> a denial-of-service by causing a server send() to block unexpectedly?
> Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> can already achieve that.

Yea, this doesn't seem particularly concerning.

a) They can just stop consuming writes and use noticeable amounts of
memory by doing output intensive queries. That uses significant os
resources and is much harder to detect - today.

b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
here? We already allow _WRITABLE... What happens if you write/send() in
that state, btw?

Greetings,

Andres Freund



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2015-01-11 16:47:53 -0500, Robert Haas wrote:
> > My guess that's not so much the overhead of the latch itself, but the
> > lack of the directed wakeup stuff the OS provides for semaphores.
> 
> That's possible.

> On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Interesting. I dimly remembered you mentioning this, that's how I
> > rediscovered this message.
> >
> > Do you remember any details?
> 
> No, not really.

I've done a hackish conversion of proc.c to use semaphores and in a
sleeping heavy workload (pgbench -j 390 against a scale 1 db) there
originally was about a %5 regression (fluctuating a fair bit).

I've played with hacking unix_latch.c to

a) use eventfd() for the local latch and (no performance difference  anymore)

b) Adding a eventfd to struct Latch for latches that are created from
postmaster. That fd can directly be written to by other processes
(combined slightly faster than semaphores).

Not sure how much b) is acceptable, due to using MaxBackend fds. And
both only help linux.

> > If we could replace all usages of semaphores that set immediate
> > interrupts to ok, we could quite easily make the deadlock detector
> > et. al. run outside of signal handlers. That would imo make it more
> > robust, and easier to understand - right now the correctness of locking
> > done in the deadlock detector isn't obvious.  With the infrastructure in
> > place it'd also allow your new parallelism code to run outside of signal
> > handlers.
> 
> Yes, I would be very happy to see ImmediateInterruptOK die in a fire.

Yea, I think it's a absolutely horrible idea.

> > Unfortunately currently sempahores can't be unlocked in a signal handler
> > (as sysv semaphores aren't signal safe)... It'd also be not so nice to
> > set both a latch and semaphores in every signal handler.
> 
> Agreed.

I've since (re-)realised that we've actually relied on semaphore being
signal safe for years. The PGSemaphoreLock() in proc.c allows
interrupts. And the deadlock detector then uses lwlocks, which in turn
use semaphores.  That sucks.

> > Anyway, the discussion since cleared up that we need the self byte to
> > handle a race, anyway.
> 
> Eh?

I'm referring to the point Heikki has made in his second paragraph in
5408638C.1080308@vmware.com .


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Noah Misch
Date:
On Mon, Jan 12, 2015 at 12:40:50AM +0100, Andres Freund wrote:
> On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
> > On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
> > > 0001-Allow-latches-to-wait-for-socket-writability-without.patch
> > >      Imo pretty close to commit and can be committed independently.
> > 
> > The key open question is whether all platforms of interest can reliably detect
> > end-of-file when poll()ing or select()ing for write only.  Older GNU/Linux
> > select() cannot; see attached test program.
> 
> Yuck. By my reading that's a violation of posix.

Agreed.

> I did test it a bit, and I didn't see problems, but that obviously
> doesn't say much about old versions.
> 
> Afaics we interestingly don't have any poll-less buildfarm animals that
> use unix_latch.c...

More likely is that some system will have a poll() exhibiting the same bug,
possibly via poll() being a wrapper around select().  Systems without poll()
are hard to find; the gnulib manual lists only Windows, BeOS and HP NonStop
OS.  HP NonStop OS is the one possibly of interest here.  I have never
personally seen a machine running it.

I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario.  Then we can
decide what more to do if failure evidence emerges.

> > We use poll() there anyway, so the bug in that configuration does not
> > affect PostgreSQL.  Is it a bellwether of similar bugs in other
> > implementations, bugs that will affect PostgreSQL?
> 
> Hm. I can think of two stopgap measures we could add:
> 
> 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
>    _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
>    time spent waiting only for writable normally shouldn't be very long,
>    that shouldn't be noticeably bad for power usage.
> 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).

I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.

> > > +            if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
> > > +            {
> > > +                /* EOF/error condition */
> > > +                if (wakeEvents & WL_SOCKET_READABLE)
> > > +                    result |= WL_SOCKET_READABLE;
> > > +                if (wakeEvents & WL_SOCKET_WRITEABLE)
> > > +                    result |= WL_SOCKET_WRITEABLE;
> > > +            }
> > 
> > With some poll() implementations (e.g. OS X), this can wrongly report
> > WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR).  I tentatively think
> > that's acceptable.  libpq does not use shutdown(), and other client interfaces
> > would do so at their own risk.  Should we worry about hostile clients creating
> > a denial-of-service by causing a server send() to block unexpectedly?
> > Probably not; a user able to send arbitrary TCP traffic to the postmaster port
> > can already achieve that.
> 
> Yea, this doesn't seem particularly concerning.
> 
> a) They can just stop consuming writes and use noticeable amounts of
> memory by doing output intensive queries. That uses significant os
> resources and is much harder to detect - today.

If there's anything to worry about here (unlikely), it would be with respect
to not-yet-authenticated connections only.

> b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
> here? We already allow _WRITABLE...

Today's code translates POLLHUP to WL_SOCKET_READABLE; it must see POLLOUT to
set WL_SOCKET_WRITEABLE.  Your patch changes that.

> What happens if you write/send() in
> that state, btw?

write() reports EAGAIN.



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
> I recommend either (a) taking no action or (b) adding a regression test
> verifying WaitLatchOrSocket() conformance in this scenario.

Do you have a good idea how to test b) save a C function in regress.c
that does what your test does using latches?

> Then we cane decide what more to do if failure evidence emerges.

Seems fine to me.

> > Hm. I can think of two stopgap measures we could add:
> > 
> > 1) If we're using select() and WL_SOCKET_WRITEABLE is set without
> >    _READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
> >    time spent waiting only for writable normally shouldn't be very long,
> >    that shouldn't be noticeably bad for power usage.
> > 2) Add a SIGPIPE handler that just does a SetLatch(MyLach).
> 
> I'm having trouble visualizing those proposed measures in detail, but I trust
> that a decent workaround would emerge.

For 1) I'm thinking of just regularly causing a spurious
WL_SOCKET_WRITEABLE event via timeouts if it's the only parameter. Latch
users have to deal with spurious wakeups anyway, so that should be
mostly unproblematic.

For 2) I was thinking that for now the problem only arises for the main
FE/BE socket. So we can install a sigpipe handler that does a SetLatch()
- that will trigger WaitLatch() to return and, after checking for
interrupts, retry the actual send() - which then'd return ECONNRESET.

> > What happens if you write/send() in
> > that state, btw?
> 
> write() reports EAGAIN.

Grand.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Escaping from blocked send() reprised.

From
Noah Misch
Date:
On Mon, Jan 12, 2015 at 01:45:41AM +0100, Andres Freund wrote:
> On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
> > I recommend either (a) taking no action or (b) adding a regression test
> > verifying WaitLatchOrSocket() conformance in this scenario.
> 
> Do you have a good idea how to test b) save a C function in regress.c
> that does what your test does using latches?

No, that's what I had in mind.  You could probably achieve it with a libpq
program that lets input accumulate, but that's trickier.



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
On 2015-01-12 00:40:50 +0100, Andres Freund wrote:
> Fixed in what I've since pushed (as Heikki basically was ok with the
> patch sent a couple months back, modulo some fixes)...

I'd not actually pushed that patch... I had pushed some patches
(barriers, atomics), but had decided to hold off on this. I've now done
so.

I've mentioned the portability concerns over select() bugs in the commit
message & a comment. ATM I'm not inclined to add a relatively elaborate
test for the bug on pretty fringe platforms.

Thanks for looking at this!

I plan to continue with committing
1) Commonalize process startup code
2) Add a default local latch for use in signal handlers
3) Use a nonblocking socket for FE/BE communication and block using latches

pretty soon.

As we already seem to assume that WaitLatch() is signal safe/reentrant
(c.f. walsender.c), I'm fine with committing 3) in isolation, without
4). I need a test that properly exercises catchup interrupts before
committing that.

Once I have that test I plan to commit
4) Introduce and use infrastructure for interrupt processing during
client reads.

I'd like some input from others what they think about the problem that
5) "Process 'die' interrupts while reading/writing from a socket."

can reduce the likelihood of clients getting the error message. I
personally think that's more than outweighed by not having backends
stuck (save quickdie) for a long time when the client is gone/stuck. I
think the middleground in the patch to only process die events when
actually blocked in writes reduces the likelihood of this sufficiently.

I have hacks ontop this to get rid of ImmediateInterrupt alltogether,
although I'm not sure how well this will work for some parts of
auth/crypt.c. Everything else, including the deadlock checker, seems
quite doable.

Andres



Re: Escaping from blocked send() reprised.

From
Andres Freund
Date:
Hi Heikki,

On 2014-09-02 21:22:29 +0300, Heikki Linnakangas wrote:
> On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
> >   To make the code mentioned above (Patch 0002) tidy, rewrite the
> >   socket emulation code for win32 backends so that each socket
> >   can have its own non-blocking state. (patch 0001)
> 
> The first patch that makes non-blocking sockets behave more sanely on
> Windows seems like a good idea, independently of the second patch. I'm
> looking at the first patch now, I'll make a separate post about the second
> patch.

> On Windows, the backend has an emulation layer for POSIX signals, which uses
> threads and Windows events. The reason win32/socket.c always uses
> non-blocking mode internally is that it needs to wait for the socket to
> become readable/writeable, and for the signal-emulation event, at the same
> time. So no, we can't remove it.
> 
> The approach taken in the first patch seems sensible. I changed it to not
> use FD_SET, though. A custom array seems better, that way we don't need the
> pgwin32_nonblockset_init() call, we can just use initialize the variable.
> It's a little bit more code, but it's well-contained in win32/socket.c.
> Please take a look, to double-check that I didn't screw up.

Heikki, what's your plan about this patch? Do you plan to commit it?

Greetings,

Andres Freund