Thread: Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

Peter Eisentraut <peter_e@gmx.net> writes:
> Use asynchronous connect API in libpqwalreceiver

Buildfarm member bowerbird has been failing in the pg_rewind test since
this patch went in.  It looks like it's failing to complete connections
from the standby; which suggests that something platform-specific is
missing from this commit, but I dunno what.
        regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

From: Peter Eisentraut
On 3/3/17 19:16, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Use asynchronous connect API in libpqwalreceiver
> 
> Buildfarm member bowerbird has been failing in the pg_rewind test since
> this patch went in.  It looks like it's failing to complete connections
> from the standby; which suggests that something platform-specific is
> missing from this commit, but I dunno what.

Hmm, I wonder how widely tested the async connection API is on Windows
at all.  I only see bowerbird and jacana running bin-check on Windows.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 3/3/17 19:16, Tom Lane wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> Use asynchronous connect API in libpqwalreceiver

>> Buildfarm member bowerbird has been failing in the pg_rewind test since
>> this patch went in.  It looks like it's failing to complete connections
>> from the standby; which suggests that something platform-specific is
>> missing from this commit, but I dunno what.

> Hmm, I wonder how widely tested the async connection API is on Windows
> at all.  I only see bowerbird and jacana running bin-check on Windows.

Yeah, I was wondering if this is just exposing a pre-existing bug.
However, the "normal" path operates by repeatedly invoking PQconnectPoll
(cf. connectDBComplete) so it's not immediately obvious how such a bug
would've escaped detection.
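
(For reference, connectDBComplete boils down to the loop documented for
PQconnectPoll: never poll until the socket has been reported ready.  A
minimal sketch, with error handling and the connect timeout omitted and
plain select() standing in for libpq's internal wait; the helper name is
made up:)

#include <sys/select.h>
#include <libpq-fe.h>

/* Sketch only: the documented PQconnectStart/PQconnectPoll pattern. */
static PGconn *
connect_poll_sketch(const char *conninfo)
{
    PGconn     *conn = PQconnectStart(conninfo);
    PostgresPollingStatusType st = PGRES_POLLING_WRITING;

    while (conn && st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
    {
        fd_set      rfds,
                    wfds;
        int         sock = PQsocket(conn);

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (st == PGRES_POLLING_READING)
            FD_SET(sock, &rfds);
        else
            FD_SET(sock, &wfds);

        /* wait until the requested readiness is actually reported */
        if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
            break;

        st = PQconnectPoll(conn);
    }
    return conn;                /* caller checks PQstatus() */
}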
        regards, tom lane



On 04/03/17 05:11, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 3/3/17 19:16, Tom Lane wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:
>>>> Use asynchronous connect API in libpqwalreceiver
> 
>>> Buildfarm member bowerbird has been failing in the pg_rewind test since
>>> this patch went in.  It looks like it's failing to complete connections
>>> from the standby; which suggests that something platform-specific is
>>> missing from this commit, but I dunno what.
> 
>> Hmm, I wonder how widely tested the async connection API is on Windows
>> at all.  I only see bowerbird and jacana running bin-check on Windows.
> 
> Yeah, I was wondering if this is just exposing a pre-existing bug.
> However, the "normal" path operates by repeatedly invoking PQconnectPoll
> (cf. connectDBComplete) so it's not immediately obvious how such a bug
> would've escaped detection.
> 

I can see one difference, though (I hadn't seen this code before), and that
is that connectDBComplete starts by waiting for the socket to become
writable and only then calls PQconnectPoll, while my patch starts with the
PQconnectPoll call. And I see the following comment in connectDBStart:
>     /*
>      * The code for processing CONNECTION_NEEDED state is in PQconnectPoll(),
>      * so that it can easily be re-executed if needed again during the
>      * asynchronous startup process.  However, we must run it once here,
>      * because callers expect a success return from this routine to mean that
>      * we are in PGRES_POLLING_WRITING connection state.
>      */

So I guess I implemented it wrong in a subtle way that breaks on Windows.
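
In other words, the loop needs to have roughly this shape (only a sketch of
the idea, not the actual patch; 9.6-style WaitLatchOrSocket arguments,
conn->streamConn and status as in libpqwalreceiver.c, error handling and
postmaster-death checks trimmed):

    /* Per the docs, behave as if PQconnectPoll last returned
     * PGRES_POLLING_WRITING right after PQconnectStart. */
    status = PGRES_POLLING_WRITING;
    do
    {
        int         io_flag;
        int         rc;

        io_flag = (status == PGRES_POLLING_READING) ?
            WL_SOCKET_READABLE : WL_SOCKET_WRITEABLE;

        /* always wait before polling again */
        rc = WaitLatchOrSocket(MyLatch,
                               WL_LATCH_SET | WL_POSTMASTER_DEATH | io_flag,
                               PQsocket(conn->streamConn), -1L);

        if (rc & WL_LATCH_SET)
        {
            ResetLatch(MyLatch);
            CHECK_FOR_INTERRUPTS();
        }

        /* only poll once the socket is actually reported ready */
        if (rc & io_flag)
            status = PQconnectPoll(conn->streamConn);
    } while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);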

If that's the case, the attached should fix it, but I have no way of
testing it on Windows; I can only say that it still works on my machine,
so at least it hopefully does not make things worse.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

From: Peter Eisentraut
On 3/4/17 01:45, Petr Jelinek wrote:
> I can see one difference though (I didn't see this code before) and that
> is, the connectDBComplete starts with waiting for socket to become
> writable and only then calls PQconnectPoll, while my patch starts with
> PQconnectPoll call. And I see following comment in connectDBstart
>>     /*
>>      * The code for processing CONNECTION_NEEDED state is in PQconnectPoll(),
>>      * so that it can easily be re-executed if needed again during the
>>      * asynchronous startup process.  However, we must run it once here,
>>      * because callers expect a success return from this routine to mean that
>>      * we are in PGRES_POLLING_WRITING connection state.
>>      */

Yes, the libpq documentation also says that.

> If that's the case, the attached should fix it, but I have no way of
> testing it on windows, I can only say that it still works on my machine
> so at least it hopefully does not make things worse.

Committed that.  Let's see how it goes.

I rewrote the while loop as a for loop so that the control logic isn't
spread out over three screens.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

From: Peter Eisentraut
On 3/6/17 09:38, Peter Eisentraut wrote:
> On 3/4/17 01:45, Petr Jelinek wrote:
>> If that's the case, the attached should fix it, but I have no way of
>> testing it on windows, I can only say that it still works on my machine
>> so at least it hopefully does not make things worse.
> 
> Committed that.  Let's see how it goes.

So that didn't work.  Now we probably need someone to dig into that host
directly.

Andrew, could you help?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




On 03/08/2017 08:33 AM, Peter Eisentraut wrote:
> On 3/6/17 09:38, Peter Eisentraut wrote:
>> On 3/4/17 01:45, Petr Jelinek wrote:
>>> If that's the case, the attached should fix it, but I have no way of
>>> testing it on windows, I can only say that it still works on my machine
>>> so at least it hopefully does not make things worse.
>> Committed that.  Let's see how it goes.
> So that didn't work.  Now we probably need someone to dig into that host
> directly.
>
> Andrew, could you help?
>


I'll take a look when I get a chance. Might be a few days.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





On 03/03/2017 11:11 PM, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 3/3/17 19:16, Tom Lane wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:
>>>> Use asynchronous connect API in libpqwalreceiver
>>> Buildfarm member bowerbird has been failing in the pg_rewind test since
>>> this patch went in.  It looks like it's failing to complete connections
>>> from the standby; which suggests that something platform-specific is
>>> missing from this commit, but I dunno what.
>> Hmm, I wonder how widely tested the async connection API is on Windows
>> at all.  I only see bowerbird and jacana running bin-check on Windows.
> Yeah, I was wondering if this is just exposing a pre-existing bug.
> However, the "normal" path operates by repeatedly invoking PQconnectPoll
> (cf. connectDBComplete) so it's not immediately obvious how such a bug
> would've escaped detection.
>
>

(After a long period of fruitless empirical testing I turned to the code)


Maybe I'm missing something, but connectDBComplete() handles a return of
PGRES_POLLING_OK as a success while connectDBStart() seems not to. I
don't find anywhere in our code other than libpqwalreceiver that
actually uses that interface, so it's not surprising if it's now
failing. So my bet is it is indeed a long-standing bug.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 03/03/2017 11:11 PM, Tom Lane wrote:
>> Yeah, I was wondering if this is just exposing a pre-existing bug.
>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
>> would've escaped detection.

> (After a long period of fruitless empirical testing I turned to the code)
> Maybe I'm missing something, but connectDBComplete() handles a return of
> PGRESS_POLLING_OK as a success while connectDBStart() seems not to. I
> don't find anywhere in our code other than libpqwalreceiver that
> actually uses that interface, so it's not surprising if it's now
> failing. So my bet is it is indeed a long-standing bug.

Meh ... that argument doesn't hold water, because the old code here called
PQconnectdbParams which is just PQconnectStartParams then
connectDBComplete.  So the problem cannot be in connectDBStart; that's
common to both paths.  It has to be some discrepancy between what
connectDBComplete does and what the new loop in libpqwalreceiver is doing.

The original loop coding in 1e8a85009 was not very close to the documented
spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
still not really the same: connectDBComplete doesn't call PQconnectPoll
until the socket is known read-ready or write-ready.  The walreceiver loop
does not guarantee that, but would make an additional call after any
random other wakeup.  It's not very clear why bowerbird, and only
bowerbird, would be seeing such wakeups --- but I'm having a really hard
time seeing any other explanation for the change in behavior.  (I wonder
whether bowerbird is telling us that WaitLatchOrSocket can sometimes
return prematurely on Windows.)

I'm also pretty sure that the ResetLatch call is in the wrong place which
could lead to missed wakeups, though that's the opposite of the immediate
problem.
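
(For reference, the canonical coding documented in latch.h resets the latch
before re-checking for work, so that a latch set arriving between the check
and the wait can't be lost.  Roughly, with made-up work_to_do()/do_stuff()
placeholders and the 9.6 WaitLatch signature:)

    for (;;)
    {
        ResetLatch(MyLatch);

        if (work_to_do())
            do_stuff();

        WaitLatch(MyLatch, WL_LATCH_SET, -1L);
    }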

I'll try correcting these things and we'll see if it gets any better.
        regards, tom lane



On 15/03/17 17:55, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 03/03/2017 11:11 PM, Tom Lane wrote:
>>> Yeah, I was wondering if this is just exposing a pre-existing bug.
>>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
>>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
>>> would've escaped detection.
> 
>> (After a long period of fruitless empirical testing I turned to the code)
>> Maybe I'm missing something, but connectDBComplete() handles a return of
>> PGRESS_POLLING_OK as a success while connectDBStart() seems not to. I
>> don't find anywhere in our code other than libpqwalreceiver that
>> actually uses that interface, so it's not surprising if it's now
>> failing. So my bet is it is indeed a long-standing bug.
> 
> Meh ... that argument doesn't hold water, because the old code here called
> PQconnectdbParams which is just PQconnectStartParams then
> connectDBComplete.  So the problem cannot be in connectDBStart; that's
> common to both paths.  It has to be some discrepancy between what
> connectDBComplete does and what the new loop in libpqwalreceiver is doing.
> 
> The original loop coding in 1e8a85009 was not very close to the documented
> spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
> still not really the same: connectDBComplete doesn't call PQconnectPoll
> until the socket is known read-ready or write-ready.  The walreceiver loop
> does not guarantee that, but would make an additional call after any
> random other wakeup.  It's not very clear why bowerbird, and only
> bowerbird, would be seeing such wakeups --- but I'm having a really hard
> time seeing any other explanation for the change in behavior.  (I wonder
> whether bowerbird is telling us that WaitLatchOrSocket can sometimes
> return prematurely on Windows.)
> 
> I'm also pretty sure that the ResetLatch call is in the wrong place which
> could lead to missed wakeups, though that's the opposite of the immediate
> problem.
> 
> I'll try correcting these things and we'll see if it gets any better.
> 

Looks like that didn't help either.

I set up my own Windows machine and can reproduce the issue. I played
around a bit and could not really find a fix other than adding
WL_TIMEOUT and a short timeout to WaitLatchOrSocket (otherwise it waits a
very long time in WaitLatchOrSocket before failing).
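
Concretely, the only thing that helped was along these lines (sketch only;
9.6-style argument list, timeout value just illustrative):

    rc = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_SOCKET_WRITEABLE | WL_TIMEOUT,
                           PQsocket(conn->streamConn),
                           100L);   /* short timeout in ms, then retry PQconnectPoll */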

So I wonder if this is the same issue that caused us to use different
coding for WaitLatchOrSocket in pgstat.c (lines ~3918-3940).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



On 2017-03-16 13:00:54 +0100, Petr Jelinek wrote:
> On 15/03/17 17:55, Tom Lane wrote:
> > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> >> On 03/03/2017 11:11 PM, Tom Lane wrote:
> >>> Yeah, I was wondering if this is just exposing a pre-existing bug.
> >>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
> >>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
> >>> would've escaped detection.
> > 
> >> (After a long period of fruitless empirical testing I turned to the code)
> >> Maybe I'm missing something, but connectDBComplete() handles a return of
> >> PGRESS_POLLING_OK as a success while connectDBStart() seems not to. I
> >> don't find anywhere in our code other than libpqwalreceiver that
> >> actually uses that interface, so it's not surprising if it's now
> >> failing. So my bet is it is indeed a long-standing bug.
> > 
> > Meh ... that argument doesn't hold water, because the old code here called
> > PQconnectdbParams which is just PQconnectStartParams then
> > connectDBComplete.  So the problem cannot be in connectDBStart; that's
> > common to both paths.  It has to be some discrepancy between what
> > connectDBComplete does and what the new loop in libpqwalreceiver is doing.
> > 
> > The original loop coding in 1e8a85009 was not very close to the documented
> > spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
> > still not really the same: connectDBComplete doesn't call PQconnectPoll
> > until the socket is known read-ready or write-ready.  The walreceiver loop
> > does not guarantee that, but would make an additional call after any
> > random other wakeup.  It's not very clear why bowerbird, and only
> > bowerbird, would be seeing such wakeups --- but I'm having a really hard
> > time seeing any other explanation for the change in behavior.  (I wonder
> > whether bowerbird is telling us that WaitLatchOrSocket can sometimes
> > return prematurely on Windows.)
> > 
> > I'm also pretty sure that the ResetLatch call is in the wrong place which
> > could lead to missed wakeups, though that's the opposite of the immediate
> > problem.
> > 
> > I'll try correcting these things and we'll see if it gets any better.
> > 
> 
> Looks like that didn't help either.
> 
> I setup my own Windows machine and can reproduce the issue. I played
> around a bit and could not really find a fix other than adding
> WL_TIMEOUT and short timeout to WaitLatchOrSocket (it does wait a very
> long time on the WaitLatchOrSocket otherwise before failing).

Hm. Could you use process explorer or such to see the exact events
happening?  Seeing that might help us to nail this down.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2017-03-16 13:00:54 +0100, Petr Jelinek wrote:
>> Looks like that didn't help either.
>> 
>> I setup my own Windows machine and can reproduce the issue. I played
>> around a bit and could not really find a fix other than adding
>> WL_TIMEOUT and short timeout to WaitLatchOrSocket (it does wait a very
>> long time on the WaitLatchOrSocket otherwise before failing).

> Hm. Could you use process explorer or such to see the exact events
> happening?  Seing that might help us to nail this down.

At this point it's hard to escape the conclusion that we're looking at
some sort of bug in the Windows version of the WaitEventSet infrastructure
--- or at least, the WaitEventSet-based waits for socket events are
evidently behaving differently from the implementation on the libpq side,
which is based on pqWaitTimed and thence pqSocketCheck and thence
pqSocketPoll.

Noticing that bowerbird builds with openssl, I'm a bit suspicious that
the issue might have something to do with the fact that pqSocketCheck
has a special case for SSL, which I don't think WaitEventSet knows about.
But I don't think SSL is actually active in the buildfarm run, so it's
hard to see how we get to that scenario exactly --- or even less why it
would only matter on Windows.
        regards, tom lane




On 03/08/2017 08:29 PM, Andrew Dunstan wrote:
>
> On 03/08/2017 08:33 AM, Peter Eisentraut wrote:
>> On 3/6/17 09:38, Peter Eisentraut wrote:
>>> On 3/4/17 01:45, Petr Jelinek wrote:
>>>> If that's the case, the attached should fix it, but I have no way of
>>>> testing it on windows, I can only say that it still works on my machine
>>>> so at least it hopefully does not make things worse.
>>> Committed that.  Let's see how it goes.
>> So that didn't work.  Now we probably need someone to dig into that host
>> directly.
>>
>> Andrew, could you help?
>>
>
> I'll take a look when I get a chance. Might be a few days.
>



I have confirmed on jacana Petr's observation that adding a timeout to
the WaitLatchOrSocket cures the problem.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> I have confirmed on jacana Petr's observation that adding a timeout to
> the WaitLatchOrSocket cures the problem.

Does anyone have a theory as to why that cures the problem?

What length of timeout is being suggested here?  Would a long timeout
perhaps create a performance issue?  That is, I'm wondering if the
wait actually sleeps to the end of the timeout in the problem case(s).
        regards, tom lane



On 17/03/17 16:07, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> I have confirmed on jacana Petr's observation that adding a timeout to
>> the WaitLatchOrSocket cures the problem.
> 
> Does anyone have a theory as to why that cures the problem?
> 

I now have a theory and even a PoC patch for it.

The long wait in WaitLatchOrSocket happens after the connection state
switches from CONNECTION_STARTED to CONNECTION_MADE; in both cases the
return value of PQconnectPoll is PGRES_POLLING_WRITING, so we wait for
FD_WRITE.
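
On Windows that wait boils down to roughly this (a simplified sketch of
what the WaitEventSet code does, not the actual latch.c source):

#include <winsock2.h>

/* Sketch only: associate the socket with an event object for FD_WRITE
 * and block until such a network event is recorded -- which, per the
 * documentation quoted below, may be never. */
static void
wait_for_fd_write(SOCKET s)
{
    WSAEVENT    ev = WSACreateEvent();

    WSAEventSelect(s, ev, FD_WRITE | FD_CLOSE);
    WaitForMultipleObjectsEx(1, &ev, FALSE, INFINITE, FALSE);
    WSAEventSelect(s, NULL, 0);     /* cancel the association */
    WSACloseEvent(ev);
}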

Now the documentation for WSAEventSelect says "The FD_WRITE network
event is handled slightly differently. An FD_WRITE network event is
recorded when a socket is first connected with a call to the connect,
ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
when a socket is accepted with accept, AcceptEx, or WSAAccept function
and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
available. Therefore, an application can assume that sends are possible
starting from the first FD_WRITE network event setting and lasting until
a send returns WSAEWOULDBLOCK. After such a failure the application will
find out that sends are again possible when an FD_WRITE network event is
recorded and the associated event object is set."

But while PQconnectPoll does connect() before setting the connection status
to CONNECTION_STARTED and returning PGRES_POLLING_WRITING, so the FD_WRITE
eventually happens there, it does not do any writes in the code block that
switches to CONNECTION_MADE and returns PGRES_POLLING_WRITING. That means
the FD_WRITE event is never recorded, as per the quoted documentation.

Then what remains is why it works in libpq. If you look at
pgwin32_select, which is eventually called by the libpq code, it handles
the situation by trying an empty WSASend on any socket that is supposed
to wait for FD_WRITE, and only calls WaitForMultipleObjectsEx when the
WSASend fails with WSAEWOULDBLOCK; when the WSASend succeeds it
immediately returns OK.
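
In code the trick is essentially this (a simplified sketch, not the actual
pgwin32_select source):

#include <winsock2.h>

/* Probe writability with a zero-byte WSASend(); only when that reports
 * WSAEWOULDBLOCK is it necessary (and safe) to block waiting for the
 * FD_WRITE event. */
static int
socket_writable_now(SOCKET s)
{
    char        dummy = 0;
    WSABUF      buf;
    DWORD       sent = 0;

    buf.buf = &dummy;
    buf.len = 0;

    if (WSASend(s, &buf, 1, &sent, 0, NULL, NULL) == 0)
        return 1;           /* writable right now, no need to wait */
    if (WSAGetLastError() == WSAEWOULDBLOCK)
        return 0;           /* genuinely not writable, wait for FD_WRITE */
    return -1;              /* real socket error */
}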

So I did something similar in the attached patch for WaitEventSetWaitBlock(),
and it indeed solves the issue on my Windows test machine. The
coding/placement probably isn't the best (and there are no comments),
but maybe somebody will find the proper place for this now that we know
the cause.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> Now the documentation for WSAEventSelect says "The FD_WRITE network
> event is handled slightly differently. An FD_WRITE network event is
> recorded when a socket is first connected with a call to the connect,
> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
> when a socket is accepted with accept, AcceptEx, or WSAAccept function
> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
> available. Therefore, an application can assume that sends are possible
> starting from the first FD_WRITE network event setting and lasting until
> a send returns WSAEWOULDBLOCK. After such a failure the application will
> find out that sends are again possible when an FD_WRITE network event is
> recorded and the associated event object is set."

> But while PQconnectPoll does connect() before setting connection status
> to CONNECTION_STARTED and returns  PGRES_POLLING_WRITING so the FD_WRITE
> eventually happens, it does not do any writes in the code block that
> switches to  CONNECTION_MADE and PGRES_POLLING_WRITING. That means
> FD_WRITE event is never recorded as per quoted documentation.

Ah-hah!  Great detective work.

> Then what remains is why it works in libpq. If you look at
> pgwin32_select which is eventually called by libpq code, it actually
> handles the situation by trying empty WSASend on any socket that is
> supposed to wait for FD_WRITE and only calling the
> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
> the WSASend succeeds it immediately returns ok.

> So I did something similar in attached for WaitEventSetWaitBlock() and
> it indeed solves the issue my windows test machine. Now the
> coding/placement probably isn't the best (and there are no comments),
> but maybe somebody will find proper place for this now that we know the
> cause.

Yeah, I'm afraid we had better do something more or less like this.
It's interesting to speculate about whether WaitEventSet could keep
additional state that would avoid the need for a dummy send() every
time, but that seems like material for new development not a bug fix.

In any case, a quick review of the code suggests that there are no
performance-critical places where we wait for WL_SOCKET_WRITEABLE
unless we've already detected that we have to wait for the network;
so the dummy send() isn't going to do anything except consume a
few more CPU cycles before we get to the point of blocking.  It may
not be worth worrying about.

I'll add some comments and push this.  Does everyone agree that
this had better get back-patched, too?
        regards, tom lane



On 17/03/17 17:28, Tom Lane wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> Now the documentation for WSAEventSelect says "The FD_WRITE network
>> event is handled slightly differently. An FD_WRITE network event is
>> recorded when a socket is first connected with a call to the connect,
>> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
>> when a socket is accepted with accept, AcceptEx, or WSAAccept function
>> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
>> available. Therefore, an application can assume that sends are possible
>> starting from the first FD_WRITE network event setting and lasting until
>> a send returns WSAEWOULDBLOCK. After such a failure the application will
>> find out that sends are again possible when an FD_WRITE network event is
>> recorded and the associated event object is set."
> 
>> But while PQconnectPoll does connect() before setting connection status
>> to CONNECTION_STARTED and returns  PGRES_POLLING_WRITING so the FD_WRITE
>> eventually happens, it does not do any writes in the code block that
>> switches to  CONNECTION_MADE and PGRES_POLLING_WRITING. That means
>> FD_WRITE event is never recorded as per quoted documentation.
> 
> Ah-hah!  Great detective work.
> 
>> Then what remains is why it works in libpq. If you look at
>> pgwin32_select which is eventually called by libpq code, it actually
>> handles the situation by trying empty WSASend on any socket that is
>> supposed to wait for FD_WRITE and only calling the
>> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
>> the WSASend succeeds it immediately returns ok.
> 
>> So I did something similar in attached for WaitEventSetWaitBlock() and
>> it indeed solves the issue my windows test machine. Now the
>> coding/placement probably isn't the best (and there are no comments),
>> but maybe somebody will find proper place for this now that we know the
>> cause.
> 
> Yeah, I'm afraid we had better do something more or less like this.
> It's interesting to speculate about whether WaitEventSet could keep
> additional state that would avoid the need for a dummy send() every
> time, but that seems like material for new development not a bug fix.
> 
> In any case, a quick review of the code suggests that there are no
> performance-critical places where we wait for WL_SOCKET_WRITEABLE
> unless we've already detected that we have to wait for the network;
> so the dummy send() isn't going to do anything except consume a
> few more CPU cycles before we get to the point of blocking.  It may
> not be worth worrying about.
> 

Thanks. Now that I look at it again, I think we might need to loop over
the occurred_events and returned_events and not return on the first
find, since theoretically there can be multiple sockets if this gets
called directly and not via WaitLatchOrSocket(). I don't have the mental
power to finish it up right now though, sorry for that.

> I'll add some comments and push this.  Does everyone agree that
> this had better get back-patched, too?
> 

+1

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> On 17/03/17 17:28, Tom Lane wrote:
>> Yeah, I'm afraid we had better do something more or less like this.
>> It's interesting to speculate about whether WaitEventSet could keep
>> additional state that would avoid the need for a dummy send() every
>> time, but that seems like material for new development not a bug fix.

> Thanks, now that I look at it again I think we might need to do cycle
> with the occurred_events and returned_events and not return on first
> find since theoretically there can be multiple sockets if this gets
> called directly and not via WaitLatchOrSocket(). I don't have mental
> power to finish it up right now though, sorry for that.

I think WaitEventSet is only required to identify at least one reason
for ending the wait; it is not required to identify all of them.
(Even if it tried to do so, the information could be out of date by
the time it returns.)  So I'm not particularly fussed about that,
even if we had code that waited on write-ready for multiple sockets
which I don't believe we do.
        regards, tom lane




On 03/17/2017 12:28 PM, Tom Lane wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> Now the documentation for WSAEventSelect says "The FD_WRITE network
>> event is handled slightly differently. An FD_WRITE network event is
>> recorded when a socket is first connected with a call to the connect,
>> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
>> when a socket is accepted with accept, AcceptEx, or WSAAccept function
>> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
>> available. Therefore, an application can assume that sends are possible
>> starting from the first FD_WRITE network event setting and lasting until
>> a send returns WSAEWOULDBLOCK. After such a failure the application will
>> find out that sends are again possible when an FD_WRITE network event is
>> recorded and the associated event object is set."
>> But while PQconnectPoll does connect() before setting connection status
>> to CONNECTION_STARTED and returns  PGRES_POLLING_WRITING so the FD_WRITE
>> eventually happens, it does not do any writes in the code block that
>> switches to  CONNECTION_MADE and PGRES_POLLING_WRITING. That means
>> FD_WRITE event is never recorded as per quoted documentation.
> Ah-hah!  Great detective work.
>
>> Then what remains is why it works in libpq. If you look at
>> pgwin32_select which is eventually called by libpq code, it actually
>> handles the situation by trying empty WSASend on any socket that is
>> supposed to wait for FD_WRITE and only calling the
>> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
>> the WSASend succeeds it immediately returns ok.
>> So I did something similar in attached for WaitEventSetWaitBlock() and
>> it indeed solves the issue my windows test machine. Now the
>> coding/placement probably isn't the best (and there are no comments),
>> but maybe somebody will find proper place for this now that we know the
>> cause.
> Yeah, I'm afraid we had better do something more or less like this.
> It's interesting to speculate about whether WaitEventSet could keep
> additional state that would avoid the need for a dummy send() every
> time, but that seems like material for new development not a bug fix.
>
> In any case, a quick review of the code suggests that there are no
> performance-critical places where we wait for WL_SOCKET_WRITEABLE
> unless we've already detected that we have to wait for the network;
> so the dummy send() isn't going to do anything except consume a
> few more CPU cycles before we get to the point of blocking.  It may
> not be worth worrying about.
>
> I'll add some comments and push this.  Does everyone agree that
> this had better get back-patched, too?
>
>             



Confirmed that this fixes the problem on jacana.

+1 for backpatching.

Does that mean we could remove the special logic in pgstat.c?

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 03/17/2017 12:28 PM, Tom Lane wrote:
>> I'll add some comments and push this.  Does everyone agree that
>> this had better get back-patched, too?

> Confirmed that this fixes the problem on jacana.
> +1 for backpatching.

I've pushed this into 9.6, but the WaitEventSet code doesn't exist before
that.  It might be that we should apply a similar fix to the predecessor
code, but I'm not really excited about it given that we have no indication
that the bug can be hit in the older branches.  Anyway, lacking a Windows
machine to test on, I have no interest in trying to make such a patch
blindly.

> Does that mean we could remove the special logic in pgstat.c?

If you mean the bit near line 3930 about
     * Windows, at least in its Windows Server 2003 R2 incarnation,
     * sometimes loses FD_READ events.  Waking up and retrying the recv()
     * fixes that, so don't sleep indefinitely.  This is a crock of the
     * first water, but until somebody wants to debug exactly what's
     * happening there, this is the best we can do.

that doesn't seem related --- that's about missing FD_READ events not
FD_WRITE.
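
(For the record, the special case being referred to is a Windows-only wait
with a short timeout, so the collector retries the recv() instead of
sleeping forever.  A simplified sketch, not the exact pgstat.c code:)

#ifdef WIN32
    wr = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_POSTMASTER_DEATH |
                           WL_SOCKET_READABLE | WL_TIMEOUT,
                           pgStatSock, 2 * 1000L);  /* short timeout, then retry recv() */
#else
    wr = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_POSTMASTER_DEATH |
                           WL_SOCKET_READABLE,
                           pgStatSock, -1L);
#endif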
        regards, tom lane



I wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> Thanks, now that I look at it again I think we might need to do cycle
>> with the occurred_events and returned_events and not return on first
>> find since theoretically there can be multiple sockets if this gets
>> called directly and not via WaitLatchOrSocket(). I don't have mental
>> power to finish it up right now though, sorry for that.

> I think WaitEventSet is only required to identify at least one reason
> for ending the wait; it is not required to identify all of them.

Also, I see the header comment for the Windows version of
WaitEventSetWaitBlock says it really only ever reports one event anyway.
So there seems no need to work harder than this.  Pushed with cosmetic
adjustments.
        regards, tom lane