Thread: Hang in pldebugger after git commit : 98a64d0

Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
<div dir="ltr">Hi All,<br /><br />I have noticed that any SQL query executed immediately after attaching<br />to a
particulardebugging target (pldebugger) either takes longer time<br />for execution or it hangs on Windows. I have
checkedit on<br />PostgreSQL-9.5 and PostgreSQL-9.6 versions and have found that the<br />issue only exists on 9.6
version.After doing 'git bisect' i found<br />that the following git commit in PostgreSQL-9.6 branch is the culprit.<br
/><br/>commit <b>98a64d0bd713cb89e61bef6432befc</b><br />4b7b5da59e<br />Author: Andres Freund <<a
href="mailto:andres@anarazel.de">andres@anarazel.de</a>><br/>Date:   Mon Mar 21 09:56:39 2016 +0100<br /><br />   
IntroduceWaitEventSet API.<br /><br />    Commit ac1d794 ("Make idle backends exit if the postmaster dies.")<br />   
introduceda regression on, at least, large linux systems. Constantly<br />    adding the same postmaster_alive_fds to
theOSs internal datastructures<br />    for implementing poll/select can cause significant contention; leading<br />   
toa performance regression of nearly 3x in one example.<br /><br />    This can be avoided by using e.g. linux' epoll,
whichavoids having to<br />    add/remove file descriptors to the wait datastructures at a high rate.<br />   
Unfortunatelythe current latch interface makes it hard to allocate any<br />    persistent per-backend resources.<br
/><br/>    .................<br /><br />Following are the steps to reproduce the issue:<br /><br />1) Download
pldebuggerfrom below url and copy it into contrib directory.<br /><br />git clone git://<a
href="http://git.postgresql.org/git/pldebugger.git">git.postgresql.org/git/pldebugger.git</a><br/><br />2) Start a new
backendsession (psql -d postgres)<br />3) Create a plpgsql function say func1();<br />4) Get the oid of the func1 and
enabledebugging of this using pldbgapi function as shown below<br />    <br />    select plpgsql_oid_debug(16487);<br
/><br/>5) execute function func1 : select func1();<br /><br />After executing above query we will get the message as
belowand<br />terminal will not respond as it will go in listen mode.<br />   NOTICE:  PLDBGBREAK:2<br /><br />6) Start
anotherbackend session.<br />7) Execute below query.<br />   SELECT * FROM pldbg_attach_to_port(2)<br />   NOTE: We
needto extract the port number from step 5 NOTICE message<br />after 'PLDBGBREAK:' string and use as input here.<br
/><br/>8) Execute any SQL query now and the problem starts. I have tried with below queries.<br /><br />SELECT 1;<br
/> OR<br/>SELECT pg_backend_pid();<br /> OR<br />SELECT   FROM pldbg_wait_for_breakpoint(1::INTEGER);<br /><br
/>....<br/><br /><br />Problem Analysis:<br />-------------------------<br />Allthough i am very new to Windows, i
trieddebugging the issue and<br />could find that Backend is not receiving the query executed after<br />"SELECT
pldbg_attach_to_port(2)"and is infinitely waiting on<br />"WaitEventSetWaitBlock()" at WaitForMultipleObjects() to read
the<br/>input command. Below is the backtrace for the same.<br /><br />postgres.exe!WaitEventSetWaitBlock(WaitEventSet
*set, int<br />cur_timeout, WaitEvent * occurred_events, int nevents)  Line 1384 +<br />0x2b bytes C<br /> 
postgres.exe!WaitEventSetWait(WaitEventSet* set, long timeout,<br />WaitEvent * occurred_events, int nevents)  Line 936
+0x18 bytes C<br />  postgres.exe!secure_read(Port * port, void * ptr, unsigned __int64<br />len)  Line 168 C<br /> 
postgres.exe!pq_recvbuf() Line 921 + 0x33 bytes C<br />  postgres.exe!pq_getbyte()  Line 963 + 0x5 bytes C<br /> 
postgres.exe!SocketBackend(StringInfoData* inBuf)  Line 334 + 0x5 bytes C<br /> 
postgres.exe!ReadCommand(StringInfoData* inBuf)  Line 507 + 0xa bytes C<br />  postgres.exe!PostgresMain(int argc, char
** argv, const char *<br />dbname, const char * username)  Line 4004 + 0xd bytes C<br />  postgres.exe!BackendRun(Port
*port)  Line 4259 C<br />  postgres.exe!SubPostmasterMain(int argc, char * * argv)  Line 4750 C<br /> 
postgres.exe!main(intargc, char * * argv)  Line 216 C<br />  postgres.exe!__tmainCRTStartup()  Line 555 + 0x19 bytes
C<br/>  postgres.exe!mainCRTStartup()  Line 371 C<br /><br />With Regards,<br /> Ashutosh Sharma<br /> EnterpriseDB: <a
href="http://www.enterprisedb.com"rel="noreferrer" target="_blank">http://www.enterprisedb.com</a><br /></div> 

Re: Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Wed, Dec 07, 2016 at 03:16:09PM +0530, Ashutosh Sharma wrote:
> Problem Analysis:
> -------------------------
> Allthough i am very new to Windows, i tried debugging the issue and
> could find that Backend is not receiving the query executed after
> "SELECT pldbg_attach_to_port(2)" and is infinitely waiting on
> "WaitEventSetWaitBlock()" at WaitForMultipleObjects() to read the
> input command. Below is the backtrace for the same.

This code was indeed in need of review.  It seems that you are giving
enough information to reproduce the problem. I'll try to dig into
it...
-- 
Michael



Re: Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Well, we are currently passing INFINITE timeout value to WaitForMultipleObjects() API which is hanging. I thought of changing this INIFINTE timeout interval to some finite value like say 1 sec and could not reproduce the issue. But, having said that i checked the older code where this issue does not exists and found here as well we are passing INFINTE timeout value to WaitForMultipleObjects(). So not sure if this passing INFINITE timeout is  really an issue. Attached is a small patch that has my changes.

On Thu, Dec 8, 2016 at 1:51 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Dec 07, 2016 at 03:16:09PM +0530, Ashutosh Sharma wrote:
> Problem Analysis:
> -------------------------
> Allthough i am very new to Windows, i tried debugging the issue and
> could find that Backend is not receiving the query executed after
> "SELECT pldbg_attach_to_port(2)" and is infinitely waiting on
> "WaitEventSetWaitBlock()" at WaitForMultipleObjects() to read the
> input command. Below is the backtrace for the same.

This code was indeed in need of review.  It seems that you are giving
enough information to reproduce the problem. I'll try to dig into
it...
--
Michael

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Fri, Dec 9, 2016 at 2:38 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Well, we are currently passing INFINITE timeout value to
> WaitForMultipleObjects() API which is hanging. I thought of changing this
> INIFINTE timeout interval to some finite value like say 1 sec and could not
> reproduce the issue. But, having said that i checked the older code where
> this issue does not exists and found here as well we are passing INFINTE
> timeout value to WaitForMultipleObjects(). So not sure if this passing
> INFINITE timeout is  really an issue. Attached is a small patch that has my
> changes.

That's obviously incorrect to me. This should not wait for a timeout
if the caller does not want to. So you are likely breaking a bunch of
code by doing so, including many extensions on Windows. The
pre-WaitEventSet code uses that:
-   if (wakeEvents & WL_TIMEOUT)
-   {
-       INSTR_TIME_SET_CURRENT(start_time);
-       Assert(timeout >= 0 && timeout <= INT_MAX);
-       cur_timeout = timeout;
-   }
-   else
-       cur_timeout = INFINITE;
INFINITE maps to -1 by looking at the MS docs, and that's as well what
the new code does so the inconsistency is not there. And the new code
does not bother about setting INFINITE and just uses -1.

I have tried to compile pldebugger with MSVC but gave up at the end,
so I am falling back to some code review for the time being. Andres
mentioned me that this Windows code was in need of an extra lookup.
And from what I can see, the logic that we have before 98a64d0 is a
set of handles using WaitForMultipleObjects with a pre-allocated
position in the handle array. The new code made things more generic by
allocating the events depending on what the user has set up.
pgwin32_signal_event is correctly using the first handle, and other
events registered correctly adapt to this position.

I have spent some time reviewing the code, and I think that I have
spotted one problem. The event set that the code you are triggering is
waiting for is FeBeWaitSet, which gets initialized in
pq_init()@pqcomm.c. 3 events are being set there. This goes through
CreateWaitEventSet, that sets up pgwin32_signal_event as first handle.
So far so good. Then a bunch of events are added with
AddWaitEventToSet. Which is fine as well as the handling of handles
with WSA_INVALID_EVENT looks correct, because its value is NULL and
there is a static assertion to check things.

One place in the code has spotted my attention:
+#elif defined(WAIT_USE_WIN32)
+   set->handles = (HANDLE) data;
+   data += sizeof(HANDLE) * nevents;
+#endif
This should be actually (nevents + 1) to take into account
pgwin32_signal_event. That's harmless now but it would if new fields
are added to WaitEventSet in the future.

A second thing that I am noticing is that in the new code:
+        * Note: we assume that the kernel calls involved in latch management
+        * will provide adequate synchronization on machines with weak memory
+        * ordering, so that we cannot miss seeing is_set if a notification
+        * has already been queued.
+        */
+       if (set->latch && set->latch->is_set)
+       {
+           occurred_events->fd = PGINVALID_SOCKET;
+           occurred_events->pos = set->latch_pos;
+           occurred_events->user_data =
+               set->events[set->latch_pos].user_data;
+           occurred_events->events = WL_LATCH_SET;
+           occurred_events++;
+           returned_events++;
+
+           break;
+       }
This basically means that if the latch is set, we don't wait at all
and drop the ball. I am wondering: isn't that a problem even if
WL_LATCH_SET is *not* set? If I read this code correctly, even if
caller has not set WL_LATCH_SET and the latch is set, then the wait
will stop.

Still reviewing the code...
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Fri, Dec 9, 2016 at 3:23 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> This basically means that if the latch is set, we don't wait at all
> and drop the ball. I am wondering: isn't that a problem even if
> WL_LATCH_SET is *not* set? If I read this code correctly, even if
> caller has not set WL_LATCH_SET and the latch is set, then the wait
> will stop.

Nah. I misread the code. set->latch is not NULL only if WL_LATCH_SET is enabled.
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Fri, Dec 9, 2016 at 4:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Dec 9, 2016 at 3:23 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> This basically means that if the latch is set, we don't wait at all
>> and drop the ball. I am wondering: isn't that a problem even if
>> WL_LATCH_SET is *not* set? If I read this code correctly, even if
>> caller has not set WL_LATCH_SET and the latch is set, then the wait
>> will stop.
>
> Nah. I misread the code. set->latch is not NULL only if WL_LATCH_SET is enabled.

OK, I think that I have spotted an issue after additional read of the
code. When a WSA event is used for read/write activity on a socket,
the same WSA event gets reused again and again. That's fine for
performance reasons, but what I think is not fine is the fact that
this WSA event is *not* reset even once in-between when calling
WaitEventAdjustWin32() to adjust an event HANDLE to wait for, and the
FeBeWaitEvent is used a lot. That's one inconsistency with the old
code that always closed the WSA event after using it, unconditionally.
Now that we cache it I think that we had better put it in a clean
state every time we want to use it again.

With the pointer miscalculation I pointed out upthread it gives the
patch attached.

Ashutosh, could you try it and see if it improves things?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi Micheal,
 
Ashutosh, could you try it and see if it improves things?
-
Thanks for your patch. I would like to inform you that I didn't find any improvement with your patch.

With Regards,
Ashutosh Sharma

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi All,

I have been working on this issue for last few days and it seems like
i have almost found the reason for this failure. My findings till date
are as follows.

Just to confirm if the problem is happening due to reusability of
WaitEventSet structure, I had replaced  a function call
ModifyWaitEvent() and WaitEventSetWait() which makes use of an already
existing  WaitEventSet to store event handles with WaitLatchOrSocket()
and it worked. Actually WaitLatchOrSocket() creates a new WaitEventSet
structure every time it has to wait for an event and is also being
used in the old code. This clearly shows that we do have some problem
just because we are reusing the same set of object handles throughput
a backend session. I am further investigating on this and would post
the final analysis along with the fix very  soon. Attached is the
patch that has the changes described
above. Any suggestions or inputs would be appreciated.

On Tue, Dec 13, 2016 at 9:34 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi Micheal,
>
>>
>> Ashutosh, could you try it and see if it improves things?
>> -
>
> Thanks for your patch. I would like to inform you that I didn't find any
> improvement with your patch.
>
> With Regards,
> Ashutosh Sharma
> EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Andres Freund
Date:
On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
> OK, I think that I have spotted an issue after additional read of the
> code. When a WSA event is used for read/write activity on a socket,
> the same WSA event gets reused again and again. That's fine for
> performance reasons

It's actually also required to not loose events,
i.e. correctness. Windows events are edge, not level triggered. The
previous implementation wasn't correct.  So just resetting things ain't
ok.

- Andres



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Wed, Dec 14, 2016 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> I have been working on this issue for last few days and it seems like
> i have almost found the reason for this failure. My findings till date
> are as follows.
>
> Just to confirm if the problem is happening due to reusability of
> WaitEventSet structure, I had replaced  a function call
> ModifyWaitEvent() and WaitEventSetWait() which makes use of an already
> existing  WaitEventSet to store event handles with WaitLatchOrSocket()
> and it worked. Actually WaitLatchOrSocket() creates a new WaitEventSet
> structure every time it has to wait for an event and is also being
> used in the old code. This clearly shows that we do have some problem
> just because we are reusing the same set of object handles throughput
> a backend session. I am further investigating on this and would post
> the final analysis along with the fix very  soon. Attached is the
> patch that has the changes described
> above. Any suggestions or inputs would be appreciated.

So it would mean that there is a mismatch between what should be used
for the wait event and what is actually used. One simple idea if you
want to compare both would be to generate some elog(LOG) entries or
whatever on the fields we are interesting on and see what is
mismatching. That may help in understanding what's wrong in the wait
events used.
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Tue, Dec 13, 2016 at 9:49 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not loose events,
> i.e. correctness. Windows events are edge, not level triggered.
>

Are all Windows events edge triggered?  What I read from msdn [1], it
doesn't appear so.  I am quoting text from msdn page [1] which seems
to be saying the event FD_READ we use in this case is level triggered.
"For FD_READ, FD_OOB, and FD_ACCEPT network events, network event
recording and event object signaling are level-triggered."

> The
> previous implementation wasn't correct.  So just resetting things ain't
> ok.
>

Okay, but I think we need to re-enable the existing event handle for
required event (FD_READ) by using WSAEventSelect() to make it work
sanely.  We have tried something on those lines and it resolved the
problem.  Ashutosh will post a patch on those lines later today.  Let
us know if you have something else in mind.


[1] - https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi,

> Okay, but I think we need to re-enable the existing event handle for
> required event (FD_READ) by using WSAEventSelect() to make it work
> sanely.  We have tried something on those lines and it resolved the
> problem.  Ashutosh will post a patch on those lines later today.  Let
> us know if you have something else in mind.

Attached is the patch based on Amit's explanation above. Please have a
look and let me know for any concerns. Thank you Micheal and Andres
for sharing your thoughts and Amit for your valuable inputs.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Wed, Dec 14, 2016 at 4:34 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Attached is the patch based on Amit's explanation above. Please have a
> look and let me know for any concerns. Thank you Micheal and Andres
> for sharing your thoughts and Amit for your valuable inputs.
    }
+    event = &set->events[0];
+    event->events &= ~(WL_SOCKET_READABLE);#ifndef WIN32

I bet that this patch breaks many things for any non-WIN32 platform.
What I think you want to do is modify the flag events associated to
the socket read/write event to be updated in WaitEventAdjustWin32(),
which gets called in ModifyWaitEvent(). By the way, position 0 refers
to a socket for FeBeWaitSet, but that's a mandatory thing when a list
of events is created with AddWaitEventToSet.
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi Micheal,

> I bet that this patch breaks many things for any non-WIN32 platform.

It seems to me like you have already identified some issues when
testing. If yes, could please share it. I have tested my patch  on
CentOS-7 and Windows-7 machines and have found no issues. I ran all
the regression test suites including the test cases for pldebugger.

> What I think you want to do is modify the flag events associated to
> the socket read/write event to be updated in WaitEventAdjustWin32(),

Well, this does not work as the following if check does not allow the
FD_READ or FD_WRITE flags to be applied on the already existing socket
handle. We have already debugged and verified this.
      if (events == event->events &&               (!(event->events & WL_LATCH_SET) || set->latch == latch))
  return;
 

> which gets called in ModifyWaitEvent(). By the way, position 0 refers
> to a socket for FeBeWaitSet, but that's a mandatory thing when a list
> of events is created with AddWaitEventToSet.

Yes, 0th position refers to event for socket and we will have to pass
0 as an argument to get socket events from FeBeWaitSet. You may see
the usage of ModifyWaitEvent() in secure_read where we are passing the
hard coded value (0) to get the socket events from the FeBeWaitSet.



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Wed, Dec 14, 2016 at 5:38 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> What I think you want to do is modify the flag events associated to
>> the socket read/write event to be updated in WaitEventAdjustWin32(),
>
> Well, this does not work as the following if check does not allow the
> FD_READ or FD_WRITE flags to be applied on the already existing socket
> handle. We have already debugged and verified this.
>
>        if (events == event->events &&
>                 (!(event->events & WL_LATCH_SET) || set->latch == latch))
>                 return;
>
>> which gets called in ModifyWaitEvent(). By the way, position 0 refers
>> to a socket for FeBeWaitSet, but that's a mandatory thing when a list
>> of events is created with AddWaitEventToSet.
>
> Yes, 0th position refers to event for socket and we will have to pass
> 0 as an argument to get socket events from FeBeWaitSet. You may see
> the usage of ModifyWaitEvent() in secure_read where we are passing the
> hard coded value (0) to get the socket events from the FeBeWaitSet.

I have just read the patch, and hardcoding the array position for a
socket event to 0 assumes that the socket event will be *always* the
first one in the list. but that cannot be true all the time, any code
that does not register the socket event as the first one in the list
will likely break.
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi Micheal,,

> I have just read the patch, and hardcoding the array position for a
> socket event to 0 assumes that the socket event will be *always* the
> first one in the list. but that cannot be true all the time, any code
> that does not register the socket event as the first one in the list
> will likely break.

I think your point is very valid and even i had similar thought in my
mind when implementing it but as i mentioned upthread that's how it is
being used in the current code base. Please check a function call to
ModifyWaitEvent() in secure_read().



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Wed, Dec 14, 2016 at 2:34 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Okay, but I think we need to re-enable the existing event handle for
>> required event (FD_READ) by using WSAEventSelect() to make it work
>> sanely.  We have tried something on those lines and it resolved the
>> problem.  Ashutosh will post a patch on those lines later today.  Let
>> us know if you have something else in mind.
>
> Attached is the patch based on Amit's explanation above. Please have a
> look and let me know for any concerns. Thank you Micheal and Andres
> for sharing your thoughts and Amit for your valuable inputs.

One immediate concern is that the patch has no comments.  Considering
this seems to be a pretty tricky issue, that seems like a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> I have just read the patch, and hardcoding the array position for a
>> socket event to 0 assumes that the socket event will be *always* the
>> first one in the list. but that cannot be true all the time, any code
>> that does not register the socket event as the first one in the list
>> will likely break.
>
> I think your point is very valid and even i had similar thought in my
> mind when implementing it but as i mentioned upthread that's how it is
> being used in the current code base. Please check a function call to
> ModifyWaitEvent() in secure_read().

That's kind of irrelevant. A WaitEventSet is a general-purpose
primitive, and it needs to work in all situations, current and future,
where a reasonable developer would expect it to work.  Sure, pq_init()
puts the socket in FeBeWaitSet at position 0.  However, in
WaitLatchOrSocket, the socket event, if required, is added after the
latch and postmaster-death events.  And new code written using
CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
socket at any offset in the event array, or could add multiple
sockets.  So Michael is right: you can't just stick a hack into
WaitEventSetWait that assumes the socket is at offset 0 even if that
happens to fix the problem we are facing right at this moment.

(I am not sure whether Michael's proposed placement is correct, but
I'm almost 100% sure the placement in the submitted patch is not.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Michael Paquier
Date:
On Thu, Dec 15, 2016 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> I have just read the patch, and hardcoding the array position for a
>>> socket event to 0 assumes that the socket event will be *always* the
>>> first one in the list. but that cannot be true all the time, any code
>>> that does not register the socket event as the first one in the list
>>> will likely break.
>>
>> I think your point is very valid and even i had similar thought in my
>> mind when implementing it but as i mentioned upthread that's how it is
>> being used in the current code base. Please check a function call to
>> ModifyWaitEvent() in secure_read().
>
> That's kind of irrelevant. A WaitEventSet is a general-purpose
> primitive, and it needs to work in all situations, current and future,
> where a reasonable developer would expect it to work.  Sure, pq_init()
> puts the socket in FeBeWaitSet at position 0.  However, in
> WaitLatchOrSocket, the socket event, if required, is added after the
> latch and postmaster-death events.  And new code written using
> CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
> socket at any offset in the event array, or could add multiple
> sockets.  So Michael is right: you can't just stick a hack into
> WaitEventSetWait that assumes the socket is at offset 0 even if that
> happens to fix the problem we are facing right at this moment.
>
> (I am not sure whether Michael's proposed placement is correct, but
> I'm almost 100% sure the placement in the submitted patch is not.)

What would seem like the good place is really WaitEventAdjustWin32()
that gets called in ModifyWaitEvent() by secure_read(). And
WaitEventAdjustWin32() is the place where WSAEventSelect() gets called
to adjust the flags FD_READ and FD_WRITE depending on the events
WL_SOCKET_READABLE or WL_SOCKET_WRITEABLE.

Amit, I have a question here. As far as I can read, secure_read() does
set WL_SOCKET_READABLE as an event to wait for, so FD_READ is
correctly being set by WSAEventSelect(). What the patch proposed is
doing is making sure that *only* WL_SOCKET_READABLE is set in the
event flags. Why is that necessary? I can buy that this addresses the
problem as this has been visibly tested, but I am unclear about why
that's the correct thing to do. The patch clearly needs comments to
clarify its position. Actually, I think that what is surprising in the
solution proposed is actually that the event flags are reset *after*
the while loop in WaitEventSetWait(). I could understand something
happening inside the loop if WSAEnumNetworkEvents() updates things on
the fly though...
-- 
Michael



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Thu, Dec 15, 2016 at 7:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>> I have just read the patch, and hardcoding the array position for a
>>>> socket event to 0 assumes that the socket event will be *always* the
>>>> first one in the list. but that cannot be true all the time, any code
>>>> that does not register the socket event as the first one in the list
>>>> will likely break.
>>>
>>> I think your point is very valid and even i had similar thought in my
>>> mind when implementing it but as i mentioned upthread that's how it is
>>> being used in the current code base. Please check a function call to
>>> ModifyWaitEvent() in secure_read().
>>
>> That's kind of irrelevant. A WaitEventSet is a general-purpose
>> primitive, and it needs to work in all situations, current and future,
>> where a reasonable developer would expect it to work.  Sure, pq_init()
>> puts the socket in FeBeWaitSet at position 0.  However, in
>> WaitLatchOrSocket, the socket event, if required, is added after the
>> latch and postmaster-death events.  And new code written using
>> CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
>> socket at any offset in the event array, or could add multiple
>> sockets.  So Michael is right: you can't just stick a hack into
>> WaitEventSetWait that assumes the socket is at offset 0 even if that
>> happens to fix the problem we are facing right at this moment.
>>
>> (I am not sure whether Michael's proposed placement is correct, but
>> I'm almost 100% sure the placement in the submitted patch is not.)
>

Robert, you are right.  As submitted, the placement is incorrect for
the reasons mentioned by you.  The intention was to do it in
secure_read() using FeBeWaitSet where we are sure it is at 0th
position.  Also, I think this is primarily required for windows, so
what we could instead do is introduce a new API ResetWaitEvent() which
will reset socket events passed to it.  This API can have port
specific reset functions something similar to what we have for
ModifyWaitEvent and for now we can have only Win32 specific function
as we are not aware if it is required for any other port.  Then call
this from secure_read before "goto retry;"

> What would seem like the good place is really WaitEventAdjustWin32()
> that gets called in ModifyWaitEvent() by secure_read().
>

Actually, we can't use or do this change in ModifyWaitEvent(), because
it is mainly a Set call, it won't allow you to reset it mainly due to
below check in this API.  Here, what we need is reset.

ModifyWaitEvent()
{
..
if (events == event->events &&
(!(event->events & WL_LATCH_SET) || set->latch == latch))
return;
..
}

If you refer the code prior to 9.6, we always reset the socket at end
of WaitLatchOrSocket for windows and the same cleanup is missing in
new code which is causing this issue.

/* Clean up the event object we created for the socket */
if (sockevent != WSA_INVALID_EVENT)
{
WSAEventSelect(sock, NULL, 0);
WSACloseEvent(sockevent);
}

Upthread Andres mentioned that it isn't okay to reset the events as
was done previously and I believe he is right with the minor exception
that we need it for some of the events like FD_READ.  So that's why I
think it is better to fix the problem for FD_READ.

> And
> WaitEventAdjustWin32() is the place where WSAEventSelect() gets called
> to adjust the flags FD_READ and FD_WRITE depending on the events
> WL_SOCKET_READABLE or WL_SOCKET_WRITEABLE.
>
> Amit, I have a question here. As far as I can read, secure_read() does
> set WL_SOCKET_READABLE as an event to wait for, so FD_READ is
> correctly being set by WSAEventSelect(). What the patch proposed is
> doing is making sure that *only* WL_SOCKET_READABLE is set in the
> event flags. Why is that necessary?
>

Oh no, the intention of the patch is to just reset the
WL_SOCKET_READABLE event flag after WaitEventSetWait() in
secure_read(), so that next time it calls ModifyWaitEvent(), it could
call WSAEventSelect() to re-enable the event.

> I can buy that this addresses the
> problem as this has been visibly tested, but I am unclear about why
> that's the correct thing to do. The patch clearly needs comments to
> clarify its position. Actually, I think that what is surprising in the
> solution proposed is actually that the event flags are reset *after*
> the while loop in WaitEventSetWait().
>

As mentioned above, the current patch doesn't do it correctly.

> I could understand something
> happening inside the loop if WSAEnumNetworkEvents() updates things on
> the fly though...
>

No, nothing wrong in that, it is just that for some of the events
(probably the events that are level triggered) we need to reenable
them by using WSAEventSelect before reuse.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Tue, Dec 13, 2016 at 11:19 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not loose events,
> i.e. correctness. Windows events are edge, not level triggered.

Windows might not lose the event, but PostgreSQL can certainly lose
the event.  In the Windows implementation of WaitEventSetBlock, we'll
ignore WL_SOCKET_READABLE if WL_LATCH_SET also fires.  Now, since the
event is edge-triggered[1], it never fires again and we hang.  Oops.
Even if we rewrote WaitEventSetBlock to always process all of the
events, there's nothing in the latch API that requires the caller to
read from the socket the first time WL_SOCKET_READABLE shows up.  The
caller is perfectly entitled to write:

if (rv & WL_LATCH_SET)  // do stuff
else if (rv & WL_SOCKET_READABLE)  // do other stuff

I suppose we could make a coding rule that the caller must not rely on
WL_SOCKET_READABLE being level-triggered, but I think that's likely to
lead to a stream of bugs that only show up on Windows, because most
people are used to the Linux semantics and will naturally write code
like the above which isn't robust in the face of an edge-triggered
event -- just as you yourself did in WaitEventSetBlock.

> The
> previous implementation wasn't correct.  So just resetting things ain't
> ok.

How was it incorrect?  The previous implementation *worked* and the
new one doesn't, at least in the case complained of here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Andres Freund
Date:
Hi,

On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
> Ashutosh, could you try it and see if it improves things?

So what's the theory of why this triggers pldebugger hangs, but
apparently causes not many other problems? I tried to skim pldebugger
code, but it's state isn't very condusive to seing the problem :(

Greetings,

Andres Freund



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Thu, Dec 15, 2016 at 10:04 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> Ashutosh, could you try it and see if it improves things?
>
> So what's the theory of why this triggers pldebugger hangs, but
> apparently causes not many other problems?
>

The theory is that with pldebugger latch event gets triggered before
FD_READ due to which it seems like the second event gets lost and
WaitForMultipleObjects() keeps on waiting indefinitely.  The probable
reason is that we fail to reset the event due to which we are seeing
this behavior. That seems to be required as per msdn as well, as
pointed by Robert upthread.

Find attached patch to implement the resetting of event only for the
required case.  This fixes the reported problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Thu, Dec 15, 2016 at 4:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 10:04 AM, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>>> Ashutosh, could you try it and see if it improves things?
>>
>> So what's the theory of why this triggers pldebugger hangs, but
>> apparently causes not many other problems?
>>
>
> The theory is that with pldebugger latch event gets triggered before
> FD_READ due to which it seems like the second event gets lost and
> WaitForMultipleObjects() keeps on waiting indefinitely.  The probable
> reason is that we fail to reset the event due to which we are seeing
> this behavior. That seems to be required as per msdn as well, as
> pointed by Robert upthread.
>
> Find attached patch to implement the resetting of event only for the
> required case.  This fixes the reported problem.

After some study I don't think this is quite right yet.  The client
code is not obliged to call ModifyWaitEvent() before again calling
WaitEventSetWait().  I think it should be the responsibility of
WaitEventSetWaitBlock() to reset the event, if needed, before calling
WaitForMultipleObjects().  BTW, I suggest this rewritten comment:
           /*------            * FD_READ events are edge-triggered on Windows per            *
https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx           * but users of the latch
facilityare entitled to expect            * level-triggered behavior.  Indeed, this function itself            *
expectslevel-triggered behavior, since it doesn't guarantee            * that a read event will be returned if the
latchis set at the            * same time.  Even if it did, the caller might drop that event            * expecting it
toreoccur on next call.  So, we must force            * the event to be reset if this WaitEventSet is used again in
      * order to avoid an indefinite hang.            *------            */
 

The dashes keep pgindent from inserting a line break into the link.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Fri, Dec 16, 2016 at 9:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 15, 2016 at 4:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Dec 15, 2016 at 10:04 AM, Andres Freund <andres@anarazel.de> wrote:
>>> Hi,
>>>
>>> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>>>> Ashutosh, could you try it and see if it improves things?
>>>
>>> So what's the theory of why this triggers pldebugger hangs, but
>>> apparently causes not many other problems?
>>>
>>
>> The theory is that with pldebugger latch event gets triggered before
>> FD_READ due to which it seems like the second event gets lost and
>> WaitForMultipleObjects() keeps on waiting indefinitely.  The probable
>> reason is that we fail to reset the event due to which we are seeing
>> this behavior. That seems to be required as per msdn as well, as
>> pointed by Robert upthread.
>>
>> Find attached patch to implement the resetting of event only for the
>> required case.  This fixes the reported problem.
>
> After some study I don't think this is quite right yet.  The client
> code is not obliged to call ModifyWaitEvent() before again calling
> WaitEventSetWait().
>

makes sense.

>  I think it should be the responsibility of
> WaitEventSetWaitBlock() to reset the event, if needed, before calling
> WaitForMultipleObjects().
>

If we want to change WaitEventSetWaitBlock then ideally we need to
change all other wait API's (WAIT_USE_SELECT,  WAIT_USE_POLL, etc.) as
well.  I don't see the need for it, can't we do it in
WaitEventSetWait() after processing all the returned events.  It can
be done by enumerating all wait events and reset the events for which
reset flag is true.  If we see code prior to 9.6, this event (socket
event) has been reset only after processing all returned events, not
after every event.

>  BTW, I suggest this rewritten comment:
>
>             /*------
>              * FD_READ events are edge-triggered on Windows per
>              * https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
>

Isn't the statement in above doc. "For FD_READ, FD_OOB, and FD_ACCEPT
network events, network event recording and event object signaling are
level-triggered." says that FD_READ events are level-triggered which
seems to be contradictory with above comment?

>
> The dashes keep pgindent from inserting a line break into the link.
>

Thanks for the information.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Fri, Dec 16, 2016 at 10:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>  I think it should be the responsibility of
>> WaitEventSetWaitBlock() to reset the event, if needed, before calling
>> WaitForMultipleObjects().
>>
>
> If we want to change WaitEventSetWaitBlock then ideally we need to
> change all other wait API's (WAIT_USE_SELECT,  WAIT_USE_POLL, etc.) as
> well.

Why?  This is only broken on Windows.  It would be nicer not to touch
any of the un-broken implementations.

>>  BTW, I suggest this rewritten comment:
>>
>>             /*------
>>              * FD_READ events are edge-triggered on Windows per
>>              * https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
>>
>
> Isn't the statement in above doc. "For FD_READ, FD_OOB, and FD_ACCEPT
> network events, network event recording and event object signaling are
> level-triggered." says that FD_READ events are level-triggered which
> seems to be contradictory with above comment?

Argh.  I see your point.  Maybe we'd better rephrase that.  The
document does say that, but the behavior they described is actually a
weird hybrid of level-triggered and edge-triggered.  We should
probably just avoid that terminology altogether and explain that
read-ready won't necessarily be re-signalled unless there is an
intervening recv().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
Hi,

>>  I think it should be the responsibility of
>> WaitEventSetWaitBlock() to reset the event, if needed, before calling
>> WaitForMultipleObjects().
>>
>
> If we want to change WaitEventSetWaitBlock then ideally we need to
> change all other wait API's (WAIT_USE_SELECT,  WAIT_USE_POLL, etc.) as
> well.  I don't see the need for it, can't we do it in
> WaitEventSetWait() after processing all the returned events.  It can
> be done by enumerating all wait events and reset the events for which
> reset flag is true.  If we see code prior to 9.6, this event (socket
> event) has been reset only after processing all returned events, not
> after every event.

I think we are trying the complicate the things, I would look into the
fix like this, Let us assume that WaitForMultipleObjects() says an
event occured on a LATCH then we would save the occured event and
reset the LATCH immediately inside WaitEventSetWaitBlock() similarly
if event occurs on SOCKET we should save the occured events and reset
it. Now, the only concern here is why can't we reset event on SOCKET
using ResetEvent() API instead of introducing a new variable. I think
that is just because we are trying to avoid the events for SOCKET from
being re-created again and again. So, for me i think Amit's fix is
absolutely fine and is restricted to Windows. Please do correct me if
my point is wrong. Thank you.

With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Sat, Dec 17, 2016 at 9:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 16, 2016 at 10:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>  I think it should be the responsibility of
>>> WaitEventSetWaitBlock() to reset the event, if needed, before calling
>>> WaitForMultipleObjects().
>>>
>>
>> If we want to change WaitEventSetWaitBlock then ideally we need to
>> change all other wait API's (WAIT_USE_SELECT,  WAIT_USE_POLL, etc.) as
>> well.
>
> Why?  This is only broken on Windows.  It would be nicer not to touch
> any of the un-broken implementations.
>

Yeah, but we are planning to add a generic flag in WaitEvent structure
which can be used for similar purpose.  However, as per your
suggestion, I have changed it only for Win32 port.  Also, I kept the
change in ModifyWaitEvent API as that seems to be okay considering
'reset' is a generic flag in WaitEvent structure.

>>>  BTW, I suggest this rewritten comment:
>>>
>>>             /*------
>>>              * FD_READ events are edge-triggered on Windows per
>>>              * https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
>>>
>>
>> Isn't the statement in above doc. "For FD_READ, FD_OOB, and FD_ACCEPT
>> network events, network event recording and event object signaling are
>> level-triggered." says that FD_READ events are level-triggered which
>> seems to be contradictory with above comment?
>
> Argh.  I see your point.  Maybe we'd better rephrase that.  The
> document does say that, but the behavior they described is actually a
> weird hybrid of level-triggered and edge-triggered.  We should
> probably just avoid that terminology altogether and explain that
> read-ready won't necessarily be re-signalled unless there is an
> intervening recv().
>

I also think so.  I have modified your suggested comment in that way.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Sat, Dec 17, 2016 at 3:53 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi,
>
>>>  I think it should be the responsibility of
>>> WaitEventSetWaitBlock() to reset the event, if needed, before calling
>>> WaitForMultipleObjects().
>>>
>>
>> If we want to change WaitEventSetWaitBlock then ideally we need to
>> change all other wait API's (WAIT_USE_SELECT,  WAIT_USE_POLL, etc.) as
>> well.  I don't see the need for it, can't we do it in
>> WaitEventSetWait() after processing all the returned events.  It can
>> be done by enumerating all wait events and reset the events for which
>> reset flag is true.  If we see code prior to 9.6, this event (socket
>> event) has been reset only after processing all returned events, not
>> after every event.
>
> I think we are trying the complicate the things, I would look into the
> fix like this, Let us assume that WaitForMultipleObjects() says an
> event occured on a LATCH then we would save the occured event and
> reset the LATCH immediately inside WaitEventSetWaitBlock() similarly
> if event occurs on SOCKET we should save the occured events and reset
> it. Now, the only concern here is why can't we reset event on SOCKET
> using ResetEvent() API instead of introducing a new variable. I think
> that is just because we are trying to avoid the events for SOCKET from
> being re-created again and again. So, for me i think Amit's fix is
> absolutely fine and is restricted to Windows.
>

It is fine as per current usage of WaitEventSet API's, however,
Robert's point is valid that user is not obliged to call
ModifyWaitEvent before WaitEventSetWait.  Imagine a case where some
new user of this API is calling WaitEventSetWait repeatedly without
calling ModifyWaitEvent.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Ashutosh Sharma
Date:
> It is fine as per current usage of WaitEventSet API's, however,
> Robert's point is valid that user is not obliged to call
> ModifyWaitEvent before WaitEventSetWait.  Imagine a case where some
> new user of this API is calling WaitEventSetWait repeatedly without
> calling ModifyWaitEvent.

Oops! I never thought this scenario, yes there can be a case where an
external module like FDW may call WaitEventSetWait() and in such case
the fix provided in v1 patch may not work. So, yes we may have to
reset the event in WaitEventSetWaitBlock(). I am extremely extremely
sorry for a noise.



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Sat, Dec 17, 2016 at 5:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yeah, but we are planning to add a generic flag in WaitEvent structure
> which can be used for similar purpose.  However, as per your
> suggestion, I have changed it only for Win32 port.  Also, I kept the
> change in ModifyWaitEvent API as that seems to be okay considering
> 'reset' is a generic flag in WaitEvent structure.

Well, we don't really need the change in ModifyWaitEvent if we have
the change in WaitEventSetWaitBlock, right?  I'd be inclined to ditch
the former and keep the latter.  Also, this doesn't look right:

+       for (cur_event = set->events;
+                cur_event < (set->events + set->nevents)
+                && returned_events < nevents;
+                cur_event++)
+       {
+               if (cur_event->reset)
+               {
+                       WaitEventAdjustWin32(set, cur_event);
+                       cur_event->reset = false;
+               }
+       }

There's no need to include returned_events < nevents in the loop
condition here because returned_events isn't changing.

I think I'd also guard the reset flag with #ifdef WIN32.  If it's not
properly supported elsewhere it's just a foot-gun, and there seems to
be no reason to write the code to properly support it elsewhere,
whatever that would mean.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Tue, Dec 20, 2016 at 8:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Dec 17, 2016 at 5:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Yeah, but we are planning to add a generic flag in WaitEvent structure
>> which can be used for similar purpose.  However, as per your
>> suggestion, I have changed it only for Win32 port.  Also, I kept the
>> change in ModifyWaitEvent API as that seems to be okay considering
>> 'reset' is a generic flag in WaitEvent structure.
>
> Well, we don't really need the change in ModifyWaitEvent if we have
> the change in WaitEventSetWaitBlock, right?
>

Yes.

>  I'd be inclined to ditch
> the former and keep the latter.
>

Okay, I think you want to keep the change just for Win32 which seems
fair as we haven't noticed such problem on any other platform.

>  Also, this doesn't look right:
>
> +       for (cur_event = set->events;
> +                cur_event < (set->events + set->nevents)
> +                && returned_events < nevents;
> +                cur_event++)
> +       {
> +               if (cur_event->reset)
> +               {
> +                       WaitEventAdjustWin32(set, cur_event);
> +                       cur_event->reset = false;
> +               }
> +       }
>
> There's no need to include returned_events < nevents in the loop
> condition here because returned_events isn't changing.
>

Fixed.

> I think I'd also guard the reset flag with #ifdef WIN32.  If it's not
> properly supported elsewhere it's just a foot-gun, and there seems to
> be no reason to write the code to properly support it elsewhere,
> whatever that would mean.
>

Okay.


Ashutosh Sharma has helped to test that pldebugger issue is fixed with
attached version.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Ashutosh Sharma has helped to test that pldebugger issue is fixed with
> attached version.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Amit Kapila
Date:
On Wed, Dec 21, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 20, 2016 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Ashutosh Sharma has helped to test that pldebugger issue is fixed with
>> attached version.
>
> Committed.
>

Thanks!

> Patch by Amit Kapial.  Reported and tested by Ashutosh Sharma, who
> also provided some analysis.  Further analysis by Michael Paquier.

In commit note, my name is not spelled correctly, but I think there is
nothing we can do about it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Robert Haas
Date:
On Wed, Dec 21, 2016 at 9:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 21, 2016 at 9:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Dec 20, 2016 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Ashutosh Sharma has helped to test that pldebugger issue is fixed with
>>> attached version.
>>
>> Committed.
>>
>
> Thanks!
>
>> Patch by Amit Kapial.  Reported and tested by Ashutosh Sharma, who
>> also provided some analysis.  Further analysis by Michael Paquier.
>
> In commit note, my name is not spelled correctly, but I think there is
> nothing we can do about it.

Sorry, it's too late to fix now.  But you spelled me name wrong today
too, so maybe we're even.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

From
Andres Freund
Date:
On 2016-12-16 23:04:13 -0500, Robert Haas wrote:
> >>  BTW, I suggest this rewritten comment:
> >>
> >>             /*------
> >>              * FD_READ events are edge-triggered on Windows per
> >>              * https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
> >>
> >
> > Isn't the statement in above doc. "For FD_READ, FD_OOB, and FD_ACCEPT
> > network events, network event recording and event object signaling are
> > level-triggered." says that FD_READ events are level-triggered which
> > seems to be contradictory with above comment?
> 
> Argh.  I see your point.  Maybe we'd better rephrase that.  The
> document does say that, but the behavior they described is actually a
> weird hybrid of level-triggered and edge-triggered.  We should
> probably just avoid that terminology altogether and explain that
> read-ready won't necessarily be re-signalled unless there is an
> intervening recv().

(just looked at some latch code, reminding me of this)

FWIW, I wonder if the issue here isn't whether WaitForMultipleObjects /
WSAEventSelect is level triggered, but that WSAEnumNetworkEvents()
pretty much explicitly is not. And it seems to have effects over
multiple handles for the same socket.

The relevant line from the docs is: "The WSAEnumNetworkEvents function
discovers occurrences of network events for the indicated socket, clear
internal network event records, and reset event objects (optional)."

Note that it clears the state from the socket, *not* just the handle.


That behaviour of WSAEnumNetworkEvents() also seems to explains why
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f7819baa618c528f60e266874051563ecfe08207
was necessary, and why all the weird workaround in win32/socket.c exist.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers