Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram - Mailing list pgsql-bugs
From | Luke Koops |
---|---|
Subject | Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram |
Date | |
Msg-id | A3144629B5AC714A8BF27806EBFA70575146229F@sottexch7.corp.ad.entrust.com |
In response to | Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>) |
Responses | Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram |
List | pgsql-bugs |
There was no firewall in place, or more correctly the Windows Firewall is configured to be off. There is no other firewall installed on the system.

To get to this point in the code, the return value from WSARecv() was WSAEWOULDBLOCK. The socket is set for overlapped IO and is a datagram socket. MSDN documentation says that means there are too many outstanding overlapped IO requests. I don't know if "too many" applies to just this socket or to the system as a whole. The documentation isn't clear about how to handle the return code in this situation.

We don't need to know if this is a kernel issue, a bug in Winsock, or an undocumented behaviour. Regardless, it can be treated as a fault.

Knowing that it is possible for WaitForMultipleObjectsEx to lock up means that it is not safe to call it with an INFINITE timeout. The workaround that's being discussed is beginning to look like the one at line 172 of socket.c. It's bad enough that there is a WSASend in pgwin32_waitforsinglesocket(). I doubt you also want to add a WSARecv. There should be a cleaner way to handle both of these situations.

I am planning to eventually kill the stats collector and see if that clears up the hanging issue, but I want to keep the system state in place for a bit longer in case there are some other diagnostic steps I should try. I've exhausted everything I could think of.

-Luke

-----Original Message-----
From: Nikhil Sontakke [mailto:nikhil.sontakke@enterprisedb.com]
Sent: Monday, August 03, 2009 10:38 AM
To: Magnus Hagander
Cc: Alvaro Herrera; Luke Koops; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

Hi,

>>>>>
>>>>> Maybe. I'm unsure if it's enough to just try another
>>>>> WaitForSingleObjectEx() on it, or if we need to actually issue a
>>>>> WSARecv() on it as well. Maybe it would be enough to just change
>>>>> the INFINITE on line 318 of socket.c to a fixed value. That will
>>>>> loop down to WSARecv() which should exit with WSAEWOULDBLOCK which
>>>>> will cause us to do a short sleep and come back. But we'd have to
>>>>> change the limit of 5 somehow then, since in theory we should wait
>>>>> forever. Maybe that outer loop should just be a for(;;), what do you think?
>>>>>
>>>>
>>>> Yes, line 318 seems to be a much better location to me. If Windows
>>>> and its socket logic behave properly most of the time :), most
>>>> of the calls should come out within the first few tries, so
>>>> changing 5 to an infinite loop shouldn't hurt those normal use cases in theory.
>>>>
>>>> OTOH, I was wondering what if we kill the stats collector and on a
>>>> restart the socket communication resumes properly. Would that
>>>> conclusively mean that it is a flaw in our code?
>>>
>>> No, if we kill the stats collector that will destroy all sockets,
>>> and when the new one starts all the sockets it operates on are fresh
>>> and new. So it doesn't show that the flaw is in our code - but it
>>> also doesn't show that it's in the kernel or runtime libraries.
>>>
>>
>> AFAICS in the code, the inherited pgStatSock socket FD remains the
>> same across the restart of the stats collector process...
>
> Partially correct, I think.
>
> Each backend has its own handle on win32, since we use EXEC_BACKEND
> (this includes the "utility processes" like the stats collector). When
> we start the new one, we are going to use DuplicateHandle() in
> save_backend_variables(). This will therefore get it a new handle, but
> they are both pointing to the same kernel object. I don't know if
> WaitForMultipleObjectsEx() is going to see these as two different
> objects or not, but I think it does.
>

Hmm, got it. Nothing like adding more confusion into the mix :)

Regards,
Nikhils
--
http://www.enterprisedb.com
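The retry structure proposed in the quoted thread (a fixed timeout instead of INFINITE, always falling through to the receive, a short sleep on WSAEWOULDBLOCK, and a for(;;) in place of the limit of 5) can be illustrated with a portable C simulation. This is a hypothetical sketch, not the socket.c code: the Windows calls are replaced by invented stubs (`sim_wait`, `sim_recv`, `sim_short_sleep`, `sim_socket`, `recv_with_retry`) so the control flow compiles on any platform, and the 100 ms timeout used below is an arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated socket; none of these names exist in socket.c.
 * lost_signals:      how many waits time out because the "data ready"
 *                    event is never signalled (the hang seen in this bug).
 * would_block_count: how many receive attempts fail with the equivalent
 *                    of WSAEWOULDBLOCK before a datagram can be read. */
typedef struct {
    int lost_signals;
    int would_block_count;
    int waits, timeouts, recvs, sleeps;   /* counters for inspection */
} sim_socket;

/* Stand-in for WaitForMultipleObjectsEx() called with a finite timeout.
 * Returns true if the event was signalled, false on timeout. */
static bool sim_wait(sim_socket *s, int timeout_ms)
{
    assert(timeout_ms > 0);      /* the point of the proposal: never INFINITE */
    s->waits++;
    if (s->lost_signals > 0) {
        s->lost_signals--;
        return false;
    }
    return true;
}

/* Stand-in for a non-blocking WSARecv(); false means "would block". */
static bool sim_recv(sim_socket *s)
{
    s->recvs++;
    if (s->would_block_count > 0) {
        s->would_block_count--;
        return false;
    }
    return true;
}

/* Stand-in for the short sleep taken after WSAEWOULDBLOCK. */
static void sim_short_sleep(sim_socket *s)
{
    s->sleeps++;
}

/* Bounded wait, then always attempt the receive, sleep briefly if it
 * would block, and loop forever instead of giving up after 5 attempts.
 * Returns the number of iterations needed to receive one datagram. */
static int recv_with_retry(sim_socket *s, int timeout_ms)
{
    for (int iter = 1;; iter++) {
        if (!sim_wait(s, timeout_ms))
            s->timeouts++;       /* a timeout is not an error: try the recv */
        if (sim_recv(s))
            return iter;
        sim_short_sleep(s);
    }
}
```

With `sim_socket s = { .lost_signals = 3, .would_block_count = 2 };`, `recv_with_retry(&s, 100)` returns 3 after three timed-out waits and two short sleeps, whereas an INFINITE wait would have blocked on the first lost signal. The cost of the bounded wait is one extra wakeup per timeout interval while the socket is idle, which is the trade-off the thread accepts on the assumption that most calls succeed within the first few tries.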