Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram - Mailing list pgsql-bugs

From Luke Koops
Subject Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date
Msg-id A3144629B5AC714A8BF27806EBFA7057514622B8@sottexch7.corp.ad.entrust.com
In response to Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
Responses Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-bugs
> Yeah it will be interesting to see if the collector starts
> functioning fine after the restart. That might hint that the
> kernel object representing the socket is maybe fine but would
> not prove conclusively that the issue is with PG code because
> the layer used by WaitForMultipleObjectsEx might have issues too.
This morning I planned to verify that stats collection was still not working before killing the stats collector and allowing it to restart.  I had a psql session open from the previous day, but I closed it and tried to start a new session.  I log each session and wanted to start a new log.

Now, I am unable to start a new psql session.  I get this error on the client side:
| psql: could not send startup packet: Connection reset by peer (0x00002746/10054)
|
and this error on the server side:
| 2009-08-06 10:48:59.542 EDT,LOG:  could not receive data from client: No connection could be made because the target machine actively refused it.
| 2009-08-06 10:48:59.542 EDT,LOG:  incomplete startup packet


I didn't find too much in the archives about this.  It's what happens if you just connect to 5432 (with telnet for example) and then drop the connection.
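
For anyone who wants to reproduce the telnet-style case without telnet, here is a minimal sketch, assuming a server listening on localhost port 5432.  The function name connect_and_drop and its defaults are made up for illustration and are not part of PostgreSQL or psql.  It opens a TCP connection and closes it without sending a startup packet:

```python
import socket


def connect_and_drop(host="localhost", port=5432, timeout=5.0):
    """Open a TCP connection, send nothing, then close it.

    Hypothetical helper for illustration only: host, port and timeout
    are assumptions about the local setup.
    """
    # create_connection resolves the host and completes the TCP handshake.
    sock = socket.create_connection((host, port), timeout=timeout)
    # Close right away, so the server never receives a startup packet.
    sock.close()


if __name__ == "__main__":
    connect_and_drop()
```

If the server behaves as described above, the server log should show the same messages after each run.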

Occasionally (3-6% of the time), I get this on the client side:
| C:\postgres\bin>psql
| psql: server closed the connection unexpectedly
|         This probably means the server terminated abnormally
|         before or while processing the request.
and this on the server side:
| 2009-08-06 11:16:27.933 EDT,LOG:  could not receive data from client: No connection could be made because the target machine actively refused it.

When this sequence happens, I can see a backend postgres.exe process start up and then exit very quickly.  Note the absence of the "incomplete startup packet" message.

Could it be related to the stats collector problem?  The stats collector on this system has been hung for over 6 weeks, so the timing of this problem is quite delayed.

I have windbg on this system along with the source and the symbols, so I could look for anything in the debugger.

-Luke
