Re: Hang on NOTIFY - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Hang on NOTIFY
Date
Msg-id 9157.1438976418@sss.pgh.pa.us
In response to Hang on NOTIFY  (Mark Simonetti <marks@opalsoftware.co.uk>)
List pgsql-bugs
Mark Simonetti <marks@opalsoftware.co.uk> writes:
> [ Our Windows kill() emulation sometimes hangs ]

For the record, I was helping Mark track this down off-list, up till the
point where it was unmistakably a Windows-specific issue.  I can't do much
more with it now.

I see no good reason to think that the problem is especially
NOTIFY-specific; it seems likely that any cross-backend signal attempt,
such as pg_cancel_backend(), would have the same hazard.  So if anyone
is motivated to try to reproduce it, you might be able to do so in less
time than it takes Mark's system to have a problem by issuing a boatload
of pg_cancel_backend() calls.

I had originally guessed that it might be associated with the target
backend exiting during or just before the signal attempt; however, in
Mark's tests the target process is clearly still there.  Some other
theories that would be worth pursuing are (1) it's triggered by multiple
processes trying to signal the same target process concurrently, or
(2) it's got something to do with the target being very long-lived
and therefore having received a lot of signals in its lifetime.
(32-bit overflow anybody?  Mark, can you estimate how many signals
the target might have received before things go south?)
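
To put rough numbers on the overflow theory, here is a minimal sketch of
how long a 32-bit counter would take to wrap at a sustained signal rate.
The helper names are hypothetical, and nothing in this thread establishes
that such a counter actually exists in the signal path; it only shows what
"a lot of signals" would have to mean.

```c
#include <assert.h>

/*
 * Seconds until an unsigned 32-bit counter wraps, assuming one increment
 * per received signal.  Purely illustrative: the existence of such a
 * counter is an assumption, not something the thread confirms.
 */
static double
seconds_to_wrap_u32(double signals_per_second)
{
    assert(signals_per_second > 0.0);
    return 4294967296.0 / signals_per_second;   /* 2^32 increments */
}

/* Same estimate for a signed 32-bit counter, which wraps after 2^31. */
static double
seconds_to_wrap_i32(double signals_per_second)
{
    assert(signals_per_second > 0.0);
    return 2147483648.0 / signals_per_second;   /* 2^31 increments */
}
```

At a made-up rate of 100 signals per second, the signed case works out to
roughly 248.6 days and the unsigned case to roughly 497.1 days, so an
estimate of the real signal rate would quickly show whether this theory is
plausible for Mark's uptime.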

> This seems like a possible Windows bug, as the call to CallNamedPipe has
> a timeout of 1000 milliseconds, but it is clearly not timing out.  It
> only seems to exit if I exit the backend it is trying to signal.

I did some googling and found this old report:
http://www.postgresql.org/message-id/1262016302.3302.37.camel@arc-dev2.wsicorp.com

We did apply a variant of the patch recommended there, cf commits
04a4413c2 and f27a4696f, but it's worth reading in this connection anyway.
In particular I noted the suggestion that CallNamedPipe's timeout only
applies to the initial wait to obtain an instance of the pipe.  If that's
accurate, it'd suggest that we're hanging after that step within the
multiple steps of CallNamedPipe.
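
To make "the multiple steps" concrete, here is a rough sketch that splits
the call into the pieces Microsoft documents CallNamedPipe as being
equivalent to (CreateFile, or WaitNamedPipe if the pipe is busy, then
TransactNamedPipe, then CloseHandle).  The function names
signal_pipe_name and send_signal_stepwise are hypothetical, the
"pgsignal_%u" pipe name and the echoed reply byte are assumptions about
our kill() emulation, and the SetNamedPipeHandleState call is my own
addition because TransactNamedPipe needs message read mode.  It is meant
for adding instrumentation between steps, not as a proposed fix.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Build the per-backend signal pipe name.  The "pgsignal_%u" pattern is an
 * assumption about the kill() emulation, not something stated in the thread.
 */
static int
signal_pipe_name(char *buf, size_t buflen, unsigned int pid)
{
    return snprintf(buf, buflen, "\\\\.\\pipe\\pgsignal_%u", pid);
}

#ifdef _WIN32
/*
 * Hypothetical step-by-step stand-in for CallNamedPipe().  Only the wait
 * for a free pipe instance uses wait_ms; the transaction itself blocks.
 */
static BOOL
send_signal_stepwise(const char *pipename, BYTE sig, DWORD wait_ms)
{
    HANDLE  h;
    DWORD   mode = PIPE_READMODE_MESSAGE;
    BYTE    reply = 0;
    DWORD   nread = 0;
    BOOL    ok;

    for (;;)
    {
        h = CreateFileA(pipename, GENERIC_READ | GENERIC_WRITE,
                        0, NULL, OPEN_EXISTING, 0, NULL);
        if (h != INVALID_HANDLE_VALUE)
            break;
        if (GetLastError() != ERROR_PIPE_BUSY)
            return FALSE;
        /* Step 1: the only place the timeout is applied. */
        if (!WaitNamedPipeA(pipename, wait_ms))
            return FALSE;
    }

    /* Step 2: TransactNamedPipe requires message read mode. */
    if (!SetNamedPipeHandleState(h, &mode, NULL, NULL))
    {
        CloseHandle(h);
        return FALSE;
    }

    /*
     * Step 3: write the signal byte and wait for the reply.  The handle is
     * synchronous, so no timeout covers this call; a hang here would match
     * the behavior Mark describes.
     */
    ok = TransactNamedPipe(h, &sig, 1, &reply, 1, &nread, NULL);
    CloseHandle(h);
    return ok && nread == 1 && reply == sig;
}
#endif
```

Logging before and after each step in a test build should show whether
the hang is in the instance wait or in the transaction proper.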

            regards, tom lane
