Hang on NOTIFY - Mailing list pgsql-bugs

From Mark Simonetti
Subject Hang on NOTIFY
Date
Msg-id 55C49756.70505@opalsoftware.co.uk
Responses Re: Hang on NOTIFY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The system I am developing makes extensive use of the async
NOTIFY/LISTEN system.

I am currently experiencing a problem on 2 production servers:

Server 1:
Virtual Windows Server 2008 R2 (VMWare)
PostgreSQL 9.3.5

Server 2:
Virtual Windows Server 2008 R2 (VMWare)
PostgreSQL 9.4.2

After the system has been running for a period of time, sometimes a few
days, sometimes a few weeks, any call to NOTIFY will hang.

After in-depth investigation, it appears to happen when a listening
backend has been connected for some time (days).

Any other backend trying to signal that backend will hang in
"CallNamedPipe" in pgkill (kill.c).

Here is a stack trace from the hung SENDING backend, main thread : -

      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      kernel32.dll!_CallNamedPipeW@28()  + 0xf4 bytes
      postgres.exe!pgkill(int pid, int sig)  Line 43 + 0x2b bytes  C
      postgres.exe!SendProcSignal(int pid, ProcSignalReason reason, int backendId)  Line 198 + 0x10 bytes    C
      postgres.exe!SignalBackends()  Line 1497 + 0xe bytes    C
 >    postgres.exe!ProcessCompletedNotifies()  Line 1092    C
      postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username)  Line 3947    C
      postgres.exe!BackendRun(Port * port)  Line 4011 + 0x21 bytes  C
      postgres.exe!SubPostmasterMain(int argc, char * * argv)  Line 4515 + 0x8 bytes    C
      postgres.exe!main(int argc, char * * argv)  Line 203 + 0x7 bytes    C
      postgres.exe!__tmainCRTStartup()  Line 555 + 0x17 bytes    C
      kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
      ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
      ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

Here is a stack trace from the signalling thread (I know it's irrelevant,
as this thread is for incoming signals) : -

      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
 >    postgres.exe!pg_signal_thread(void * param)  Line 279 + 0x9 bytes    C


Now for the RECIPIENT backend : -

      ntdll.dll!_ZwWaitForMultipleObjects@20()  + 0x15 bytes
      ntdll.dll!_ZwWaitForMultipleObjects@20()  + 0x15 bytes
      KERNELBASE.dll!_WaitForMultipleObjectsEx@20()  + 0x36 bytes
      kernel32.dll!_WaitForMultipleObjectsExImplementation@20()  + 0x8e bytes
 >    postgres.exe!pgwin32_waitforsinglesocket(unsigned int s, int what, int timeout)  Line 216 + 0x14 bytes    C
      postgres.exe!pgwin32_recv(unsigned int s, char * buf, int len, int f)  Line 352 + 0xa bytes    C
      postgres.exe!secure_read(Port * port, void * ptr, unsigned int len)  Line 304 + 0x12 bytes    C
      postgres.exe!pq_getbyte()  Line 895 + 0x67 bytes    C
      postgres.exe!SocketBackend(StringInfoData * inBuf)  Line 344 + 0x5 bytes    C
      postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username)  Line 3968 + 0x1c bytes    C
      postgres.exe!BackendRun(Port * port)  Line 4011 + 0x21 bytes  C
      postgres.exe!SubPostmasterMain(int argc, char * * argv)  Line 4515 + 0x8 bytes    C
      postgres.exe!main(int argc, char * * argv)  Line 203 + 0x7 bytes    C
      postgres.exe!__tmainCRTStartup()  Line 555 + 0x17 bytes    C
      kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
      ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
      ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

This is the usual place for it to wait, so this seems okay.

And the RECIPIENT backend's signalling thread : -

      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
 >    postgres.exe!pg_signal_thread(void * param)  Line 279 + 0x9 bytes    C

Also looks fine.

This seems like a possible Windows bug, as the call to CallNamedPipe has
a timeout of 1000 milliseconds, but it is clearly not timing out.  It
only seems to return when I terminate the backend it is trying to signal.
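
For reference, here is a minimal sketch of what the sending side appears to do, reconstructed from the stack trace above and from memory of kill.c rather than copied from the PostgreSQL source. The helper names signal_pipe_name and send_signal_byte are hypothetical, and the pgsignal_<pid> pipe name is an assumption. One point worth checking against the Win32 documentation: as I read it, the last argument to CallNamedPipe only limits how long the call waits for a pipe instance to become available. Once the pipe is opened, the write-then-read transaction itself does not appear to be bounded by that value. If that reading is right, a recipient that accepts the connection but never replies could produce this kind of hang without any timeout firing.

```c
#include <stdio.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Build the per-backend signal pipe name, e.g. \\.\pipe\pgsignal_1234.
 * The naming scheme is an assumption based on my recollection of kill.c.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
signal_pipe_name(char *buf, size_t buflen, unsigned int pid)
{
    int n = snprintf(buf, buflen, "\\\\.\\pipe\\pgsignal_%u", pid);

    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}

/*
 * Hypothetical helper: send a single signal byte to the backend with the
 * given pid and wait for it to be echoed back.
 * Returns 0 on success, -1 on failure.
 */
static int
send_signal_byte(unsigned int pid, int sig)
{
    char pipename[128];

    if (signal_pipe_name(pipename, sizeof(pipename), pid) < 0)
        return -1;

#ifdef _WIN32
    {
        BYTE  sig_data = (BYTE) sig;
        BYTE  sig_ret = 0;
        DWORD bytes = 0;

        /*
         * The final argument (1000 ms) is the wait for the pipe to become
         * available. It is not, as far as I can tell, a limit on the
         * write-then-read transaction that follows.
         */
        if (!CallNamedPipeA(pipename, &sig_data, 1, &sig_ret, 1, &bytes, 1000))
            return -1;

        return (bytes == 1 && sig_ret == sig_data) ? 0 : -1;
    }
#else
    (void) sig;
    return -1;              /* this signalling scheme is Windows-only */
#endif
}
```

On non-Windows builds send_signal_byte simply reports failure, so only the pipe-name formatting can be exercised there.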

NOTE: each sender is trying to signal many backends, but all the stuck
backends I checked were stuck sending to the same recipient.
Closing that particular recipient DOES free everything up, and signals
start flowing again.

I've searched around and cannot find a similar bug report.  Is it
possibly something I'm doing wrong?

Thanks,
Mark.