Re: pg_listener entries deleted under heavy NOTIFY load only on Windows - Mailing list pgsql-bugs

From Marshall, Steve
Subject Re: pg_listener entries deleted under heavy NOTIFY load only on Windows
Msg-id 49889E5D.8020306@wsi.com
In response to Re: pg_listener entries deleted under heavy NOTIFY load only on Windows  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_listener entries deleted under heavy NOTIFY load only on Windows  ("Marshall, Steve" <smarshall@wsi.com>)
List pgsql-bugs
I've completed my test for changes to the Windows implementation of
pgkill.  I ran two clients constantly doing a pair of notifications,
with one client subscribing to those events.  I ran the test for 90
hours.  I found 760 cases where the CallNamedPipe function initially
failed, but succeeded when the call was retried.  In all but one of those
cases, only a single retry was needed; the other case required two retries.

As Tom has pointed out, there is no principled way to determine the
correct number of times to retry.  Therefore, I have set the maximum
number of retries to 2 (three calls in total; max_pipe_tries = 3 in the
patch), which is the worst case found in a fairly long stress test.  This
strikes me as a reasonable balance between failing quickly in truly
erroneous cases and avoiding transient pseudo-failures.
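
For illustration only, the bounded-retry idea can be sketched in portable C,
separate from the Windows-specific patch.  The names call_with_retries,
try_fn, struct flaky and flaky_op are hypothetical and do not appear in
kill.c; the stub merely simulates an operation that fails a few times before
succeeding, and the sleep between attempts is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true on success, false on failure. */
typedef bool (*try_fn) (void *arg, unsigned timeout_ms);

/*
 * Call op up to max_tries times.  The first attempt uses first_timeout_ms,
 * every later attempt uses retry_timeout_ms.  Returns the number of the
 * attempt that succeeded (1..max_tries), or 0 if every attempt failed.
 */
static int
call_with_retries(try_fn op, void *arg, int max_tries,
                  unsigned first_timeout_ms, unsigned retry_timeout_ms)
{
    unsigned    timeout = first_timeout_ms;
    int         tries;

    for (tries = 1; tries <= max_tries; tries++)
    {
        if (op(arg, timeout))
            return tries;
        timeout = retry_timeout_ms;
        /* A real caller would pause briefly here; the patch sleeps 10 ms. */
    }
    return 0;
}

/* Stub for experiments: fails failures_left times, then succeeds. */
struct flaky
{
    int         failures_left;
    int         calls;
};

static bool
flaky_op(void *arg, unsigned timeout_ms)
{
    struct flaky *f = arg;

    (void) timeout_ms;          /* the stub ignores the timeout */
    f->calls++;
    if (f->failures_left > 0)
    {
        f->failures_left--;
        return false;
    }
    return true;
}
```

With max_tries set to 3, a stub that fails twice succeeds on the third
attempt, which matches the worst case seen in the stress test; a stub that
fails three times is reported as a failure after exactly three calls.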

A patch with this change is attached.  It should be applied to the
8.3.5 version of src/port/kill.c.

Steve


*** kill.c    2009-02-03 14:28:21.753474644 -0500
--- kill.c.mod    2009-02-03 14:28:04.465829331 -0500
***************
*** 25,30 ****
--- 25,33 ----
      BYTE        sigData = sig;
      BYTE        sigRet = 0;
      DWORD        bytes;
+     DWORD        timeout = 1000;     /* in ms; forever = NMPWAIT_WAIT_FOREVER */
+     const int    max_pipe_tries = 3;
+     int            pipe_tries = 0;

      /* we allow signal 0 here, but it will be ignored in pg_queue_signal */
      if (sig >= PG_SIGNAL_COUNT || sig < 0)
***************
*** 39,45 ****
          return -1;
      }
      snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pgsignal_%u", pid);
!     if (!CallNamedPipe(pipename, &sigData, 1, &sigRet, 1, &bytes, 1000))
      {
          if (GetLastError() == ERROR_FILE_NOT_FOUND)
              errno = ESRCH;
--- 42,63 ----
          return -1;
      }
      snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pgsignal_%u", pid);
!
!     /*
!      *  Writing data to the named pipe can fail for transient reasons.
!      *  Therefore, it is useful to retry if it fails.  The maximum number of
!      *  calls to make was empirically determined from a 90-hour notification
!      *  stress test.
!      */
!     while (pipe_tries < max_pipe_tries &&
!             !CallNamedPipe(pipename, &sigData, 1, &sigRet, 1, &bytes, timeout))
!     {
!         pipe_tries++;
!         timeout = 250;
!         pg_usleep(10000);
!     }
!
!     if (pipe_tries >= max_pipe_tries)
      {
          if (GetLastError() == ERROR_FILE_NOT_FOUND)
              errno = ESRCH;
