update i386 spinlock for hyperthreading - Mailing list pgsql-patches

From Manfred Spraul
Subject update i386 spinlock for hyperthreading
Date
Msg-id 3FECB5A8.50708@colorfullife.com
Whole thread Raw
Responses Re: update i386 spinlock for hyperthreading
List pgsql-patches
Hi,

Intel recommends to add a special pause instruction into spinlock busy
loops. It's necessary for hyperthreading - without it, the cpu can't
figure out that a logical thread does no useful work and incorrectly
awards lots of execution resources to that thread. Additionally, it's
supposed to reduce the time the cpu needs to recover from the
(mispredicted) branch after the spinlock was obtained.
The attached patch adds a new platform hook and implements it for i386.
The new instruction is backward compatible, thus no cpu detection is
necessary.
Additionally I've increased the number of loops from 100 to 1000 - a 3
GHz Pentium 4 might execute 100 loops faster than a single bus
transaction. I don't know if this change is appropriate for all
platforms, or if SPINS_PER_DELAY should be made platform specific.

Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT
enabled, and the patch resulted in a 10% performance increase:
Before:
http://developer.osdl.org/markw/dbt2-pgsql/284/
After:
http://developer.osdl.org/markw/dbt2-pgsql/300/

--
    Manfred
Index: ./src/backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.22
diff -c -r1.22 s_lock.c
*** ./src/backend/storage/lmgr/s_lock.c    23 Dec 2003 22:15:07 -0000    1.22
--- ./src/backend/storage/lmgr/s_lock.c    26 Dec 2003 22:24:52 -0000
***************
*** 76,82 ****
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        100
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
--- 76,82 ----
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        1000
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
***************
*** 111,116 ****
--- 111,117 ----

              spins = 0;
          }
+         CPU_DELAY();
      }
  }

Index: ./src/include/storage/s_lock.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.123
diff -c -r1.123 s_lock.h
*** ./src/include/storage/s_lock.h    23 Dec 2003 22:15:07 -0000    1.123
--- ./src/include/storage/s_lock.h    26 Dec 2003 22:24:52 -0000
***************
*** 52,57 ****
--- 52,66 ----
   *    in assembly language to execute a hardware atomic-test-and-set
   *    instruction.  Equivalent OS-supplied mutex routines could be used too.
   *
+  *    Additionally, a platform specific delay function can be defined:
+  *
+  *    void CPU_DELAY(void)
+  *        Notification that the cpu is executing a busy loop.
+  *
+  *     Some platforms need such an indication. One example are platforms
+  *     that implement SMT, i.e. multiple logical threads that share
+  *     execution resources in one physical cpu.
+  *
   *    If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not
   *    defined), then we fall back on an emulation that uses SysV semaphores
   *    (see spin.c).  This emulation will be MUCH MUCH slower than a proper TAS()
***************
*** 115,120 ****
--- 124,140 ----
      return (int) _res;
  }

+ #define HAS_CPU_DELAY
+
+ #define CPU_DELAY()    cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+     __asm__ __volatile__(
+         " rep; nop            \n"
+         : : : "memory");
+ }
  #endif     /* __i386__ || __x86_64__ */


***************
*** 715,720 ****
--- 735,748 ----
  #define TAS(lock)        tas(lock)
  #endif     /* TAS */

+ #ifndef HAS_CPU_DELAY
+ #define CPU_DELAY() cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+ }
+ #endif

  /*
   * Platform-independent out-of-line support routines

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: fork/exec patch: pre-CreateProcess finalization
Next
From: Tom Lane
Date:
Subject: Re: update i386 spinlock for hyperthreading