update i386 spinlock for hyperthreading - Mailing list pgsql-patches
From | Manfred Spraul |
---|---|
Subject | update i386 spinlock for hyperthreading |
Date | |
Msg-id | 3FECB5A8.50708@colorfullife.com Whole thread Raw |
Responses |
Re: update i386 spinlock for hyperthreading
|
List | pgsql-patches |
Hi, Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. The attached patch adds a new platform hook and implements it for i386. The new instruction is backward compatible, thus no cpu detection is necessary. Additionally I've increased the number of loops from 100 to 1000 - a 3 GHz Pentium 4 might execute 100 loops faster than a single bus transaction. I don't know if this change is appropriate for all platforms, or if SPINS_PER_DELAY should be made platform specific. Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT enabled, and the patch resulted in a 10% performance increase: Before: http://developer.osdl.org/markw/dbt2-pgsql/284/ After: http://developer.osdl.org/markw/dbt2-pgsql/300/ -- Manfred Index: ./src/backend/storage/lmgr/s_lock.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v retrieving revision 1.22 diff -c -r1.22 s_lock.c *** ./src/backend/storage/lmgr/s_lock.c 23 Dec 2003 22:15:07 -0000 1.22 --- ./src/backend/storage/lmgr/s_lock.c 26 Dec 2003 22:24:52 -0000 *************** *** 76,82 **** * The select() delays are measured in centiseconds (0.01 sec) because 10 * msec is a common resolution limit at the OS level. */ ! #define SPINS_PER_DELAY 100 #define NUM_DELAYS 1000 #define MIN_DELAY_CSEC 1 #define MAX_DELAY_CSEC 100 --- 76,82 ---- * The select() delays are measured in centiseconds (0.01 sec) because 10 * msec is a common resolution limit at the OS level. */ ! #define SPINS_PER_DELAY 1000 #define NUM_DELAYS 1000 #define MIN_DELAY_CSEC 1 #define MAX_DELAY_CSEC 100 *************** *** 111,116 **** --- 111,117 ---- spins = 0; } + CPU_DELAY(); } } Index: ./src/include/storage/s_lock.h =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v retrieving revision 1.123 diff -c -r1.123 s_lock.h *** ./src/include/storage/s_lock.h 23 Dec 2003 22:15:07 -0000 1.123 --- ./src/include/storage/s_lock.h 26 Dec 2003 22:24:52 -0000 *************** *** 52,57 **** --- 52,66 ---- * in assembly language to execute a hardware atomic-test-and-set * instruction. Equivalent OS-supplied mutex routines could be used too. * + * Additionally, a platform specific delay function can be defined: + * + * void CPU_DELAY(void) + * Notification that the cpu is executing a busy loop. + * + * Some platforms need such an indication. One example are platforms + * that implement SMT, i.e. multiple logical threads that share + * execution resources in one physical cpu. + * * If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not * defined), then we fall back on an emulation that uses SysV semaphores * (see spin.c). This emulation will be MUCH MUCH slower than a proper TAS() *************** *** 115,120 **** --- 124,140 ---- return (int) _res; } + #define HAS_CPU_DELAY + + #define CPU_DELAY() cpu_delay() + + static __inline__ void + cpu_delay(void) + { + __asm__ __volatile__( + " rep; nop \n" + : : : "memory"); + } #endif /* __i386__ || __x86_64__ */ *************** *** 715,720 **** --- 735,748 ---- #define TAS(lock) tas(lock) #endif /* TAS */ + #ifndef HAS_CPU_DELAY + #define CPU_DELAY() cpu_delay() + + static __inline__ void + cpu_delay(void) + { + } + #endif /* * Platform-independent out-of-line support routines
pgsql-patches by date: