Thread: update i386 spinlock for hyperthreading

update i386 spinlock for hyperthreading

From
Manfred Spraul
Date:
Hi,

Intel recommends to add a special pause instruction into spinlock busy
loops. It's necessary for hyperthreading - without it, the cpu can't
figure out that a logical thread does no useful work and incorrectly
awards lots of execution resources to that thread. Additionally, it's
supposed to reduce the time the cpu needs to recover from the
(mispredicted) branch after the spinlock was obtained.
The attached patch adds a new platform hook and implements it for i386.
The new instruction is backward compatible, thus no cpu detection is
necessary.
Additionally I've increased the number of loops from 100 to 1000 - a 3
GHz Pentium 4 might execute 100 loops faster than a single bus
transaction. I don't know if this change is appropriate for all
platforms, or if SPINS_PER_DELAY should be made platform specific.

Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT
enabled, and the patch resulted in a 10% performance increase:
Before:
http://developer.osdl.org/markw/dbt2-pgsql/284/
After:
http://developer.osdl.org/markw/dbt2-pgsql/300/

--
    Manfred
Index: ./src/backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.22
diff -c -r1.22 s_lock.c
*** ./src/backend/storage/lmgr/s_lock.c    23 Dec 2003 22:15:07 -0000    1.22
--- ./src/backend/storage/lmgr/s_lock.c    26 Dec 2003 22:24:52 -0000
***************
*** 76,82 ****
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        100
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
--- 76,82 ----
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        1000
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
***************
*** 111,116 ****
--- 111,117 ----

              spins = 0;
          }
+         CPU_DELAY();
      }
  }

Index: ./src/include/storage/s_lock.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.123
diff -c -r1.123 s_lock.h
*** ./src/include/storage/s_lock.h    23 Dec 2003 22:15:07 -0000    1.123
--- ./src/include/storage/s_lock.h    26 Dec 2003 22:24:52 -0000
***************
*** 52,57 ****
--- 52,66 ----
   *    in assembly language to execute a hardware atomic-test-and-set
   *    instruction.  Equivalent OS-supplied mutex routines could be used too.
   *
+  *    Additionally, a platform specific delay function can be defined:
+  *
+  *    void CPU_DELAY(void)
+  *        Notification that the cpu is executing a busy loop.
+  *
+  *     Some platforms need such an indication. One example are platforms
+  *     that implement SMT, i.e. multiple logical threads that share
+  *     execution resources in one physical cpu.
+  *
   *    If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not
   *    defined), then we fall back on an emulation that uses SysV semaphores
   *    (see spin.c).  This emulation will be MUCH MUCH slower than a proper TAS()
***************
*** 115,120 ****
--- 124,140 ----
      return (int) _res;
  }

+ #define HAS_CPU_DELAY
+
+ #define CPU_DELAY()    cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+     __asm__ __volatile__(
+         " rep; nop            \n"
+         : : : "memory");
+ }
  #endif     /* __i386__ || __x86_64__ */


***************
*** 715,720 ****
--- 735,748 ----
  #define TAS(lock)        tas(lock)
  #endif     /* TAS */

+ #ifndef HAS_CPU_DELAY
+ #define CPU_DELAY() cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+ }
+ #endif

  /*
   * Platform-independent out-of-line support routines

Re: update i386 spinlock for hyperthreading

From
Tom Lane
Date:
Manfred Spraul <manfred@colorfullife.com> writes:
> Intel recommends to add a special pause instruction into spinlock busy
> loops. It's necessary for hyperthreading - without it, the cpu can't
> figure out that a logical thread does no useful work and incorrectly
> awards lots of execution resources to that thread. Additionally, it's
> supposed to reduce the time the cpu needs to recover from the
> (mispredicted) branch after the spinlock was obtained.

Don't you have to put it in a specific place in the loop to make that
work?  If not, why not?  I doubt that rep;nop is magic enough to
recognize the loop that will be generated from s_lock()'s code.

My guess is that it'd be more useful to insert the rep;nop into the
failure branch of the TAS macro and forget about the separate CPU_DELAY
construct.  This would allow you to control where exactly rep;nop
appears relative to the xchgb.

> Additionally I've increased the number of loops from 100 to 1000

I think this change is almost certainly counterproductive; for any
platform other than the Xeon, remove "almost".

> + #ifndef HAS_CPU_DELAY
> + #define CPU_DELAY() cpu_delay()
> +
> + static __inline__ void
> + cpu_delay(void)
> + {
> + }
> + #endif

This breaks every non-gcc compiler in the world (or at least all those
that don't recognize __inline__).  If you really want to keep CPU_DELAY,
consider

    #ifndef CPU_DELAY
    #define CPU_DELAY()
    #endif

but as stated above, I'm dubious that the bottom of the s_lock loop
is the place to be adding anything anyway.

            regards, tom lane

Re: update i386 spinlock for hyperthreading

From
Manfred Spraul
Date:
Tom Lane wrote:

>Manfred Spraul <manfred@colorfullife.com> writes:
>
>
>>Intel recommends to add a special pause instruction into spinlock busy
>>loops. It's necessary for hyperthreading - without it, the cpu can't
>>figure out that a logical thread does no useful work and incorrectly
>>awards lots of execution resources to that thread. Additionally, it's
>>supposed to reduce the time the cpu needs to recover from the
>>(mispredicted) branch after the spinlock was obtained.
>>
>>
>
>Don't you have to put it in a specific place in the loop to make that
>work?  If not, why not?  I doubt that rep;nop is magic enough to
>recognize the loop that will be generated from s_lock()'s code.
>
>
Rep;nop is just a short delay - that's all. It means that the cpu
pipelines have a chance to drain, and that the other thread gets enough
cpu resources. Below is the full instruction documentation, from the
latest ia32 doc set from Intel:
<<<
Improves the performance of spin-wait loops. When executing a  spin-wait
loop,  a Pentium 4 or Intel Xeon processor suffers a severe performance
penalty when exiting the loop because it detects a possible memory order
violation. The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop. The processor uses this hint to
avoid the memory order violation in most situations, which greatly
improves processor performance. For this reason, it is recommended that
a PAUSE instruction be placed in all spin-wait loops. An additional
function of the PAUSE instruction is to reduce the power consumed by a
Pentium 4 processor while executing a spin loop. The Pentium 4 processor
can execute a spin-wait loop extremely quickly, causing the processor to
consume a lot of power while it waits for the resource it is spinning on
to become available. Inserting a pause instruction in a spin-wait loop
greatly reduces the processor s power consumption. This instruction was
introduced in the Pentium 4 processors, but is backward compatible with
all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction
operates like a NOP instruction. The Pentium 4 and Intel Xeon processors
implement the PAUSE instruction as a pre-defined delay. The delay is
finite and can be zero for some processors. This instruction does not
change the architectural state of the processor (that is, it performs
essentially a delaying noop operation).
<<<

I think a separate function is better than adding it into TAS: if it's
part of tas, then it would automatically be included by every
SpinLockAcquire call - unnecessary .text bloat. Additionally, there
might be other busy loops, in addition to TAS, that could use a delay
function.

I'll post a new patch that doesn't rely on __inline__ in the i386
independant part.

--
    Manfred


Re: update i386 spinlock for hyperthreading

From
Tom Lane
Date:
Manfred Spraul <manfred@colorfullife.com> writes:
> Tom Lane wrote:
>> Don't you have to put it in a specific place in the loop to make that
>> work?  If not, why not?
>>
> Rep;nop is just a short delay - that's all.

That view seems to me to be directly contradicted by this statement:

> The PAUSE instruction provides a hint to the processor that
> the code sequence is a spin-wait loop. The processor uses this hint to
> avoid the memory order violation in most situations, which greatly
> improves processor performance.

It's not apparent to me how a short delay translates into avoiding a
memory order violation (possibly some docs on what that means exactly
might help...).  I suspect strongly that there needs to be some near
proximity between the PAUSE instruction and the lock-test instruction
for this to work as advertised.  It would help if Intel were less coy
about what the instruction really does.

> This instruction does not change the architectural state of the
> processor (that is, it performs essentially a delaying noop
> operation).

This can be rephrased as "we're not telling you what this instruction
really does, because its interesting effects are below the level of the
instruction set architecture".  Great.  How are we supposed to know
how to use it?

> I think a separate function is better than adding it into TAS: if it's
> part of tas, then it would automatically be included by every
> SpinLockAcquire call - unnecessary .text bloat.

Why do you think it's unnecessary?  One thing that I find particularly
vague in the quoted documentation is the statement that the PAUSE
instruction is needed to avoid a delay when *exiting* the spin-wait
loop.  Doesn't this mean that a PAUSE is needed in the success path
when the first TAS succeeds (i.e, the normal no-contention path)?
If not, why not?  If so, does it go before or after the lock
instruction?

Also, if the principal effect is a "short delay", do we really need it
at all considering that our inner loop in s_lock is rather more than
an "xchgb" followed by a conditional branch?  There will be time for
the write queue to drain while we're incrementing and testing our
spin counter (which I trust is in a register...).

The reason I'm so full of questions is that I spent some time several
days ago looking at exactly this issue, and came away with only the
conclusion that I had to find some more-detailed documentation before
I could figure out what we should do about the spinlocks for Xeons.
You have not convinced me that you know more about the issue than I do.
A "10% speedup" is nice, but how do we know that that's what we should
expect to get?  Maybe there's a lot more to be won by doing it correctly
(for some value of "correctly").

            regards, tom lane

Re: update i386 spinlock for hyperthreading

From
Manfred Spraul
Date:
Tom Lane wrote:

>Manfred Spraul <manfred@colorfullife.com> writes:
>
>
>>Tom Lane wrote:
>>
>>
>>>Don't you have to put it in a specific place in the loop to make that
>>>work?  If not, why not?
>>>
>>>
>>>
>>Rep;nop is just a short delay - that's all.
>>
>>
>
>That view seems to me to be directly contradicted by this statement:
>
>
>
>>The PAUSE instruction provides a hint to the processor that
>>the code sequence is a spin-wait loop. The processor uses this hint to
>>avoid the memory order violation in most situations, which greatly
>>improves processor performance.
>>
>>
>
>It's not apparent to me how a short delay translates into avoiding a
>memory order violation (possibly some docs on what that means exactly
>might help...).  I suspect strongly that there needs to be some near
>proximity between the PAUSE instruction and the lock-test instruction
>for this to work as advertised.  It would help if Intel were less coy
>about what the instruction really does.
>
>
My guess: Pentium 4 cpu support something like 250 uops in flight - it
will have a dozend of the spinlock loops in it's pipeline. When the
spinlock is released, it must figure out which of the loops should get
it, and gets lost. My guess is that rep;nop delays the cpu buy at least
100 cpu ticks, and thus the pipeline will be empty before it proceeds. I
don't have a Pentium 4, and the HP testdrive is down. Someone around who
could run my test app?

>
>
>>This instruction does not change the architectural state of the
>>processor (that is, it performs essentially a delaying noop
>>operation).
>>
>>
>
>This can be rephrased as "we're not telling you what this instruction
>really does, because its interesting effects are below the level of the
>instruction set architecture".  Great.  How are we supposed to know
>how to use it?
>
>
There was a w_spinlock.pdf document with reference code. google still
finds it, but the links are dead :-(

>>I think a separate function is better than adding it into TAS: if it's
>>part of tas, then it would automatically be included by every
>>SpinLockAcquire call - unnecessary .text bloat.
>>
>>
>
>Why do you think it's unnecessary?  One thing that I find particularly
>vague in the quoted documentation is the statement that the PAUSE
>instruction is needed to avoid a delay when *exiting* the spin-wait
>loop.  Doesn't this mean that a PAUSE is needed in the success path
>when the first TAS succeeds (i.e, the normal no-contention path)?
>
IIRC: No.

>If not, why not?  If so, does it go before or after the lock
>instruction?
>
>
Neither: somewhere in the failure path.

>Also, if the principal effect is a "short delay", do we really need it
>at all considering that our inner loop in s_lock is rather more than
>an "xchgb" followed by a conditional branch?  There will be time for
>the write queue to drain while we're incrementing and testing our
>spin counter (which I trust is in a register...).
>
>The reason I'm so full of questions is that I spent some time several
>days ago looking at exactly this issue, and came away with only the
>conclusion that I had to find some more-detailed documentation before
>I could figure out what we should do about the spinlocks for Xeons.
>
I'll try to find some more docs and post links.

The 2nd thing I would change is to add a nonatomic test in the slow
path: locked instructions generate lots of bus traffic, and that's a
waste of resources.

Another question: regardless of the placement of rep;nop - 10% speedup
means that the postgres spends far too much time in the spinlock code.
I've looked at the oprofile dumps, and something like 1.2% of the total
cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest
instruction in the whole profile, it eats more cpu cycles than the
memcpy() calls that transfer data to/from kernel.
Is there an easy way find out which LWLock is contended?
--
    Manfred
/*
 * skel.cpp. skeleton for rdtsc benchmarks
 *
 * Copyright (C) 1999, 2001 by Manfred Spraul.
 *    All rights reserved except the rights granted by the GPL.
 *
 * Redistribution of this file is permitted under the terms of the GNU
 * General Public License (GPL) version 2 or later.
 * $Header: /pub/home/manfred/cvs-tree/timetest/rep_nop.cpp,v 1.1 2001/04/07 19:38:33 manfred Exp $
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>

// disable local interrupts during benchmark
#undef USE_CLI

// define a cache flushing function
#undef CACHE_FLUSH

#ifdef USE_CLI
#include <sys/io.h>
#define CLI    "cli\n\t"
#define STI    "sti\n\t"
#else
#define CLI
#define STI
#define iopl(a)    do { } while(0)
#endif

// Intel recommends that a serializing instruction
// should be called before and after rdtsc.
// CPUID is a serializing instruction.
// ".align 128:" P 4 L2 cache line size
#define read_rdtsc_before(time)        \
    __asm__ __volatile__(        \
        ".align 128\n\t"    \
        "xor %%eax,%%eax\n\t"    \
        CLI            \
        "cpuid\n\t"        \
        "rdtsc\n\t"        \
        "mov %%eax,(%0)\n\t"    \
        "mov %%edx,4(%0)\n\t"    \
        "xor %%eax,%%eax\n\t"    \
        "cpuid\n\t"        \
        : /* no output */    \
        : "S"(&time)        \
        : "eax", "ebx", "ecx", "edx", "memory")

#define read_rdtsc_after(time)        \
    __asm__ __volatile__(        \
        "xor %%eax,%%eax\n\t"    \
        "cpuid\n\t"        \
        "rdtsc\n\t"        \
        "mov %%eax,(%0)\n\t"    \
        "mov %%edx,4(%0)\n\t"    \
        "xor %%eax,%%eax\n\t"    \
        "cpuid\n\t"        \
        STI            \
        : /* no output */    \
        : "S"(&time)        \
        : "eax", "ebx", "ecx", "edx", "memory")

#define BUILD_TESTFNC(name, text, instructions) \
void name##_dummy(void)                \
{                        \
    __asm__ __volatile__(            \
        ".align 4096\n\t"        \
        "xor %%eax, %%eax\n\t"        \
        : : : "eax");            \
}                        \
static unsigned long name##_best = 1024*1024*1024; \
\
static void name(void) \
{ \
    unsigned long long time; \
    unsigned long long time2; \
 \
    read_rdtsc_before(time); \
    instructions; \
    read_rdtsc_after(time2); \
    if(time2-time < name##_best) { \
        printf( text ":\t%10Ld ticks; \n", \
            time2-time-zerotest_best); \
        name##_best = time2-time; \
    } \
}

void filler(void)
{
static int i = 3;
static int j;
    j = i*i;
}

#define DO_3(x) \
    do { x; x; x; } while(0)

#define DO_10(x) \
    do { x; x; x; x; x; x; x; x; x; x;} while(0)

#define DO_50(x) \
    do { DO_10(x); DO_10(x);DO_10(x); DO_10(x);DO_10(x);} while(0)


#define DO_T(y) do { \
    DO_3(filler()); \
    y; \
    DO_3(filler());} while(0)

#ifdef CACHE_FLUSH
#define DRAIN_SZ    (4*1024*1024)
int other[3*DRAIN_SZ] __attribute ((aligned (4096)));
static inline void drain_cache(void)
{
    int i;
    for(i=0;i<DRAIN_SZ;i++) other[DRAIN_SZ+i]=0;
    for(i=0;i<DRAIN_SZ;i++) if(other[DRAIN_SZ+i]!=0) break;
}
#else
static inline void drain_cache(void)
{
}
#endif

#define DO_TEST(x) \
    do { \
        int i; \
        for(i=0;i<5000;i++) \
            x; \
    } while(0)

//////////////////////////////////////////////////////////////////////////////

#define REP_NOP()    __asm__ __volatile__ ("rep;nop\n\t": : : "memory");
#define NOP()        __asm__ __volatile__ ("nop\n\t": : : "memory");

BUILD_TESTFNC(zerotest,"zerotest", DO_T((void)0));
BUILD_TESTFNC(rnop,"rep nop", DO_T(REP_NOP()));
BUILD_TESTFNC(nop,"nop", DO_T(NOP()));

//////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////
int main()
{
    if(geteuid() == 0) {
        int res = nice(-20);
        if(res < 0) {
            perror("nice(-20)");
            return 1;
        }
        printf("MOVETEST, reniced to (-20).\n");
    } else
    {
        printf("MOVETEST called by non-superuser, running with normal priority.\n");
    }
    for(;;) {
        DO_TEST(zerotest());
        DO_TEST(rnop());
        DO_TEST(nop());
    }
    return 0;
}
Index: ./src/backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.22
diff -u -r1.22 s_lock.c
--- ./src/backend/storage/lmgr/s_lock.c    23 Dec 2003 22:15:07 -0000    1.22
+++ ./src/backend/storage/lmgr/s_lock.c    27 Dec 2003 10:28:54 -0000
@@ -76,7 +76,7 @@
      * The select() delays are measured in centiseconds (0.01 sec) because 10
      * msec is a common resolution limit at the OS level.
      */
-#define SPINS_PER_DELAY        100
+#define SPINS_PER_DELAY        1000
 #define NUM_DELAYS            1000
 #define MIN_DELAY_CSEC        1
 #define MAX_DELAY_CSEC        100
@@ -86,7 +86,7 @@
     int            cur_delay = MIN_DELAY_CSEC;
     struct timeval delay;

-    while (TAS(lock))
+    while (!S_LOCK_FREE(lock) || TAS(lock))
     {
         if (++spins > SPINS_PER_DELAY)
         {
@@ -111,6 +111,7 @@

             spins = 0;
         }
+        CPU_DELAY();
     }
 }

Index: ./src/include/storage/s_lock.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.123
diff -u -r1.123 s_lock.h
--- ./src/include/storage/s_lock.h    23 Dec 2003 22:15:07 -0000    1.123
+++ ./src/include/storage/s_lock.h    27 Dec 2003 10:28:55 -0000
@@ -52,6 +52,15 @@
  *    in assembly language to execute a hardware atomic-test-and-set
  *    instruction.  Equivalent OS-supplied mutex routines could be used too.
  *
+ *    Additionally, a platform specific delay function can be defined:
+ *
+ *    void CPU_DELAY(void)
+ *        Notification that the cpu is executing a busy loop.
+ *
+ *     Some platforms need such an indication. One example are platforms
+ *     that implement SMT, i.e. multiple logical threads that share
+ *     execution resources in one physical cpu.
+ *
  *    If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not
  *    defined), then we fall back on an emulation that uses SysV semaphores
  *    (see spin.c).  This emulation will be MUCH MUCH slower than a proper TAS()
@@ -115,6 +124,17 @@
     return (int) _res;
 }

+#define HAS_CPU_DELAY
+
+#define CPU_DELAY()    cpu_delay()
+
+static __inline__ void
+cpu_delay(void)
+{
+    __asm__ __volatile__(
+        " rep; nop            \n"
+        : : : "memory");
+}
 #endif     /* __i386__ || __x86_64__ */


@@ -715,6 +735,9 @@
 #define TAS(lock)        tas(lock)
 #endif     /* TAS */

+#ifndef HAS_CPU_DELAY
+#define CPU_DELAY()    do { } while(0)
+#endif

 /*
  * Platform-independent out-of-line support routines

Re: update i386 spinlock for hyperthreading

From
ohp@pyrenet.fr
Date:
I have a bi-Xeon 2.6G hyperthreaded if it helps... feel free

Regards
On Sat, 27 Dec 2003, Manfred Spraul wrote:

> Date: Sat, 27 Dec 2003 11:34:16 +0100
> From: Manfred Spraul <manfred@colorfullife.com>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: PostgreSQL-patches <pgsql-patches@postgresql.org>
> Subject: Re: [PATCHES] update i386 spinlock for hyperthreading
>
> Tom Lane wrote:
>
> >Manfred Spraul <manfred@colorfullife.com> writes:
> >
> >
> >>Tom Lane wrote:
> >>
> >>
> >>>Don't you have to put it in a specific place in the loop to make that
> >>>work?  If not, why not?
> >>>
> >>>
> >>>
> >>Rep;nop is just a short delay - that's all.
> >>
> >>
> >
> >That view seems to me to be directly contradicted by this statement:
> >
> >
> >
> >>The PAUSE instruction provides a hint to the processor that
> >>the code sequence is a spin-wait loop. The processor uses this hint to
> >>avoid the memory order violation in most situations, which greatly
> >>improves processor performance.
> >>
> >>
> >
> >It's not apparent to me how a short delay translates into avoiding a
> >memory order violation (possibly some docs on what that means exactly
> >might help...).  I suspect strongly that there needs to be some near
> >proximity between the PAUSE instruction and the lock-test instruction
> >for this to work as advertised.  It would help if Intel were less coy
> >about what the instruction really does.
> >
> >
> My guess: Pentium 4 cpu support something like 250 uops in flight - it
> will have a dozend of the spinlock loops in it's pipeline. When the
> spinlock is released, it must figure out which of the loops should get
> it, and gets lost. My guess is that rep;nop delays the cpu buy at least
> 100 cpu ticks, and thus the pipeline will be empty before it proceeds. I
> don't have a Pentium 4, and the HP testdrive is down. Someone around who
> could run my test app?
>
> >
> >
> >>This instruction does not change the architectural state of the
> >>processor (that is, it performs essentially a delaying noop
> >>operation).
> >>
> >>
> >
> >This can be rephrased as "we're not telling you what this instruction
> >really does, because its interesting effects are below the level of the
> >instruction set architecture".  Great.  How are we supposed to know
> >how to use it?
> >
> >
> There was a w_spinlock.pdf document with reference code. google still
> finds it, but the links are dead :-(
>
> >>I think a separate function is better than adding it into TAS: if it's
> >>part of tas, then it would automatically be included by every
> >>SpinLockAcquire call - unnecessary .text bloat.
> >>
> >>
> >
> >Why do you think it's unnecessary?  One thing that I find particularly
> >vague in the quoted documentation is the statement that the PAUSE
> >instruction is needed to avoid a delay when *exiting* the spin-wait
> >loop.  Doesn't this mean that a PAUSE is needed in the success path
> >when the first TAS succeeds (i.e, the normal no-contention path)?
> >
> IIRC: No.
>
> >If not, why not?  If so, does it go before or after the lock
> >instruction?
> >
> >
> Neither: somewhere in the failure path.
>
> >Also, if the principal effect is a "short delay", do we really need it
> >at all considering that our inner loop in s_lock is rather more than
> >an "xchgb" followed by a conditional branch?  There will be time for
> >the write queue to drain while we're incrementing and testing our
> >spin counter (which I trust is in a register...).
> >
> >The reason I'm so full of questions is that I spent some time several
> >days ago looking at exactly this issue, and came away with only the
> >conclusion that I had to find some more-detailed documentation before
> >I could figure out what we should do about the spinlocks for Xeons.
> >
> I'll try to find some more docs and post links.
>
> The 2nd thing I would change is to add a nonatomic test in the slow
> path: locked instructions generate lots of bus traffic, and that's a
> waste of resources.
>
> Another question: regardless of the placement of rep;nop - 10% speedup
> means that the postgres spends far too much time in the spinlock code.
> I've looked at the oprofile dumps, and something like 1.2% of the total
> cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest
> instruction in the whole profile, it eats more cpu cycles than the
> memcpy() calls that transfer data to/from kernel.
> Is there an easy way find out which LWLock is contended?
> --
>     Manfred
>

--
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

Re: update i386 spinlock for hyperthreading

From
Tom Lane
Date:
Manfred Spraul <manfred@colorfullife.com> writes:
> My guess: Pentium 4 cpu support something like 250 uops in flight - it
> will have a dozend of the spinlock loops in it's pipeline. When the
> spinlock is released, it must figure out which of the loops should get
> it, and gets lost. My guess is that rep;nop delays the cpu buy at least
> 100 cpu ticks, and thus the pipeline will be empty before it proceeds.

After digging some more in Intel's documentation, it seems that indeed
PAUSE is defined to delay just long enough to empty the pipeline.  So it
doesn't really matter where you put it in the wait loop, and there is no
point in inserting it in the success path; that answers my concerns from
before.

> There was a w_spinlock.pdf document with reference code. google still
> finds it, but the links are dead :-(

I was able to find it as a link from another application note at Intel's
documentation site.  Try going to
http://appzone.intel.com/literature/index.asp and searching for AP-949.


Anyway, I've committed your patch with some changes.

> The 2nd thing I would change is to add a nonatomic test in the slow
> path: locked instructions generate lots of bus traffic, and that's a
> waste of resources.

Agreed, but I did not like the way you did it; this concern does not
necessarily apply to all processors, and since we are not using
S_LOCK_FREE at all, it's dubious that it's correctly implemented
everywhere.  I modified the IA32 TAS() macro instead.

BTW, I noticed a lot of concern in the Intel app notes about reserving
64 or even 128 bytes for each spinlock to avoid cache line conflicts.
That seems excessive to me (we use a lot of spinlocks for buffers), but
perhaps it is worth looking into.


> Is there an easy way find out which LWLock is contended?

Not from oprofile output, as far as I can think.  I've suspected for
some time that the BufMgrLock is a major bottleneck, but have no proof.

            regards, tom lane

Re: update i386 spinlock for hyperthreading

From
Manfred Spraul
Date:
Tom Lane wrote:

>Anyway, I've committed your patch with some changes.
>
>
Thanks.

>BTW, I noticed a lot of concern in the Intel app notes about reserving
>64 or even 128 bytes for each spinlock to avoid cache line conflicts.
>That seems excessive to me (we use a lot of spinlocks for buffers), but
>perhaps it is worth looking into.
>
This recommendation usually ignored in the Linux kernel.  A few very hot
spinlocks have an exclusive cacheline, but most don't.

>>Is there an easy way find out which LWLock is contended?
>>
>>
>
>Not from oprofile output, as far as I can think.  I've suspected for
>some time that the BufMgrLock is a major bottleneck, but have no proof.
>
>
I'll try to write a patch that dumps the LWLock usage and ask mark to
run it.

--
    Manfred


Re: update i386 spinlock for hyperthreading

From
Manfred Spraul
Date:
Tom Lane wrote:

>>Is there an easy way find out which LWLock is contended?
>>
>>
>
>Not from oprofile output, as far as I can think.  I've suspected for
>some time that the BufMgrLock is a major bottleneck, but have no proof.
>
>
Mark ran a DBT-2 testrun with the attached statistics patch applied: It
collects stats about all lightweight locks and dumps them during
shutdown. The hottest locks are

Lock                         Acquire   %contention   sleep calls
8(WALInsertLock)       8679205 0.030410        263934
1(LockMgrLock)       64089418        0.079783        5113215
5(SInvalLock)       68396470        0.001298        88812
0(BufMgrLock)       246307425       0.120293        29629089

The lock numbers are from 7.4, i.e. without the patch that removes
ShmemIndexLock. I've check that 8 is really WALInsertLock in the
assembly output.

The scary part from the system perspective are the 35 million context
switches that were generated by the BufMgrLock and the LockMgrLock. I
remember there were patches that tried other algorithms instead of the
simple LRU for the buffer manager. Has anyone tried to change the
locking of the buffer manager?

The effect of padding the lightweight locks to a full cacheline appears
to be negligable: With the padding, there were around 4 million
performance monitor hits on the 'lock xchg' instructions. Without it
(test run 300), there were 4.2 million hits.

The complete data is at

http://developer.osdl.org/markw/dbt2-pgsql/303/

The db log with the lock stats is at
http://developer.osdl.org/markw/dbt2-pgsql/303/db/log

(Warning: 6.9 MB)

--
    Manfred
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/lwlock.c,v
retrieving revision 1.19
diff -u -r1.19 lwlock.c
--- src/backend/storage/lmgr/lwlock.c    20 Dec 2003 17:31:21 -0000    1.19
+++ src/backend/storage/lmgr/lwlock.c    27 Dec 2003 22:51:36 -0000
@@ -36,6 +36,11 @@
     PGPROC       *head;            /* head of list of waiting PGPROCs */
     PGPROC       *tail;            /* tail of list of waiting PGPROCs */
     /* tail is undefined when head is NULL */
+    unsigned long long stat_acquire_total;
+    unsigned long long stat_acquire_fail;
+    unsigned long long stat_release_total;
+    unsigned long long stat_release_wakeup;
+    int        fill[20];
 } LWLock;

 /*
@@ -159,6 +164,10 @@
         lock->shared = 0;
         lock->head = NULL;
         lock->tail = NULL;
+        lock->stat_acquire_total = 0;
+        lock->stat_acquire_fail = 0;
+        lock->stat_release_total = 0;
+        lock->stat_release_wakeup = 0;
     }

     /*
@@ -245,6 +254,10 @@
         if (retry)
             lock->releaseOK = true;

+        lock->stat_acquire_total++;
+        if (retry)
+            lock->stat_acquire_fail++;
+
         /* If I can get the lock, do so quickly. */
         if (mode == LW_EXCLUSIVE)
         {
@@ -440,6 +453,7 @@
         Assert(lock->shared > 0);
         lock->shared--;
     }
+    lock->stat_release_total++;

     /*
      * See if I need to awaken any waiters.  If I released a non-last
@@ -477,6 +491,8 @@
         }
     }

+    if (head)
+        lock->stat_release_wakeup++;
     /* We are done updating shared state of the lock itself. */
     SpinLockRelease_NoHoldoff(&lock->mutex);

@@ -517,5 +533,19 @@
         HOLD_INTERRUPTS();        /* match the upcoming RESUME_INTERRUPTS */

         LWLockRelease(held_lwlocks[num_held_lwlocks - 1]);
+    }
+}
+
+void LWLockPrintStats(void);
+void
+LWLockPrintStats(void)
+{
+    int i;
+    for (i=0;i<LWLockCounter[0];i++) {
+        volatile LWLock *lock = LWLockArray + i;
+        elog(LOG, "Lock %d): acquire_total %Ld acquire_fail %Ld release_total %Ld release_wakeup %Ld\n",
+             i,
+             lock->stat_acquire_total, lock->stat_acquire_fail,
+             lock->stat_release_total, lock->stat_release_wakeup);
     }
 }
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v
retrieving revision 1.353
diff -u -r1.353 postmaster.c
--- src/backend/postmaster/postmaster.c    25 Dec 2003 03:52:51 -0000    1.353
+++ src/backend/postmaster/postmaster.c    27 Dec 2003 22:51:38 -0000
@@ -1701,7 +1701,7 @@
     errno = save_errno;
 }

-
+void LWLockPrintStats(void);

 /*
  * pmdie -- signal handler for processing various postmaster signals.
@@ -1733,6 +1733,7 @@
             Shutdown = SmartShutdown;
             ereport(LOG,
                     (errmsg("received smart shutdown request")));
+            LWLockPrintStats();
             if (DLGetHead(BackendList)) /* let reaper() handle this */
                 break;

@@ -1766,6 +1767,7 @@
                 break;
             ereport(LOG,
                     (errmsg("received fast shutdown request")));
+            LWLockPrintStats();
             if (DLGetHead(BackendList)) /* let reaper() handle this */
             {
                 Shutdown = FastShutdown;
@@ -1812,6 +1814,7 @@
                 kill(BgWriterPID, SIGQUIT);
             ereport(LOG,
                     (errmsg("received immediate shutdown request")));
+            LWLockPrintStats();
             if (ShutdownPID > 0)
                 kill(ShutdownPID, SIGQUIT);
             if (StartupPID > 0)

Re: update i386 spinlock for hyperthreading

From
Bruce Momjian
Date:
Manfred Spraul wrote:
> The scary part from the system perspective are the 35 million context
> switches that were generated by the BufMgrLock and the LockMgrLock. I
> remember there were patches that tried other algorithms instead of the
> simple LRU for the buffer manager. Has anyone tried to change the
> locking of the buffer manager?

CVS HEAD already has an Adaptive Replacement Cache (ARC) for buffer
replacement.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Some Documentation Changes

From
Christopher Browne
Date:
1.  In keeping with the recent discussion that there should be more
said about views, stored procedures, and triggers, in the tutorial, I
have added a bit of verbiage to that end.

2.  Some formatting changes to the datetime discussion, as well as
addition of a citation of a relevant book on calendars.

Index: advanced.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/advanced.sgml,v
retrieving revision 1.38
diff -c -u -r1.38 advanced.sgml
--- advanced.sgml    29 Nov 2003 19:51:36 -0000    1.38
+++ advanced.sgml    30 Dec 2003 01:58:24 -0000
@@ -65,10 +65,24 @@

    <para>
     Views can be used in almost any place a real table can be used.
-    Building views upon other views is not uncommon.
+    Building views upon other views is not uncommon.  You may cut down
+    on the difficulty of building complex queries by constructing them
+    in smaller, easier-to-verify pieces, using views.  Views may be
+    used to reveal specific table columns to users that legitimately
+    need access to some of the data, but who shouldn't be able to look
+    at the whole table.
    </para>
-  </sect1>

+   <para>
+    Views differ from <quote> real tables </quote> in that they are
+    not, by default, updatable.  If they join together several tables,
+    it may be troublesome to update certain columns since the
+    <emphasis>real</emphasis> update that must take place requires
+    identifying the relevant rows in the source tables.  This is
+    discussed further in <xref linkend="rules-views-update">.
+    </para>
+
+  </sect1>

   <sect1 id="tutorial-fk">
    <title>Foreign Keys</title>
@@ -387,6 +401,169 @@
    </para>
   </sect1>

+  <sect1 id="tutorial-storedprocs">
+   <title> Stored Procedures </title>
+
+   <indexterm zone="tutorial-storedprocs">
+     <primary>stored procedures</primary>
+   </indexterm>
+
+   <para> Stored procedures are code that runs inside the database
+   system.  Numerous languages may be used to implement functions and
+   procedures; most built-in code is implemented in C.  The
+   <quote>basic</quote> loadable procedural language for
+   <productname>PostgreSQL</productname> is <xref linkid="plpgsql">.
+   Numerous other languages may also be used, including <xref
+   linkid="plperl">, <xref linkid="pltcl">, and <xref
+   linkid="plpython">.
+   </para>
+
+   <para> There are several ways that stored procedures are really
+   helpful:
+
+   <itemizedlist>
+
+   <listitem><para> To centralize data validation code into the
+   database </para>
+
+   <para> Your system may use client software written in several
+   languages, perhaps with a <quote>web application</quote>
+   implemented in PHP, a <quote>server application</quote> implemented
+   in Java, and a <quote> report writer</quote> implemented in Perl.
+   In the absence of stored procedures, you will likely find that data
+   validation code must be implemented multiple times, in multiple
+   languages, once for each application.</para>
+
+   <para> By implementing data validation in stored procedures,
+   running in the database, it can behave uniformly for all these
+   systems, and you do not need to worry about synchronizing
+   validation procedures across the languages.</para>
+
+   </listitem>
+
+   <listitem><para> Reducing round trips between client and server
+   </para>
+
+   <para>A stored procedure may submit multiple queries, looking up
+   information and adding in links to additional tables.  This takes
+   place without requiring that the client submit multiple queries,
+   and without requiring any added network traffic.
+   </para>
+
+   <para> As a matter of course, the queries share a single
+   transaction context, and there may also be savings in the
+   evaluation of query plans, that will be similar between invocations
+   of a given stored procedure.  </para></listitem>
+
+   <listitem><para> To simplify queries. </para>
+
+   <para> For instance, if you are commonly checking the TLD on domain
+   names, you might create a stored procedure for this purpose, and so
+   be able to use queries such as <command> select domain, tld(domain)
+   from domains; </command> instead of having to put verbose code
+   using <function>substr()</function> into each query.
+   </para>
+
+   <para> It is particularly convenient to use scripting languages
+   like Perl, Tcl, and Python to <quote>grovel through strings</quote>
+   since they are designed for <quote>text processing.</quote></para>
+
+   <para> The binding to the R statistical language allows
+   implementing complex statistical queries inside the database,
+   instead of having to draw the data out.
+   </listitem>
+
+   <listitem><para> Increasing the level of abstraction</para>
+
+   <para> If data is accessed exclusively through stored procedures,
+   then the structures of tables may be changed without there needing
+   to be any visible change in the API used by programmers.  In some
+   systems, users are <emphasis>only</emphasis> allowed access to
+   stored procedures to update data, and cannot do direct updates to
+   tables.
+   </para>
+
+   </listitem>
+
+   </itemizedlist>
+   </para>
+
+   <para> These benefits build on one another: careful use of stored
+   procedures can simultaneously improve reliability and performance,
+   whilst simplifying database access code and improving portability
+   across client platforms and languages.  For instance, consider that
+   a stored procedure can cheaply query tables in the database to
+   validate the correctness of data provided as input. </para>
+
+   <para> Instead of requiring a whole series of queries to create an
+   object, and to look up parent/subsidiary objects to link it to, a
+   stored procedure can do all of this efficiently in the database
+   server, improving performance, and eliminating whole classes of
+   errors. </para>
+
+  </sect1>
+
+  <sect1 id="tutorial-triggers">
+   <title> Triggers </title>
+
+   <indexterm zone="tutorial-triggers">
+     <primary>triggers</primary>
+   </indexterm>
+
+   <para> Triggers allow running a function either before or after
+   update (<command>INSERT</command>, <command>DELETE</command>,
+   <command>UPDATE</command>) operations, which can allow you to do
+   some very clever things. </para>
+
+   <itemizedlist>
+
+   <listitem><para> Data Validation  </para>
+
+   <para> Instead of explicitly coding validation checks as part of a
+   stored procedure, they may be introduced as <command>BEFORE</command>
+   triggers.  The trigger function checks the input values, raising an
+   exception if it finds invalid input.</para>
+
+   <para> Note that this is how foreign key checks are implemented in
+   <productname>PostgreSQL</productname>; when you define a foreign
+   key, you will see a message similar to the following:
+<screen>
+NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
+</screen></para>
+
+   <para> In some cases, it may be appropriate for a trigger function
+   to insert data in order to <emphasis>make</emphasis> the input valid.  For
+   instance, if a newly created object needs a status code in a status
+   table, the trigger might automatically do that.</para>
+   </listitem>
+
+   <listitem><para> Audit logs </para>
+
+   <para> One may use <command>AFTER</command> triggers to monitor updates to
+   vital tables, and <command>INSERT</command> entries into log tables to
+   provide a more permanent record of those updates.  </para>
+   </listitem>
+
+   <listitem><para> Replication </para>
+
+   <para> The <application>RServ</application> replication system uses
+   <command>AFTER</command> triggers to track which rows have changed on the
+   <quote>master</quote> system and therefore need to be copied over to
+   <quote>slave</quote> systems.</para>
+
+   <para> <command>
+     CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
+        FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
+   </command></para>
+   </listitem>
+
+   </itemizedlist>
+
+   <para> Notice that there are strong parallels between what can be
+   accomplished using triggers and stored procedures, particularly in
+   regards to data validation.  </para>
+
+  </sect1>

   <sect1 id="tutorial-conclusion">
    <title>Conclusion</title>
Index: datetime.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datetime.sgml,v
retrieving revision 2.39
diff -c -u -r2.39 datetime.sgml
--- datetime.sgml    1 Dec 2003 20:34:53 -0000    2.39
+++ datetime.sgml    30 Dec 2003 01:58:25 -0000
@@ -11,8 +11,8 @@
    strings, and are broken up into distinct fields with a preliminary
    determination of what kind of information may be in the
    field. Each field is interpreted and either assigned a numeric
-   value, ignored, or rejected.
-   The parser contains internal lookup tables for all textual fields,
+   value, ignored, or rejected.</para>
+   <para> The parser contains internal lookup tables for all textual fields,
    including months, days of the week, and time
    zones.
   </para>
@@ -1056,21 +1056,21 @@
    years.
   </para>

-  <para>
-   The papal bull of February 1582 decreed that 10 days should be dropped
-   from October 1582 so that 15 October should follow immediately after
-   4 October.
-   This was observed in Italy, Poland, Portugal, and Spain. Other Catholic
-   countries followed shortly after, but Protestant countries were
-   reluctant to change, and the Greek orthodox countries didn't change
-   until the start of the 20th century.
-
-   The reform was observed by Great Britain and Dominions (including what is
-   now the USA) in 1752.
-   Thus 2 September 1752 was followed by 14 September 1752.
+  <para> The papal bull of February 1582 decreed that 10 days should
+  be dropped from October 1582 so that 15 October should follow
+  immediately after 4 October.</para>
+
+  <para> This was observed in Italy, Poland, Portugal, and Spain.
+  Other Catholic countries followed shortly after, but Protestant
+  countries were reluctant to change, and the Greek orthodox countries
+  didn't change until the start of the 20th century.</para>
+
+  <para> The reform was observed by Great Britain and Dominions
+  (including what is now the USA) in 1752.  Thus 2 September 1752 was
+  followed by 14 September 1752.</para>

-   This is why Unix systems have the <command>cal</command> program
-   produce the following:
+   <para> This is why Unix systems have the <command>cal</command>
+   program produce the following:

 <screen>
 $ <userinput>cal 9 1752</userinput>
@@ -1094,19 +1094,24 @@
     </para>
    </note>

-  <para>
-   Different calendars have been developed in various parts of the
-   world, many predating the Gregorian system.
+  <para> Different calendars have been developed in various parts of
+  the world, many predating the Gregorian system.</para>

-   For example,
-   the beginnings of the Chinese calendar can be traced back to the 14th
-   century BC. Legend has it that the Emperor Huangdi invented the
-   calendar in 2637 BC.
+  <para> For example, the beginnings of the Chinese calendar can be
+  traced back to the 14th century BC. Legend has it that the Emperor
+  Huangdi invented the calendar in 2637 BC.</para>

-   The People's Republic of China uses the Gregorian calendar
-   for civil purposes. The Chinese calendar is used for determining
-   festivals.
+  <para> The People's Republic of China uses the Gregorian calendar
+  for civil purposes. The Chinese calendar is used for determining
+  festivals.
   </para>
+
+  <para> If you are interested in this sort of thing, <citation>
+  Calendrical Calculations: The Millennium Edition </citation> by by
+  Edward M. Reingold and Nachum Dershowitz is an excellent reference,
+  describing some 25 calendars, and providing software for displaying
+  them and converting between them.</para>
+
  </sect1>
 </appendix>
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: Some Documentation Changes

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Christopher Browne wrote:
> 1.  In keeping with the recent discussion that there should be more
> said about views, stored procedures, and triggers, in the tutorial, I
> have added a bit of verbiage to that end.
>
> 2.  Some formatting changes to the datetime discussion, as well as
> addition of a citation of a relevant book on calendars.
>
> Index: advanced.sgml
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/advanced.sgml,v
> retrieving revision 1.38
> diff -c -u -r1.38 advanced.sgml
> --- advanced.sgml    29 Nov 2003 19:51:36 -0000    1.38
> +++ advanced.sgml    30 Dec 2003 01:58:24 -0000
> @@ -65,10 +65,24 @@
>
>     <para>
>      Views can be used in almost any place a real table can be used.
> -    Building views upon other views is not uncommon.
> +    Building views upon other views is not uncommon.  You may cut down
> +    on the difficulty of building complex queries by constructing them
> +    in smaller, easier-to-verify pieces, using views.  Views may be
> +    used to reveal specific table columns to users that legitimately
> +    need access to some of the data, but who shouldn't be able to look
> +    at the whole table.
>     </para>
> -  </sect1>
>
> +   <para>
> +    Views differ from <quote> real tables </quote> in that they are
> +    not, by default, updatable.  If they join together several tables,
> +    it may be troublesome to update certain columns since the
> +    <emphasis>real</emphasis> update that must take place requires
> +    identifying the relevant rows in the source tables.  This is
> +    discussed further in <xref linkend="rules-views-update">.
> +    </para>
> +
> +  </sect1>
>
>    <sect1 id="tutorial-fk">
>     <title>Foreign Keys</title>
> @@ -387,6 +401,169 @@
>     </para>
>    </sect1>
>
> +  <sect1 id="tutorial-storedprocs">
> +   <title> Stored Procedures </title>
> +
> +   <indexterm zone="tutorial-storedprocs">
> +     <primary>stored procedures</primary>
> +   </indexterm>
> +
> +   <para> Stored procedures are code that runs inside the database
> +   system.  Numerous languages may be used to implement functions and
> +   procedures; most built-in code is implemented in C.  The
> +   <quote>basic</quote> loadable procedural language for
> +   <productname>PostgreSQL</productname> is <xref linkid="plpgsql">.
> +   Numerous other languages may also be used, including <xref
> +   linkid="plperl">, <xref linkid="pltcl">, and <xref
> +   linkid="plpython">.
> +   </para>
> +
> +   <para> There are several ways that stored procedures are really
> +   helpful:
> +
> +   <itemizedlist>
> +
> +   <listitem><para> To centralize data validation code into the
> +   database </para>
> +
> +   <para> Your system may use client software written in several
> +   languages, perhaps with a <quote>web application</quote>
> +   implemented in PHP, a <quote>server application</quote> implemented
> +   in Java, and a <quote> report writer</quote> implemented in Perl.
> +   In the absence of stored procedures, you will likely find that data
> +   validation code must be implemented multiple times, in multiple
> +   languages, once for each application.</para>
> +
> +   <para> By implementing data validation in stored procedures,
> +   running in the database, it can behave uniformly for all these
> +   systems, and you do not need to worry about synchronizing
> +   validation procedures across the languages.</para>
> +
> +   </listitem>
> +
> +   <listitem><para> Reducing round trips between client and server
> +   </para>
> +
> +   <para>A stored procedure may submit multiple queries, looking up
> +   information and adding in links to additional tables.  This takes
> +   place without requiring that the client submit multiple queries,
> +   and without requiring any added network traffic.
> +   </para>
> +
> +   <para> As a matter of course, the queries share a single
> +   transaction context, and there may also be savings in the
> +   evaluation of query plans, that will be similar between invocations
> +   of a given stored procedure.  </para></listitem>
> +
> +   <listitem><para> To simplify queries. </para>
> +
> +   <para> For instance, if you are commonly checking the TLD on domain
> +   names, you might create a stored procedure for this purpose, and so
> +   be able to use queries such as <command> select domain, tld(domain)
> +   from domains; </command> instead of having to put verbose code
> +   using <function>substr()</function> into each query.
> +   </para>
> +
> +   <para> It is particularly convenient to use scripting languages
> +   like Perl, Tcl, and Python to <quote>grovel through strings</quote>
> +   since they are designed for <quote>text processing.</quote></para>
> +
> +   <para> The binding to the R statistical language allows
> +   implementing complex statistical queries inside the database,
> +   instead of having to draw the data out.
> +   </listitem>
> +
> +   <listitem><para> Increasing the level of abstraction</para>
> +
> +   <para> If data is accessed exclusively through stored procedures,
> +   then the structures of tables may be changed without there needing
> +   to be any visible change in the API used by programmers.  In some
> +   systems, users are <emphasis>only</emphasis> allowed access to
> +   stored procedures to update data, and cannot do direct updates to
> +   tables.
> +   </para>
> +
> +   </listitem>
> +
> +   </itemizedlist>
> +   </para>
> +
> +   <para> These benefits build on one another: careful use of stored
> +   procedures can simultaneously improve reliability and performance,
> +   whilst simplifying database access code and improving portability
> +   across client platforms and languages.  For instance, consider that
> +   a stored procedure can cheaply query tables in the database to
> +   validate the correctness of data provided as input. </para>
> +
> +   <para> Instead of requiring a whole series of queries to create an
> +   object, and to look up parent/subsidiary objects to link it to, a
> +   stored procedure can do all of this efficiently in the database
> +   server, improving performance, and eliminating whole classes of
> +   errors. </para>
> +
> +  </sect1>
> +
> +  <sect1 id="tutorial-triggers">
> +   <title> Triggers </title>
> +
> +   <indexterm zone="tutorial-triggers">
> +     <primary>triggers</primary>
> +   </indexterm>
> +
> +   <para> Triggers allow running a function either before or after
> +   update (<command>INSERT</command>, <command>DELETE</command>,
> +   <command>UPDATE</command>) operations, which can allow you to do
> +   some very clever things. </para>
> +
> +   <itemizedlist>
> +
> +   <listitem><para> Data Validation  </para>
> +
> +   <para> Instead of explicitly coding validation checks as part of a
> +   stored procedure, they may be introduced as <command>BEFORE</command>
> +   triggers.  The trigger function checks the input values, raising an
> +   exception if it finds invalid input.</para>
> +
> +   <para> Note that this is how foreign key checks are implemented in
> +   <productname>PostgreSQL</productname>; when you define a foreign
> +   key, you will see a message similar to the following:
> +<screen>
> +NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
> +</screen></para>
> +
> +   <para> In some cases, it may be appropriate for a trigger function
> +   to insert data in order to <emphasis>make</emphasis> the input valid.  For
> +   instance, if a newly created object needs a status code in a status
> +   table, the trigger might automatically do that.</para>
> +   </listitem>
> +
> +   <listitem><para> Audit logs </para>
> +
> +   <para> One may use <command>AFTER</command> triggers to monitor updates to
> +   vital tables, and <command>INSERT</command> entries into log tables to
> +   provide a more permanent record of those updates.  </para>
> +   </listitem>
> +
> +   <listitem><para> Replication </para>
> +
> +   <para> The <application>RServ</application> replication system uses
> +   <command>AFTER</command> triggers to track which rows have changed on the
> +   <quote>master</quote> system and therefore need to be copied over to
> +   <quote>slave</quote> systems.</para>
> +
> +   <para> <command>
> +     CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
> +        FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
> +   </command></para>
> +   </listitem>
> +
> +   </itemizedlist>
> +
> +   <para> Notice that there are strong parallels between what can be
> +   accomplished using triggers and stored procedures, particularly in
> +   regards to data validation.  </para>
> +
> +  </sect1>
>
>    <sect1 id="tutorial-conclusion">
>     <title>Conclusion</title>
> Index: datetime.sgml
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datetime.sgml,v
> retrieving revision 2.39
> diff -c -u -r2.39 datetime.sgml
> --- datetime.sgml    1 Dec 2003 20:34:53 -0000    2.39
> +++ datetime.sgml    30 Dec 2003 01:58:25 -0000
> @@ -11,8 +11,8 @@
>     strings, and are broken up into distinct fields with a preliminary
>     determination of what kind of information may be in the
>     field. Each field is interpreted and either assigned a numeric
> -   value, ignored, or rejected.
> -   The parser contains internal lookup tables for all textual fields,
> +   value, ignored, or rejected.</para>
> +   <para> The parser contains internal lookup tables for all textual fields,
>     including months, days of the week, and time
>     zones.
>    </para>
> @@ -1056,21 +1056,21 @@
>     years.
>    </para>
>
> -  <para>
> -   The papal bull of February 1582 decreed that 10 days should be dropped
> -   from October 1582 so that 15 October should follow immediately after
> -   4 October.
> -   This was observed in Italy, Poland, Portugal, and Spain. Other Catholic
> -   countries followed shortly after, but Protestant countries were
> -   reluctant to change, and the Greek orthodox countries didn't change
> -   until the start of the 20th century.
> -
> -   The reform was observed by Great Britain and Dominions (including what is
> -   now the USA) in 1752.
> -   Thus 2 September 1752 was followed by 14 September 1752.
> +  <para> The papal bull of February 1582 decreed that 10 days should
> +  be dropped from October 1582 so that 15 October should follow
> +  immediately after 4 October.</para>
> +
> +  <para> This was observed in Italy, Poland, Portugal, and Spain.
> +  Other Catholic countries followed shortly after, but Protestant
> +  countries were reluctant to change, and the Greek orthodox countries
> +  didn't change until the start of the 20th century.</para>
> +
> +  <para> The reform was observed by Great Britain and Dominions
> +  (including what is now the USA) in 1752.  Thus 2 September 1752 was
> +  followed by 14 September 1752.</para>
>
> -   This is why Unix systems have the <command>cal</command> program
> -   produce the following:
> +   <para> This is why Unix systems have the <command>cal</command>
> +   program produce the following:
>
>  <screen>
>  $ <userinput>cal 9 1752</userinput>
> @@ -1094,19 +1094,24 @@
>      </para>
>     </note>
>
> -  <para>
> -   Different calendars have been developed in various parts of the
> -   world, many predating the Gregorian system.
> +  <para> Different calendars have been developed in various parts of
> +  the world, many predating the Gregorian system.</para>
>
> -   For example,
> -   the beginnings of the Chinese calendar can be traced back to the 14th
> -   century BC. Legend has it that the Emperor Huangdi invented the
> -   calendar in 2637 BC.
> +  <para> For example, the beginnings of the Chinese calendar can be
> +  traced back to the 14th century BC. Legend has it that the Emperor
> +  Huangdi invented the calendar in 2637 BC.</para>
>
> -   The People's Republic of China uses the Gregorian calendar
> -   for civil purposes. The Chinese calendar is used for determining
> -   festivals.
> +  <para> The People's Republic of China uses the Gregorian calendar
> +  for civil purposes. The Chinese calendar is used for determining
> +  festivals.
>    </para>
> +
> +  <para> If you are interested in this sort of thing, <citation>
> +  Calendrical Calculations: The Millennium Edition </citation> by by
> +  Edward M. Reingold and Nachum Dershowitz is an excellent reference,
> +  describing some 25 calendars, and providing software for displaying
> +  them and converting between them.</para>
> +
>   </sect1>
>  </appendix>
> --
> let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
> <http://dev6.int.libertyrms.com/>
> Christopher Browne
> (416) 646 3304 x124 (land)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Some Documentation Changes

From
Peter Eisentraut
Date:
Am Dienstag, 30. Dezember 2003 03:01 schrieb Christopher Browne:
> 1.  In keeping with the recent discussion that there should be more
> said about views, stored procedures, and triggers, in the tutorial, I
> have added a bit of verbiage to that end.

The idea that seems to get lost here is that the Tutorial is supposed to be
something for people to try out, not just a list of interesting ideas to keep
in mind for later on.

> 2.  Some formatting changes to the datetime discussion,

Please revert them.

> as well as
> addition of a citation of a relevant book on calendars.

Citations go into the bibliography.


Re: Some Documentation Changes

From
Peter Eisentraut
Date:
Am Dienstag, 30. März 2004 23:58 schrieb Bruce Momjian:
> Patch applied.  Thanks.

Please revert this patch.  The material is not "Tutorial" material.


Re: Some Documentation Changes

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Am Dienstag, 30. Dezember 2003 03:01 schrieb Christopher Browne:
> > 1.  In keeping with the recent discussion that there should be more
> > said about views, stored procedures, and triggers, in the tutorial, I
> > have added a bit of verbiage to that end.
>
> The idea that seems to get lost here is that the Tutorial is supposed to be
> something for people to try out, not just a list of interesting ideas to keep
> in mind for later on.
>
> > 2.  Some formatting changes to the datetime discussion,
>
> Please revert them.
>
> > as well as
> > addition of a citation of a relevant book on calendars.
>
> Citations go into the bibliography.

OK, entire patch reverted.  Does someone want to rework this information
to fit into our docs more cleanly?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/advanced.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/advanced.sgml,v
retrieving revision 1.38
retrieving revision 1.40
diff -c -c -r1.38 -r1.40
*** doc/src/sgml/advanced.sgml    29 Nov 2003 19:51:36 -0000    1.38
--- doc/src/sgml/advanced.sgml    30 Mar 2004 22:08:50 -0000    1.40
***************
*** 1,5 ****
  <!--
! $PostgreSQL: pgsql-server/doc/src/sgml/advanced.sgml,v 1.38 2003/11/29 19:51:36 pgsql Exp $
  -->

   <chapter id="tutorial-advanced">
--- 1,5 ----
  <!--
! $PostgreSQL: pgsql-server/doc/src/sgml/advanced.sgml,v 1.40 2004/03/30 22:08:50 momjian Exp $
  -->

   <chapter id="tutorial-advanced">
***************
*** 65,74 ****

     <para>
      Views can be used in almost any place a real table can be used.
!     Building views upon other views is not uncommon.
     </para>
-   </sect1>


    <sect1 id="tutorial-fk">
     <title>Foreign Keys</title>
--- 65,88 ----

     <para>
      Views can be used in almost any place a real table can be used.
!     Building views upon other views is not uncommon.  You may cut down
!     on the difficulty of building complex queries by constructing them
!     in smaller, easier-to-verify pieces, using views.  Views may be
!     used to reveal specific table columns to users that legitimately
!     need access to some of the data, but who shouldn't be able to look
!     at the whole table.
     </para>

+    <para>
+     Views differ from <quote> real tables </quote> in that they are
+     not, by default, updatable.  If they join together several tables,
+     it may be troublesome to update certain columns since the
+     <emphasis>real</emphasis> update that must take place requires
+     identifying the relevant rows in the source tables.  This is
+     discussed further in <xref linkend="rules-views-update">.
+     </para>
+
+   </sect1>

    <sect1 id="tutorial-fk">
     <title>Foreign Keys</title>
***************
*** 387,392 ****
--- 401,569 ----
     </para>
    </sect1>

+   <sect1 id="tutorial-storedprocs">
+    <title> Stored Procedures </title>
+
+    <indexterm zone="tutorial-storedprocs">
+      <primary>stored procedures</primary>
+    </indexterm>
+
+    <para> Stored procedures are code that runs inside the database
+    system.  Numerous languages may be used to implement functions and
+    procedures; most built-in code is implemented in C.  The
+    <quote>basic</quote> loadable procedural language for
+    <productname>PostgreSQL</productname> is <xref linkend="plpgsql">.
+    Numerous other languages may also be used, including <xref
+    linkend="plperl">, <xref linkend="pltcl">, and <xref
+    linkend="plpython">.
+    </para>
+
+    <para> There are several ways that stored procedures are really
+    helpful:
+
+    <itemizedlist>
+
+    <listitem><para> To centralize data validation code into the
+    database </para>
+
+    <para> Your system may use client software written in several
+    languages, perhaps with a <quote>web application</quote>
+    implemented in PHP, a <quote>server application</quote> implemented
+    in Java, and a <quote> report writer</quote> implemented in Perl.
+    In the absence of stored procedures, you will likely find that data
+    validation code must be implemented multiple times, in multiple
+    languages, once for each application.</para>
+
+    <para> By implementing data validation in stored procedures,
+    running in the database, it can behave uniformly for all these
+    systems, and you do not need to worry about synchronizing
+    validation procedures across the languages.</para>
+
+    </listitem>
+
+    <listitem><para> Reducing round trips between client and server
+    </para>
+
+    <para>A stored procedure may submit multiple queries, looking up
+    information and adding in links to additional tables.  This takes
+    place without requiring that the client submit multiple queries,
+    and without requiring any added network traffic.
+    </para>
+
+    <para> As a matter of course, the queries share a single
+    transaction context, and there may also be savings in the
+    evaluation of query plans, that will be similar between invocations
+    of a given stored procedure.  </para></listitem>
+
+    <listitem><para> To simplify queries. </para>
+
+    <para> For instance, if you are commonly checking the TLD on domain
+    names, you might create a stored procedure for this purpose, and so
+    be able to use queries such as <command> select domain, tld(domain)
+    from domains; </command> instead of having to put verbose code
+    using <function>substr()</function> into each query.
+    </para>
+
+    <para> It is particularly convenient to use scripting languages
+    like Perl, Tcl, and Python to <quote>grovel through strings</quote>
+    since they are designed for <quote>text processing.</quote></para>
+
+    <para> The binding to the R statistical language allows
+    implementing complex statistical queries inside the database,
+    instead of having to draw the data out.
+    </listitem>
+
+    <listitem><para> Increasing the level of abstraction</para>
+
+    <para> If data is accessed exclusively through stored procedures,
+    then the structures of tables may be changed without there needing
+    to be any visible change in the API used by programmers.  In some
+    systems, users are <emphasis>only</emphasis> allowed access to
+    stored procedures to update data, and cannot do direct updates to
+    tables.
+    </para>
+
+    </listitem>
+
+    </itemizedlist>
+    </para>
+
+    <para> These benefits build on one another: careful use of stored
+    procedures can simultaneously improve reliability and performance,
+    whilst simplifying database access code and improving portability
+    across client platforms and languages.  For instance, consider that
+    a stored procedure can cheaply query tables in the database to
+    validate the correctness of data provided as input. </para>
+
+    <para> Instead of requiring a whole series of queries to create an
+    object, and to look up parent/subsidiary objects to link it to, a
+    stored procedure can do all of this efficiently in the database
+    server, improving performance, and eliminating whole classes of
+    errors. </para>
+
+   </sect1>
+
+   <sect1 id="tutorial-triggers">
+    <title> Triggers </title>
+
+    <indexterm zone="tutorial-triggers">
+      <primary>triggers</primary>
+    </indexterm>
+
+    <para> Triggers allow running a function either before or after
+    update (<command>INSERT</command>, <command>DELETE</command>,
+    <command>UPDATE</command>) operations, which can allow you to do
+    some very clever things. </para>
+
+    <itemizedlist>
+
+    <listitem><para> Data Validation  </para>
+
+    <para> Instead of explicitly coding validation checks as part of a
+    stored procedure, they may be introduced as <command>BEFORE</command>
+    triggers.  The trigger function checks the input values, raising an
+    exception if it finds invalid input.</para>
+
+    <para> Note that this is how foreign key checks are implemented in
+    <productname>PostgreSQL</productname>; when you define a foreign
+    key, you will see a message similar to the following:
+ <screen>
+ NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
+ </screen></para>
+
+    <para> In some cases, it may be appropriate for a trigger function
+    to insert data in order to <emphasis>make</emphasis> the input valid.  For
+    instance, if a newly created object needs a status code in a status
+    table, the trigger might automatically do that.</para>
+    </listitem>
+
+    <listitem><para> Audit logs </para>
+
+    <para> One may use <command>AFTER</command> triggers to monitor updates to
+    vital tables, and <command>INSERT</command> entries into log tables to
+    provide a more permanent record of those updates.  </para>
+    </listitem>
+
+    <listitem><para> Replication </para>
+
+    <para> The <application>RServ</application> replication system uses
+    <command>AFTER</command> triggers to track which rows have changed on the
+    <quote>master</quote> system and therefore need to be copied over to
+    <quote>slave</quote> systems.</para>
+
+    <para> <command>
+      CREATE TRIGGER "_rserv_trigger_t_" AFTER INSERT OR DELETE OR UPDATE ON "my_table"
+         FOR EACH ROW EXECUTE PROCEDURE "_rserv_log_" ('10');
+    </command></para>
+    </listitem>
+
+    </itemizedlist>
+
+    <para> Notice that there are strong parallels between what can be
+    accomplished using triggers and stored procedures, particularly in
+    regards to data validation.  </para>
+
+   </sect1>

    <sect1 id="tutorial-conclusion">
     <title>Conclusion</title>
Index: doc/src/sgml/datetime.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/datetime.sgml,v
retrieving revision 2.39
retrieving revision 2.40
diff -c -c -r2.39 -r2.40
*** doc/src/sgml/datetime.sgml    1 Dec 2003 20:34:53 -0000    2.39
--- doc/src/sgml/datetime.sgml    30 Mar 2004 21:58:20 -0000    2.40
***************
*** 1,5 ****
  <!--
! $PostgreSQL: pgsql-server/doc/src/sgml/datetime.sgml,v 2.39 2003/12/01 20:34:53 tgl Exp $
  -->

   <appendix id="datetime-appendix">
--- 1,5 ----
  <!--
! $PostgreSQL: pgsql-server/doc/src/sgml/datetime.sgml,v 2.40 2004/03/30 21:58:20 momjian Exp $
  -->

   <appendix id="datetime-appendix">
***************
*** 11,18 ****
     strings, and are broken up into distinct fields with a preliminary
     determination of what kind of information may be in the
     field. Each field is interpreted and either assigned a numeric
!    value, ignored, or rejected.
!    The parser contains internal lookup tables for all textual fields,
     including months, days of the week, and time
     zones.
    </para>
--- 11,18 ----
     strings, and are broken up into distinct fields with a preliminary
     determination of what kind of information may be in the
     field. Each field is interpreted and either assigned a numeric
!    value, ignored, or rejected.</para>
!    <para> The parser contains internal lookup tables for all textual fields,
     including months, days of the week, and time
     zones.
    </para>
***************
*** 1056,1076 ****
     years.
    </para>

!   <para>
!    The papal bull of February 1582 decreed that 10 days should be dropped
!    from October 1582 so that 15 October should follow immediately after
!    4 October.
!    This was observed in Italy, Poland, Portugal, and Spain. Other Catholic
!    countries followed shortly after, but Protestant countries were
!    reluctant to change, and the Greek orthodox countries didn't change
!    until the start of the 20th century.
!
!    The reform was observed by Great Britain and Dominions (including what is
!    now the USA) in 1752.
!    Thus 2 September 1752 was followed by 14 September 1752.

!    This is why Unix systems have the <command>cal</command> program
!    produce the following:

  <screen>
  $ <userinput>cal 9 1752</userinput>
--- 1056,1076 ----
     years.
    </para>

!   <para> The papal bull of February 1582 decreed that 10 days should
!   be dropped from October 1582 so that 15 October should follow
!   immediately after 4 October.</para>
!
!   <para> This was observed in Italy, Poland, Portugal, and Spain.
!   Other Catholic countries followed shortly after, but Protestant
!   countries were reluctant to change, and the Greek orthodox countries
!   didn't change until the start of the 20th century.</para>
!
!   <para> The reform was observed by Great Britain and Dominions
!   (including what is now the USA) in 1752.  Thus 2 September 1752 was
!   followed by 14 September 1752.</para>

!    <para> This is why Unix systems have the <command>cal</command>
!    program produce the following:

  <screen>
  $ <userinput>cal 9 1752</userinput>
***************
*** 1094,1112 ****
      </para>
     </note>

!   <para>
!    Different calendars have been developed in various parts of the
!    world, many predating the Gregorian system.

!    For example,
!    the beginnings of the Chinese calendar can be traced back to the 14th
!    century BC. Legend has it that the Emperor Huangdi invented the
!    calendar in 2637 BC.

!    The People's Republic of China uses the Gregorian calendar
!    for civil purposes. The Chinese calendar is used for determining
!    festivals.
    </para>
   </sect1>
  </appendix>

--- 1094,1117 ----
      </para>
     </note>

!   <para> Different calendars have been developed in various parts of
!   the world, many predating the Gregorian system.</para>

!   <para> For example, the beginnings of the Chinese calendar can be
!   traced back to the 14th century BC. Legend has it that the Emperor
!   Huangdi invented the calendar in 2637 BC.</para>

!   <para> The People's Republic of China uses the Gregorian calendar
!   for civil purposes. The Chinese calendar is used for determining
!   festivals.
    </para>
+
+   <para> If you are interested in this sort of thing, <citation>
+   Calendrical Calculations: The Millennium Edition </citation> by by
+   Edward M. Reingold and Nachum Dershowitz is an excellent reference,
+   describing some 25 calendars, and providing software for displaying
+   them and converting between them.</para>
+
   </sect1>
  </appendix>