Thread: remove BufferBlockPointers for speed and space

remove BufferBlockPointers for speed and space

From
"Qingqing Zhou"
Date:
It is said that the BufferBlockPointers is used to speedup the
BufferGetBlock() macro. I compared three ways of getting block pointers.
I.e., existing method (arrary method), calculating block pointer by adding
base addr and offset*blockid method (mul method) and optimizing mul method
by using bit shift (shift method). All of them calculate the block pointer
80000 times (i.e., the BufferBlockPointers array is of size 80000), and each
take 3 rounds.

The result is:

SunOS/gcc 3.2
duration round 1 of array method: 4.179 ms
duration round 2 of array method: 4.160 ms
duration round 3 of array method: 4.143 ms
duration round 1 of mul method: 3.311 ms
duration round 2 of mul method: 3.233 ms
duration round 3 of mul method: 3.233 ms
duration round 1 of shift method: 3.554 ms
duration round 2 of shift method: 3.235 ms
duration round 3 of shift method: 3.233 ms

Linux/gcc 3.2
duration round 1 of array method: 0.422 ms
duration round 2 of array method: 0.324 ms
duration round 3 of array method: 0.354 ms
duration round 1 of mul method: 0.271 ms
duration round 2 of mul method: 0.248 ms
duration round 3 of mul method: 0.304 ms
duration round 1 of shift method: 0.322 ms
duration round 2 of shift method: 0.239 ms
duration round 3 of shift method: 0.265 ms

We can conclude that:
(1) mul or shift are definitely better than array method;
(2) mul and shift are comparable;

So we can remove BufferBlockPointers for speed and space.

Regards,
Qingqing

--------

Updated files:
/include/storage/bufmgr.h
/storage/buffer/buf_init.c
/storage/buffer/bufmgr.c

Index: bufmgr.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/storage/bufmgr.h,v
retrieving revision 1.94
diff -c -r1.94 bufmgr.h
*** bufmgr.h    8 Aug 2005 03:12:16 -0000       1.94
--- bufmgr.h    11 Aug 2005 05:15:04 -0000
***************
*** 33,39 ****
  extern int    bgwriter_all_maxpages;

  /* in buf_init.c */
! extern DLLIMPORT Block *BufferBlockPointers;
  extern DLLIMPORT int32 *PrivateRefCount;

  /* in localbuf.c */
--- 33,39 ----
  extern int    bgwriter_all_maxpages;

  /* in buf_init.c */
! extern DLLIMPORT char  *BufferBlocks;
  extern DLLIMPORT int32 *PrivateRefCount;

  /* in localbuf.c */
***************
*** 107,113 ****
        BufferIsLocal(buffer) ? \
                LocalBufferBlockPointers[-(buffer) - 1] \
        : \
!               BufferBlockPointers[(buffer) - 1] \
  )

  /*
--- 107,113 ----
        BufferIsLocal(buffer) ? \
                LocalBufferBlockPointers[-(buffer) - 1] \
        : \
!               BufferBlocks + ((buffer) - 1)*BLCKSZ \
  )

  /*

Index: buf_init.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/buf_init.c,v
retrieving revision 1.74
diff -c -r1.74 buf_init.c
*** buf_init.c  8 Aug 2005 03:11:44 -0000       1.74
--- buf_init.c  11 Aug 2005 05:17:38 -0000
***************
*** 19,28 ****


  BufferDesc *BufferDescriptors;
- Block    *BufferBlockPointers;
  int32    *PrivateRefCount;
!
! static char *BufferBlocks;

  /* statistics counters */
  long int      ReadBufferCount;
--- 19,26 ----


  BufferDesc *BufferDescriptors;
  int32    *PrivateRefCount;
! char     *BufferBlocks;

  /* statistics counters */
  long int      ReadBufferCount;
***************
*** 154,183 ****
  void
  InitBufferPoolAccess(void)
  {
-       char       *block;
-       int                     i;
-
        /*
         * Allocate and zero local arrays of per-buffer info.
         */
-       BufferBlockPointers = (Block *) calloc(NBuffers,
-
sizeof(*BufferBlockPointers));
        PrivateRefCount = (int32 *) calloc(NBuffers,

sizeof(*PrivateRefCount));
-
-       /*
-        * Construct addresses for the individual buffer data blocks.  We do
-        * this just to speed up the BufferGetBlock() macro.  (Since the
-        * addresses should be the same in every backend, we could inherit
-        * this data from the postmaster --- but in the EXEC_BACKEND case
-        * that doesn't work.)
-        */
-       block = BufferBlocks;
-       for (i = 0; i < NBuffers; i++)
-       {
-               BufferBlockPointers[i] = (Block) block;
-               block += BLCKSZ;
-       }
  }

  /*
--- 152,162 ----

Index: bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.192
diff -c -r1.192 bufmgr.c
*** bufmgr.c    8 Aug 2005 19:44:22 -0000       1.192
--- bufmgr.c    11 Aug 2005 05:18:35 -0000
***************
*** 54,60 ****


  /* Note: these two macros only work on shared buffers, not local ones! */
! #define BufHdrGetBlock(bufHdr)
BufferBlockPointers[(bufHdr)->buf_id]
  #define BufferGetLSN(bufHdr)  (*((XLogRecPtr*) BufHdrGetBlock(bufHdr)))

  /* Note: this macro only works on local buffers, not shared ones! */
--- 54,60 ----


  /* Note: these two macros only work on shared buffers, not local ones! */
! #define BufHdrGetBlock(bufHdr)        (BufferBlocks +
((bufHdr)->buf_id)*BLCKSZ)
  #define BufferGetLSN(bufHdr)  (*((XLogRecPtr*) BufHdrGetBlock(bufHdr)))

  /* Note: this macro only works on local buffers, not shared ones! */




Re: remove BufferBlockPointers for speed and space

From
Gavin Sherry
Date:
On Thu, 11 Aug 2005, Qingqing Zhou wrote:

> It is said that the BufferBlockPointers is used to speedup the
> BufferGetBlock() macro. I compared three ways of getting block pointers.
> I.e., existing method (arrary method), calculating block pointer by adding
> base addr and offset*blockid method (mul method) and optimizing mul method
> by using bit shift (shift method). All of them calculate the block pointer
> 80000 times (i.e., the BufferBlockPointers array is of size 80000), and each
> take 3 rounds.
>
> The result is:
>
> SunOS/gcc 3.2
> duration round 1 of array method: 4.179 ms
> duration round 2 of array method: 4.160 ms
> duration round 3 of array method: 4.143 ms
> duration round 1 of mul method: 3.311 ms
> duration round 2 of mul method: 3.233 ms
> duration round 3 of mul method: 3.233 ms
> duration round 1 of shift method: 3.554 ms
> duration round 2 of shift method: 3.235 ms
> duration round 3 of shift method: 3.233 ms
>
> Linux/gcc 3.2
> duration round 1 of array method: 0.422 ms
> duration round 2 of array method: 0.324 ms
> duration round 3 of array method: 0.354 ms
> duration round 1 of mul method: 0.271 ms
> duration round 2 of mul method: 0.248 ms
> duration round 3 of mul method: 0.304 ms
> duration round 1 of shift method: 0.322 ms
> duration round 2 of shift method: 0.239 ms
> duration round 3 of shift method: 0.265 ms
>
> We can conclude that:
> (1) mul or shift are definitely better than array method;
> (2) mul and shift are comparable;

Do you have results for more recent gcc releases?

Thanks,

Gavin

Re: remove BufferBlockPointers for speed and space

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Thu, 11 Aug 2005, Qingqing Zhou wrote:
>> It is said that the BufferBlockPointers is used to speedup the
>> BufferGetBlock() macro. I compared three ways of getting block pointers.

> Do you have results for more recent gcc releases?

Or more than one hardware architecture (which you didn't even say what
you tested...)

Also, I would like to see the actual test code.  I wonder whether what
you measured is the ability of the compiler to optimize references to
successive elements of an array inside a loop; that has little or
nothing to do with the typical usage of BufferGetBlock().

            regards, tom lane

Re: remove BufferBlockPointers for speed and space

From
Gavin Sherry
Date:
On Thu, 11 Aug 2005, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > On Thu, 11 Aug 2005, Qingqing Zhou wrote:
> >> It is said that the BufferBlockPointers is used to speedup the
> >> BufferGetBlock() macro. I compared three ways of getting block pointers.
>
> > Do you have results for more recent gcc releases?
>
> Or more than one hardware architecture (which you didn't even say what
> you tested...)

Well, he tested on SunOS (!) and Linux -- I presume that's two
architectures. But yeah, that definately does mean that the results hold
true for our user base.

If we had the code (and if it does measure the right thing), then we can
easily test on a lot more architectures.

Gavin

Re: remove BufferBlockPointers for speed and space

From
"Qingqing Zhou"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> Also, I would like to see the actual test code.  I wonder whether what
> you measured is the ability of the compiler to optimize references to
> successive elements of an array inside a loop; that has little or
> nothing to do with the typical usage of BufferGetBlock().
>

The source code is attached.

compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that
the timing is reduced a lot so not believable.
---

/*
 * testbuf.c
 */
#include <stdio.h>
#include <sys/file.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>
#include <fcntl.h>

#define BLCKSZ  8192
#define NBuffers 80000

typedef void* Block;

int main(void)
{
 int  i, round, method;
 Block k, start;
 struct timeval start_t, stop_t;
 long usecs;
 Block *array = (Block *) calloc(NBuffers, sizeof(Block));

 start = (Block)0xff3386;
 for (i = 0; i < NBuffers; i++)
  array[i] = start + BLCKSZ*i;

 for (method = 0; method < 3; method ++)
 {
  start = (Block)0xff3386;
  for (round = 0; round < 3; round ++)
  {
   gettimeofday(&start_t, NULL);
   if (method == 0)
   {
    for (i = 0; i < NBuffers; i++)
     k = array[i];
   }
   if (method == 1)
   {
    for (i = 0; i < NBuffers; i++)
     k = start + i*BLCKSZ;
   }
   if (method == 2)
   {
    for (i = 0; i < NBuffers; i++)
     k = start + (i<<13);
   }
   gettimeofday(&stop_t, NULL);

   if (stop_t.tv_usec < start_t.tv_usec)
   {
     stop_t.tv_sec--;
     stop_t.tv_usec += 1000000;
   }

   usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000
     + (long) (stop_t.tv_usec - start_t.tv_usec);

   fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n",
       round + 1,
       method==0?"array":method==1?"mul":"shift",
       (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
         (stop_t.tv_usec - start_t.tv_usec) / 1000),
       (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
  }
 }
}



Re: remove BufferBlockPointers for speed and space

From
Mark Kirkwood
Date:
FreeBSD 5.4 RELEASE gcc 3.4.2 on Intel dual PIII 1Ghz

[postgres:~/develop/c/testbuf]$ gcc -Wall -o testbuf testbuf.c
[postgres:~/develop/c/testbuf]$ ./testbuf
duration round 1 of array method: 1.737 ms
duration round 2 of array method: 1.676 ms
duration round 3 of array method: 1.527 ms
duration round 1 of mul method: 0.548 ms
duration round 2 of mul method: 0.548 ms
duration round 3 of mul method: 0.546 ms
duration round 1 of shift method: 0.593 ms
duration round 2 of shift method: 0.592 ms
duration round 3 of shift method: 0.575 ms
[postgres:~/develop/c/testbuf]$ gcc -Wall -O2 -o testbuf testbuf.c
[postgres:~/develop/c/testbuf]$ ./testbuf
duration round 1 of array method: 0.169 ms
duration round 2 of array method: 0.165 ms
duration round 3 of array method: 0.165 ms
duration round 1 of mul method: 0.164 ms
duration round 2 of mul method: 0.164 ms
duration round 3 of mul method: 0.165 ms
duration round 1 of shift method: 0.164 ms
duration round 2 of shift method: 0.164 ms
duration round 3 of shift method: 0.165 ms

Looks to me like -O2 makes the difference very small (on this
platform/gcc combo) - is 5/169 worth doing?

BTW - I patched the program to stop gcc whining:

--- testbuf.c.orig      Thu Aug 11 18:41:29 2005
+++ testbuf.c   Thu Aug 11 18:37:43 2005
@@ -2,6 +2,7 @@
   * testbuf.c
   */
  #include <stdio.h>
+#include <stdlib.h>
  #include <sys/file.h>
  #include <sys/param.h>
  #include <sys/stat.h>
@@ -66,6 +67,8 @@
         (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
    }
   }
+
+ return 0;
  }


Qingqing Zhou wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>>Also, I would like to see the actual test code.  I wonder whether what
>>you measured is the ability of the compiler to optimize references to
>>successive elements of an array inside a loop; that has little or
>>nothing to do with the typical usage of BufferGetBlock().
>>
>
>
> The source code is attached.
>
> compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that
> the timing is reduced a lot so not believable.
> ---
>
> /*
>  * testbuf.c
>  */
> #include <stdio.h>
> #include <sys/file.h>
> #include <sys/param.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> #define BLCKSZ  8192
> #define NBuffers 80000
>
> typedef void* Block;
>
> int main(void)
> {
>  int  i, round, method;
>  Block k, start;
>  struct timeval start_t, stop_t;
>  long usecs;
>  Block *array = (Block *) calloc(NBuffers, sizeof(Block));
>
>  start = (Block)0xff3386;
>  for (i = 0; i < NBuffers; i++)
>   array[i] = start + BLCKSZ*i;
>
>  for (method = 0; method < 3; method ++)
>  {
>   start = (Block)0xff3386;
>   for (round = 0; round < 3; round ++)
>   {
>    gettimeofday(&start_t, NULL);
>    if (method == 0)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = array[i];
>    }
>    if (method == 1)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = start + i*BLCKSZ;
>    }
>    if (method == 2)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = start + (i<<13);
>    }
>    gettimeofday(&stop_t, NULL);
>
>    if (stop_t.tv_usec < start_t.tv_usec)
>    {
>      stop_t.tv_sec--;
>      stop_t.tv_usec += 1000000;
>    }
>
>    usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000
>      + (long) (stop_t.tv_usec - start_t.tv_usec);
>
>    fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n",
>        round + 1,
>        method==0?"array":method==1?"mul":"shift",
>        (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>          (stop_t.tv_usec - start_t.tv_usec) / 1000),
>        (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
>   }
>  }
> }
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>


Re: remove BufferBlockPointers for speed and space

From
Mark Kirkwood
Date:
Mark Kirkwood wrote:

> Looks to me like -O2 makes the difference very small (on this
> platform/gcc combo) - is 5/169 worth doing?

Ahem - misunderstood your comment here, sorry.

> Qingqing Zhou wrote:

>> compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out
>> that
>> the timing is reduced a lot so not believable.

Re: remove BufferBlockPointers for speed and space

From
Andrew Dunstan
Date:

Gavin Sherry wrote:

>>Or more than one hardware architecture (which you didn't even say what
>>you tested...)
>>
>>
>
>Well, he tested on SunOS (!) and Linux -- I presume that's two
>architectures.
>
>

Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or
look at a buildfarm solaris build info)

So what he said he tested could mean Solaris/x86 + Linux x86 or
Solaris/Sparc + Linux Sparc  among other possible combos. Given the
results I assume they are at least not the same box ;-)

cheers

andrew

Re: remove BufferBlockPointers for speed and space

From
Gavin Sherry
Date:
On Thu, 11 Aug 2005, Andrew Dunstan wrote:

>
>
> Gavin Sherry wrote:
>
> >>Or more than one hardware architecture (which you didn't even say what
> >>you tested...)
> >>
> >>
> >
> >Well, he tested on SunOS (!) and Linux -- I presume that's two
> >architectures.
> >
> >
>
> Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or
> look at a buildfarm solaris build info)

True. But my previous experience in university environments is that SunOS
usually refers to SunOS 2.6 -- and the performance indicates old hardware.

The thing is, compilser optimised versions of the test reveal very little
difference in performance. This may be because the compiler is very good
at optimising sequential annd predictable access to the array. Instead, we
should mimic what we see in the real world: random access.

Gavin

Re: remove BufferBlockPointers for speed and space

From
Andrew Dunstan
Date:

Gavin Sherry wrote:

>>>
>>>
>>Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or
>>look at a buildfarm solaris build info)
>>
>>
>
>True. But my previous experience in university environments is that SunOS
>usually refers to SunOS 2.6
>
>

I think I must be a lot older then you. To me SunOs means the old
pre-Solaris BSD-based OS ;-)

cheers

andrew

Re: remove BufferBlockPointers for speed and space

From
Tom Lane
Date:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> The source code is attached.

>    gettimeofday(&start_t, NULL);
>    if (method == 0)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = array[i];
>    }
>    if (method == 1)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = start + i*BLCKSZ;
>    }
>    if (method == 2)
>    {
>     for (i = 0; i < NBuffers; i++)
>      k = start + (i<<13);
>    }
>    gettimeofday(&stop_t, NULL);

This is definitely going to tell us a lot more about the compiler's loop
strength reduction algorithms than it is going to tell about the time to
evaluate any one of these expressions in isolation.  What's worse, the
compiler would be quite within its rights to detect that k isn't used
anywhere, and optimize the loop away completely.

What I would suggest is first to fill another array with some large
number of buffer numbers (randomly chosen from 1..N) and then time loops
of the form

    for (i = 0; i < arraysize; i++)
    {
        bn = buffernumbers[i];
        bufferlocs[i] = BufferGetBlock(bn);
    }

for all three possible definitions of BufferGetBlock (where both arrays
are global variables, just to be sure the compiler doesn't think it can
discard the store).  This should give some numbers worth trusting.

            regards, tom lane

Re: remove BufferBlockPointers for speed and space

From
"Joshua D. Drake"
Date:
Hello,

With gcc-4 on Ubuntu AMD Athlon 2600 Barton:


jd@jd:~$ ./a.out
duration round 1 of array method: 0.539 ms
duration round 2 of array method: 0.529 ms
duration round 3 of array method: 0.530 ms
duration round 1 of mul method: 0.258 ms
duration round 2 of mul method: 5.563 ms
duration round 3 of mul method: 0.258 ms
duration round 1 of shift method: 0.240 ms
duration round 2 of shift method: 0.248 ms
duration round 3 of shift method: 0.240 ms
jd@jd:~$

Sincerely,

Joshua D. Drake


Qingqing Zhou wrote:

>"Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>
>>Also, I would like to see the actual test code.  I wonder whether what
>>you measured is the ability of the compiler to optimize references to
>>successive elements of an array inside a loop; that has little or
>>nothing to do with the typical usage of BufferGetBlock().
>>
>>
>>
>
>The source code is attached.
>
>compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that
>the timing is reduced a lot so not believable.
>---
>
>/*
> * testbuf.c
> */
>#include <stdio.h>
>#include <sys/file.h>
>#include <sys/param.h>
>#include <sys/stat.h>
>#include <sys/time.h>
>#include <unistd.h>
>#include <fcntl.h>
>
>#define BLCKSZ  8192
>#define NBuffers 80000
>
>typedef void* Block;
>
>int main(void)
>{
> int  i, round, method;
> Block k, start;
> struct timeval start_t, stop_t;
> long usecs;
> Block *array = (Block *) calloc(NBuffers, sizeof(Block));
>
> start = (Block)0xff3386;
> for (i = 0; i < NBuffers; i++)
>  array[i] = start + BLCKSZ*i;
>
> for (method = 0; method < 3; method ++)
> {
>  start = (Block)0xff3386;
>  for (round = 0; round < 3; round ++)
>  {
>   gettimeofday(&start_t, NULL);
>   if (method == 0)
>   {
>    for (i = 0; i < NBuffers; i++)
>     k = array[i];
>   }
>   if (method == 1)
>   {
>    for (i = 0; i < NBuffers; i++)
>     k = start + i*BLCKSZ;
>   }
>   if (method == 2)
>   {
>    for (i = 0; i < NBuffers; i++)
>     k = start + (i<<13);
>   }
>   gettimeofday(&stop_t, NULL);
>
>   if (stop_t.tv_usec < start_t.tv_usec)
>   {
>     stop_t.tv_sec--;
>     stop_t.tv_usec += 1000000;
>   }
>
>   usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000
>     + (long) (stop_t.tv_usec - start_t.tv_usec);
>
>   fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n",
>       round + 1,
>       method==0?"array":method==1?"mul":"shift",
>       (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>         (stop_t.tv_usec - start_t.tv_usec) / 1000),
>       (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
>  }
> }
>}
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>
>


Re: remove BufferBlockPointers for speed and space

From
"Qingqing Zhou"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> This is definitely going to tell us a lot more about the compiler's loop
> strength reduction algorithms than it is going to tell about the time to
> evaluate any one of these expressions in isolation.  What's worse, the
> compiler would be quite within its rights to detect that k isn't used
> anywhere, and optimize the loop away completely.
>

Yes, it is possible. A safter way is to examine the assembling code. And I
found there is no such optimization in Linux/gcc3.2.

> What I would suggest is first to fill another array with some large
> number of buffer numbers (randomly chosen from 1..N) and then time loops
> of the form
>
> for (i = 0; i < arraysize; i++)
> {
> bn = buffernumbers[i];
> bufferlocs[i] = BufferGetBlock(bn);
> }
>
> for all three possible definitions of BufferGetBlock (where both arrays
> are global variables, just to be sure the compiler doesn't think it can
> discard the store).  This should give some numbers worth trusting.
>

I've patched the code according to your suggestion. Result is:

SunOS/gcc 3.2
verify: 1308873472 :duration round 1 of array method: 8.906 ms
verify: 1308873472 :duration round 2 of array method: 8.775 ms
verify: 1308873472 :duration round 3 of array method: 8.761 ms
verify: 1308873472 :duration round 1 of mul method: 6.454 ms
verify: 1308873472 :duration round 2 of mul method: 6.545 ms
verify: 1308873472 :duration round 3 of mul method: 6.535 ms
verify: 1308873472 :duration round 1 of shift method: 6.534 ms
verify: 1308873472 :duration round 2 of shift method: 6.597 ms
verify: 1308873472 :duration round 3 of shift method: 6.540 ms

Linux/gcc 3.2
verify: -2141277440 :duration round 1 of array method: 2.385 ms
verify: -2141277440 :duration round 2 of array method: 2.116 ms
verify: -2141277440 :duration round 3 of array method: 2.057 ms
verify: -2141277440 :duration round 1 of mul method: 0.943 ms
verify: -2141277440 :duration round 2 of mul method: 0.983 ms
verify: -2141277440 :duration round 3 of mul method: 0.840 ms
verify: -2141277440 :duration round 1 of shift method: 0.864 ms
verify: -2141277440 :duration round 2 of shift method: 0.931 ms
verify: -2141277440 :duration round 3 of shift method: 0.968 ms


The test file is attached:
---
/*
 * buftest.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/time.h>
#include <unistd.h>

#define BLCKSZ  8192
#define NBuffers 80000

typedef void* Block;

Block *array = NULL;
Block *reslt = NULL;
int  *bufnm = NULL;

static int grand(int min, int max)
{
    return (min + (int) (max * 1.0 * rand() / (RAND_MAX + 1.0)));
}

int main(void)
{
 int  n, i, round, method;
 Block start;
 struct timeval start_t, stop_t;
 long usecs;

 srandom(getpid());

 array = (Block *) calloc(NBuffers, sizeof(Block));
 reslt = (Block *) calloc(NBuffers, sizeof(Block));
 bufnm = (int   *) calloc(NBuffers, sizeof(int));

 start = (Block)0xff3386;
 for (i = 0; i < NBuffers; i++)
  array[i] = start + BLCKSZ*i;
 for (i = 0; i < NBuffers; i++)
  bufnm[i] = grand(0, NBuffers);

 for (method = 0; method < 3; method ++)
 {
  start = (Block)0xff3386;
  for (round = 0; round < 3; round ++)
  {
   gettimeofday(&start_t, NULL);
   if (method == 0)
   {
    for (i = 0; i < NBuffers; i++)
    {
     n = bufnm[i];
     reslt[i] = array[n];
    }

   }
   if (method == 1)
   {
    for (i = 0; i < NBuffers; i++)
    {
     n = bufnm[i];
     reslt[i] = start + n*BLCKSZ;
    }
   }
   if (method == 2)
   {
    for (i = 0; i < NBuffers; i++)
    {
     n = bufnm[i];
     reslt[i] = start + (n<<13);
    }
   }
   gettimeofday(&stop_t, NULL);

   usecs = 0;
   for (i = 0; i< NBuffers; i++)
    usecs += (long) reslt[i];
   fprintf(stdout, "verify: %ld :", usecs );

   if (stop_t.tv_usec < start_t.tv_usec)
   {
     stop_t.tv_sec--;
     stop_t.tv_usec += 1000000;
   }

   usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000
     + (long) (stop_t.tv_usec - start_t.tv_usec);

   fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n",
       round + 1,
       method==0?"array":method==1?"mul":"shift",
       (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
         (stop_t.tv_usec - start_t.tv_usec) / 1000),
       (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
  }
 }

 free(array);
 free(reslt);
 free(bufnm);

 return 0;
}




Re: remove BufferBlockPointers for speed and space

From
Tom Lane
Date:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> I've patched the code according to your suggestion. Result is:
> [ snip ]

OK, that test seems a little more believable.  One point you didn't
consider is that on 64-bit machines, the integer bufnum really has
to be coerced to size_t to avoid overflow if the buffer array exceeds
2Gb.  (Which we don't support today, but might well by the end of
day tomorrow, seeing that there's a patch in the queue about it.)
But I ran the test case with the extra coercion on an IA64 machine at
Red Hat, and got substantially the same results as you did: the array
method is just slower.  Another consideration is that the array is
competing for L2 cache --- the test program can't really show that,
since it has no other use for L2 cache, but in the context of the real
database I suspect this is at least as much of a win as shaving a few
nanoseconds off the BufferGetBlock macro itself.

So ... patch applied, and thanks for the good idea!

            regards, tom lane