Re: Warm-cache prefetching - Mailing list pgsql-hackers

From Min Xu (Hsu)
Subject Re: Warm-cache prefetching
Date
Msg-id 20051209044649.GF2189@cs.wisc.edu
Whole thread Raw
In response to Warm-cache prefetching  (Qingqing Zhou <zhouqq@cs.toronto.edu>)
Responses Re: Warm-cache prefetching
List pgsql-hackers
Perhaps because P4 is already doing H/W prefetching?

http://www.tomshardware.com/2000/11/20/intel/page5.html

I ran the test program on an opteron 2.2G:

% ./a.out 10 16
Sum: -951304192: with prefetch on - duration: 81.166 ms
Sum: -951304192: with prefetch off - duration: 79.769 ms
Sum: -951304192: with prefetch on - duration: 81.173 ms
Sum: -951304192: with prefetch off - duration: 79.718 ms
Sum: -951304192: with prefetch on - duration: 83.293 ms
Sum: -951304192: with prefetch off - duration: 79.731 ms
Sum: -951304192: with prefetch on - duration: 81.227 ms
Sum: -951304192: with prefetch off - duration: 79.851 ms
Sum: -951304192: with prefetch on - duration: 81.003 ms
Sum: -951304192: with prefetch off - duration: 79.724 ms
Sum: -951304192: with prefetch on - duration: 81.084 ms
Sum: -951304192: with prefetch off - duration: 79.728 ms
Sum: -951304192: with prefetch on - duration: 81.009 ms
Sum: -951304192: with prefetch off - duration: 79.723 ms
Sum: -951304192: with prefetch on - duration: 81.074 ms
Sum: -951304192: with prefetch off - duration: 79.719 ms
Sum: -951304192: with prefetch on - duration: 81.188 ms
Sum: -951304192: with prefetch off - duration: 79.724 ms
Sum: -951304192: with prefetch on - duration: 81.075 ms
Sum: -951304192: with prefetch off - duration: 79.719 ms

Got slowdown.

I ran it on a PIII 650M:

% ./a.out 10 16
Sum: -951304192: with prefetch on - duration: 284.952 ms
Sum: -951304192: with prefetch off - duration: 291.439 ms
Sum: -951304192: with prefetch on - duration: 290.690 ms
Sum: -951304192: with prefetch off - duration: 299.692 ms
Sum: -951304192: with prefetch on - duration: 295.287 ms
Sum: -951304192: with prefetch off - duration: 290.992 ms
Sum: -951304192: with prefetch on - duration: 285.116 ms
Sum: -951304192: with prefetch off - duration: 294.127 ms
Sum: -951304192: with prefetch on - duration: 286.986 ms
Sum: -951304192: with prefetch off - duration: 291.001 ms
Sum: -951304192: with prefetch on - duration: 283.233 ms
Sum: -951304192: with prefetch off - duration: 401.910 ms
Sum: -951304192: with prefetch on - duration: 297.021 ms
Sum: -951304192: with prefetch off - duration: 307.814 ms
Sum: -951304192: with prefetch on - duration: 287.201 ms
Sum: -951304192: with prefetch off - duration: 303.870 ms
Sum: -951304192: with prefetch on - duration: 286.962 ms
Sum: -951304192: with prefetch off - duration: 352.779 ms
Sum: -951304192: with prefetch on - duration: 283.245 ms
Sum: -951304192: with prefetch off - duration: 294.422 ms

looks like some speedup to me.

On Thu, 08 Dec 2005 Qingqing Zhou wrote :
> 
> I found an interesting paper improving index speed by prefetching memory
> data to L1/L2 cache here (there is discussion about prefetching disk
> data to memory several days ago "ice-breaker thread"):
> http://www.cs.cmu.edu/~chensm/papers/index_pf_final.pdf
> 
> Also related technique used to speedup memcpy:
> http://people.redhat.com/arjanv/pIII.c
> 
> I wonder if we could use it to speed up in-memory scan opertion for heap
> or index. Tom's patch has made scan can handle a page (vs. row) every
> time, which is a basis for this optimization.
> 
> I write a program try to simulate it, but I am not good at micro
> optimization, and I just get a very weak but kind-of-stable improvement. I
> wonder if any people here are interested to take a look.
> 
> Regards,
> Qingqing
> 
> ----------------------------------
> 
> Test results
> --------------
> Cache line size: 64
> CPU: P4 2.4G
> $#./prefetch 10 16
> Sum: -951304192: with prefetch on - duration: 42.163 ms
> Sum: -951304192: with prefetch off - duration: 42.838 ms
> Sum: -951304192: with prefetch on - duration: 44.044 ms
> Sum: -951304192: with prefetch off - duration: 42.792 ms
> Sum: -951304192: with prefetch on - duration: 42.324 ms
> Sum: -951304192: with prefetch off - duration: 42.803 ms
> Sum: -951304192: with prefetch on - duration: 42.189 ms
> Sum: -951304192: with prefetch off - duration: 42.801 ms
> Sum: -951304192: with prefetch on - duration: 42.155 ms
> Sum: -951304192: with prefetch off - duration: 42.827 ms
> Sum: -951304192: with prefetch on - duration: 42.179 ms
> Sum: -951304192: with prefetch off - duration: 42.798 ms
> Sum: -951304192: with prefetch on - duration: 42.180 ms
> Sum: -951304192: with prefetch off - duration: 42.804 ms
> Sum: -951304192: with prefetch on - duration: 42.193 ms
> Sum: -951304192: with prefetch off - duration: 42.827 ms
> Sum: -951304192: with prefetch on - duration: 42.164 ms
> Sum: -951304192: with prefetch off - duration: 42.810 ms
> Sum: -951304192: with prefetch on - duration: 42.182 ms
> Sum: -951304192: with prefetch off - duration: 42.826 ms
> 
> Test program
> ----------------
> 
> /*
>  * prefetch.c
>  *        PostgreSQL warm-cache sequential scan simulator with prefetch
>  */
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <memory.h>
> #include <errno.h>
> #include <sys/time.h>
> 
> typedef char bool;
> #define true    ((bool) 1)
> #define false    ((bool) 0)
> 
> #define BLCKSZ    8192
> #define CACHESZ    64
> #define NBLCKS    5000
> 
> int    sum;
> 
> int
> main(int argc, char *argv[])
> {
>     int     i, rounds;
>     char    *blocks;
>     int    cpu_cost;
> 
>     if (argc != 3)
>     {
>         fprintf(stderr, "usage: prefetch <rounds> <cpu_cost [1, 16]>\n");
>         exit(-1);
>     }
> 
>     rounds = atoi(argv[1]);
>     cpu_cost  = atoi(argv[2]);
>     if (cpu_cost > 16)
>         exit(-1);
> 
>     for (i = 0; i < 2*rounds; i++)
>     {
>         int    j, k;
>         struct    timeval start_t, stop_t;
>         bool    enable = i%2?false:true;
>         char    *blck;
> 
>         blocks = (char *)malloc(BLCKSZ*NBLCKS);
>         memset(blocks, 'a', BLCKSZ*NBLCKS);
> 
>         sum = 0;
>         gettimeofday(&start_t, NULL);
> 
>         for (j = 0; j < NBLCKS; j++)
>         {
>             blck = blocks + j*BLCKSZ;
>             for (k=0; k < BLCKSZ; k+=CACHESZ)
>             {
>                 int    *s = (int *)(blck + k);
>                 int    u = cpu_cost;
> 
>                 if (enable)
>                 {
>                     /* prefetch ahead */
>                     __asm__ __volatile__ (
>                                 "1: prefetchnta 128(%0)\n"
>                                     : : "r" (s) : "memory" );
>                 }
> 
>                 /* pretend to process current tuple */
>                 while (u--) sum += (*(s+u))*(*(s+u));
>             }
>         }
>         gettimeofday(&stop_t, NULL);
> 
>         free(blocks);
> 
>         /* measure the time */
>         if (stop_t.tv_usec < start_t.tv_usec)
>         {
>             stop_t.tv_sec--;
>             stop_t.tv_usec += 1000000;
>         }
>         fprintf (stdout, "Sum: %d: with prefetch %s - duration: %ld.%03ld ms\n",
>                 sum,
>                 enable?"on":"off",
>                 (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>                         (stop_t.tv_usec - start_t.tv_usec) / 1000),
>                 (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
> 
>     }
> 
>     exit(0);
> }
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly


pgsql-hackers by date:

Previous
From: "Qingqing Zhou"
Date:
Subject: Re: Warm-cache prefetching
Next
From: Simon Riggs
Date:
Subject: Re: Reducing contention for the LockMgrLock