Re: Warm-cache prefetching - Mailing list pgsql-hackers
From | Min Xu (Hsu) |
---|---|
Subject | Re: Warm-cache prefetching |
Date | |
Msg-id | 20051209044649.GF2189@cs.wisc.edu |
In response to | Warm-cache prefetching (Qingqing Zhou <zhouqq@cs.toronto.edu>) |
Responses | Re: Warm-cache prefetching |
List | pgsql-hackers |

Perhaps because the P4 is already doing hardware prefetching?

http://www.tomshardware.com/2000/11/20/intel/page5.html

I ran the test program on a 2.2 GHz Opteron:

% ./a.out 10 16
Sum: -951304192: with prefetch on - duration: 81.166 ms
Sum: -951304192: with prefetch off - duration: 79.769 ms
Sum: -951304192: with prefetch on - duration: 81.173 ms
Sum: -951304192: with prefetch off - duration: 79.718 ms
Sum: -951304192: with prefetch on - duration: 83.293 ms
Sum: -951304192: with prefetch off - duration: 79.731 ms
Sum: -951304192: with prefetch on - duration: 81.227 ms
Sum: -951304192: with prefetch off - duration: 79.851 ms
Sum: -951304192: with prefetch on - duration: 81.003 ms
Sum: -951304192: with prefetch off - duration: 79.724 ms
Sum: -951304192: with prefetch on - duration: 81.084 ms
Sum: -951304192: with prefetch off - duration: 79.728 ms
Sum: -951304192: with prefetch on - duration: 81.009 ms
Sum: -951304192: with prefetch off - duration: 79.723 ms
Sum: -951304192: with prefetch on - duration: 81.074 ms
Sum: -951304192: with prefetch off - duration: 79.719 ms
Sum: -951304192: with prefetch on - duration: 81.188 ms
Sum: -951304192: with prefetch off - duration: 79.724 ms
Sum: -951304192: with prefetch on - duration: 81.075 ms
Sum: -951304192: with prefetch off - duration: 79.719 ms

Got a slowdown.

I ran it on a 650 MHz PIII:

% ./a.out 10 16
Sum: -951304192: with prefetch on - duration: 284.952 ms
Sum: -951304192: with prefetch off - duration: 291.439 ms
Sum: -951304192: with prefetch on - duration: 290.690 ms
Sum: -951304192: with prefetch off - duration: 299.692 ms
Sum: -951304192: with prefetch on - duration: 295.287 ms
Sum: -951304192: with prefetch off - duration: 290.992 ms
Sum: -951304192: with prefetch on - duration: 285.116 ms
Sum: -951304192: with prefetch off - duration: 294.127 ms
Sum: -951304192: with prefetch on - duration: 286.986 ms
Sum: -951304192: with prefetch off - duration: 291.001 ms
Sum: -951304192: with prefetch on - duration: 283.233 ms
Sum: -951304192: with prefetch off - duration: 401.910 ms
Sum: -951304192: with prefetch on - duration: 297.021 ms
Sum: -951304192: with prefetch off - duration: 307.814 ms
Sum: -951304192: with prefetch on - duration: 287.201 ms
Sum: -951304192: with prefetch off - duration: 303.870 ms
Sum: -951304192: with prefetch on - duration: 286.962 ms
Sum: -951304192: with prefetch off - duration: 352.779 ms
Sum: -951304192: with prefetch on - duration: 283.245 ms
Sum: -951304192: with prefetch off - duration: 294.422 ms

Looks like some speedup to me.

On Thu, 08 Dec 2005 Qingqing Zhou wrote:
>
> I found an interesting paper on improving index speed by prefetching memory
> data into the L1/L2 cache here (there was a discussion about prefetching disk
> data into memory several days ago, in the "ice-breaker thread"):
> http://www.cs.cmu.edu/~chensm/papers/index_pf_final.pdf
>
> A related technique is also used to speed up memcpy:
> http://people.redhat.com/arjanv/pIII.c
>
> I wonder if we could use it to speed up in-memory scan operations for heap
> or index. Tom's patch has made scans handle a page (vs. a row) at a
> time, which is a basis for this optimization.
>
> I wrote a program to try to simulate it, but I am not good at micro
> optimization, and I get only a very weak but fairly stable improvement.
> I wonder if anyone here is interested in taking a look.
>
> Regards,
> Qingqing
>
> ----------------------------------
>
> Test results
> --------------
> Cache line size: 64
> CPU: P4 2.4G
> $#./prefetch 10 16
> Sum: -951304192: with prefetch on - duration: 42.163 ms
> Sum: -951304192: with prefetch off - duration: 42.838 ms
> Sum: -951304192: with prefetch on - duration: 44.044 ms
> Sum: -951304192: with prefetch off - duration: 42.792 ms
> Sum: -951304192: with prefetch on - duration: 42.324 ms
> Sum: -951304192: with prefetch off - duration: 42.803 ms
> Sum: -951304192: with prefetch on - duration: 42.189 ms
> Sum: -951304192: with prefetch off - duration: 42.801 ms
> Sum: -951304192: with prefetch on - duration: 42.155 ms
> Sum: -951304192: with prefetch off - duration: 42.827 ms
> Sum: -951304192: with prefetch on - duration: 42.179 ms
> Sum: -951304192: with prefetch off - duration: 42.798 ms
> Sum: -951304192: with prefetch on - duration: 42.180 ms
> Sum: -951304192: with prefetch off - duration: 42.804 ms
> Sum: -951304192: with prefetch on - duration: 42.193 ms
> Sum: -951304192: with prefetch off - duration: 42.827 ms
> Sum: -951304192: with prefetch on - duration: 42.164 ms
> Sum: -951304192: with prefetch off - duration: 42.810 ms
> Sum: -951304192: with prefetch on - duration: 42.182 ms
> Sum: -951304192: with prefetch off - duration: 42.826 ms
>
> Test program
> ----------------
>
> /*
>  * prefetch.c
>  * PostgreSQL warm-cache sequential scan simulator with prefetch
>  */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <memory.h>
> #include <errno.h>
> #include <sys/time.h>
>
> typedef char bool;
> #define true ((bool) 1)
> #define false ((bool) 0)
>
> #define BLCKSZ 8192
> #define CACHESZ 64
> #define NBLCKS 5000
>
> int sum;
>
> int
> main(int argc, char *argv[])
> {
>     int i, rounds;
>     char *blocks;
>     int cpu_cost;
>
>     if (argc != 3)
>     {
>         fprintf(stderr, "usage: prefetch <rounds> <cpu_cost [1, 16]>\n");
>         exit(-1);
>     }
>
>     rounds = atoi(argv[1]);
>     cpu_cost = atoi(argv[2]);
>     if (cpu_cost > 16)
>         exit(-1);
>
>     for (i = 0; i < 2*rounds; i++)
>     {
>         int j, k;
>         struct timeval start_t, stop_t;
>         bool enable = i%2?false:true;
>         char *blck;
>
>         blocks = (char *)malloc(BLCKSZ*NBLCKS);
>         memset(blocks, 'a', BLCKSZ*NBLCKS);
>
>         sum = 0;
>         gettimeofday(&start_t, NULL);
>
>         for (j = 0; j < NBLCKS; j++)
>         {
>             blck = blocks + j*BLCKSZ;
>             for (k=0; k < BLCKSZ; k+=CACHESZ)
>             {
>                 int *s = (int *)(blck + k);
>                 int u = cpu_cost;
>
>                 if (enable)
>                 {
>                     /* prefetch ahead */
>                     __asm__ __volatile__ (
>                         "1: prefetchnta 128(%0)\n"
>                         : : "r" (s) : "memory" );
>                 }
>
>                 /* pretend to process current tuple */
>                 while (u--) sum += (*(s+u))*(*(s+u));
>             }
>         }
>         gettimeofday(&stop_t, NULL);
>
>         free(blocks);
>
>         /* measure the time */
>         if (stop_t.tv_usec < start_t.tv_usec)
>         {
>             stop_t.tv_sec--;
>             stop_t.tv_usec += 1000000;
>         }
>         fprintf (stdout, "Sum: %d: with prefetch %s - duration: %ld.%03ld ms\n",
>                  sum,
>                  enable?"on":"off",
>                  (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
>                          (stop_t.tv_usec - start_t.tv_usec) / 1000),
>                  (long) (stop_t.tv_usec - start_t.tv_usec) % 1000);
>
>     }
>
>     exit(0);
> }
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly