Re: [HACKERS] Sequential scan speed, mmap, disk i/o - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] Sequential scan speed, mmap, disk i/o |
Date | |
Msg-id | 199805160106.VAA26055@candle.pha.pa.us |
In response to | Re: [HACKERS] Sequential scan speed, mmap, disk i/o (Michal Mosiewicz <mimo@interdata.com.pl>) |
Responses | Re: [HACKERS] Sequential scan speed, mmap, disk i/o |
List | pgsql-hackers |
> > mmap() is very slow, perhaps because you are changing the process
> > virtual table maps for each chunk you read in, and faulting them in,
> > rather than using the file system for I/O.
>
> Huh, very slow? I wouldn't agree. I rewrote your mmap program to allow
> for using reads or mmaps.
>
> I tested it on a 111MB file. I decided to use an 8192-byte buffer size
> (the standard postgres page size). My system is Linux, P166, 64MBs of RAM
> (note that I have a lot of software running currently, so the cache size
> is less than 25MBs). I also changed the for(j..) step size to j+=256 just
> to make sure that it won't influence the results too much and you will
> see the difference better. mmap was run with (PROT_READ, MAP_SHARED).
>
> Average results are (for sequential reading):
> Using reads: total time - 21.39 (0.44user, 6.09system, 31%CPU)
> Using mmaps: total time - 21.10 (0.57user, 4.92system, 25%CPU)
>
> Note that in the case of reads the program spends much more time in system
> calls and uses more CPU. You may notice that on Linux using mmap
> is about 20% cheaper than read. In the case of random reading it's slightly
> more than 20%, as I remember. Total time is similar in both cases because
> of the throughput limit of my HD.
>
> BTW, are you sure that your program was counting mmaps properly? When I
> run it on my system it counts much more than it should. On my
> system the offset crossed over the file's boundary, then it worked a minute
> or more before it stopped. I attach my version (with a hardcoded 111MB file
> size to prevent that; of course you may change it).

OK, here are my results using your test program.

Basically, Linux is about twice my speed for 8k mmap'ed chunks. Around 32k
chunks, I get closer, and 8mb chunks are the same. Glad to hear Linux has
optimized mmap() recently, because BSD/OS looks much slower than Linux on
this.
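For reference, the read() half of a comparison like this can be sketched as below. This is not the test program attached upthread, only a minimal sketch under stated assumptions: the function name count_nonspace_read and its chunk parameter are hypothetical, and the byte-counting loop simply mirrors the one in the mmap() program.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): read the file at
 * `path` sequentially with `chunk`-byte read() calls and count the
 * bytes that are not spaces. Returns the count, or -1 on error.
 */
static long
count_nonspace_read(const char *path, size_t chunk)
{
    long    count = 0;
    ssize_t n;
    char   *buf;
    int     fd;

    assert(chunk > 0);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    buf = malloc(chunk);
    if (buf == NULL)
    {
        close(fd);
        return -1;
    }

    /* read() returns 0 at end of file, so no fstat() is needed to stop */
    while ((n = read(fd, buf, chunk)) > 0)
    {
        /* touch every byte that was actually read */
        for (ssize_t j = 0; j < n; j++)
            if (buf[j] != ' ')
                count++;
    }

    free(buf);
    close(fd);
    return (n < 0) ? -1 : count;
}
```

Calling count_nonspace_read() with chunk sizes of 8192, 32768 and 8192 * 1024 from a small main(), and running that under time(1), would give the same kind of real/user/sys breakdown as the "read" measurements in this thread, with one buffer reused for every chunk instead of one mapping per chunk.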

Now, why does PostgreSQL sequentially scan a 160MB file in 37 seconds, using
its standard 8k buffers, when even your read test for me using 8k buffers
takes 54 seconds? In storage/file/fd.c, I see it using read(), and I assume
they are 8k chunks being read:

    returnCode = read(VfdCache[file].fd, buffer, amount);

Also attached is a modified version of my mmap() program, which uses fstat()
to check the file size to know when to stop. However, I have also modified
it to use a file size that matches yours.

Not sure what to conclude from these numbers.

---------------------------------------------------------------------------

mmap, 8k                          47.81 real    0.66 user   33.12 sys
read, 8k                          54.60 real    0.51 user   46.80 sys
mmap, 32k                         29.80 real    0.23 user   13.81 sys
read, 32k                         26.80 real    0.12 user   14.82 sys
mmap, 8mb                         21.25 real    0.03 user    5.49 sys
read, 8mb                         20.43 real    0.14 user    3.60 sys

my mmap, 8k, your file size       64.67 real   15.99 user   34.00 sys
my mmap, 32k, your file size      43.12 real   15.95 user   14.29 sys
my mmap, 8mb, your file size      34.31 real   15.88 user    5.39 sys

---------------------------------------------------------------------------

#include <stdio.h>
#include <fcntl.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define MMAP_SIZE (8192 * 1024)

int
main(int argc, char *argv[], char *envp[])
{
    int         j, fd, spaces = 0;
    int         off;
    char       *addr;
    struct stat filestat;

    fd = open("/u/pg/data/base/test/test", O_RDONLY, 0);
    assert(fd != -1);
    assert(fstat(fd, &filestat) == 0);
    /* override the real size so it matches your file size */
    filestat.st_size = 111329280;

    for (off = 0; 1; off += MMAP_SIZE)
    {
        addr = mmap(0, MMAP_SIZE, PROT_READ, MAP_SHARED, fd, off);
        /* mmap() reports failure as MAP_FAILED, not NULL */
        assert(addr != MAP_FAILED);
        madvise(addr, MMAP_SIZE, MADV_SEQUENTIAL);
        /* touch every byte so each page is faulted in (counts non-space bytes) */
        for (j = 0; j < MMAP_SIZE; j++)
        {
            if (*(addr + j) != ' ')
                spaces++;
            if (off + j + 1 == filestat.st_size)
                goto done;
        }
        munmap(addr, MMAP_SIZE);
    }
done:
    printf("%d\n", spaces);
    return 0;
}

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)