Re: Initial prefetch performance testing - Mailing list pgsql-hackers
From | Gregory Stark
---|---
Subject | Re: Initial prefetch performance testing
Date |
Msg-id | 8763onvzyp.fsf@oxford.xeocode.com
In response to | Initial prefetch performance testing (Greg Smith <gsmith@gregsmith.com>)
List | pgsql-hackers
[resending due to the attachment being too large for the -hackers list --
weren't we going to raise it when we killed -patches?]

Greg Smith <gsmith@gregsmith.com> writes:

> Using the maximum prefetch working set tested, 8192, here's the speedup
> multiplier on this benchmark for both sorted and unsorted requests using
> an 8GB file:
>
>  OS               Spindles  Unsorted X  Sorted X
>  1:Linux          1         2.3         2.1
>  2:Linux          1         1.5         1.0
>  3:Solaris        1         2.6         3.0
>  4:Linux          3         6.3         2.8
>  5:Linux (Stark)  3         5.3         3.6
>  6:Linux          10        5.4         4.9
>  7:Solaris*       48        16.9        9.2

Incidentally, I've been looking primarily at the sorted numbers because they
parallel bitmap heap scans. (Note that the heap scan is only about half the
i/o of a bitmap index scan + heap scan, so even if it's infinitely faster it
will only halve the time spent in those two nodes.)

Hm, I'm disappointed with the 48-drive array here. I wonder why it maxed out
at only 10x the bandwidth of one drive; I would expect more like 24x or more.
I wonder if Solaris's aio has an internal limit on how many pending i/o
requests it can handle. Perhaps it's a tunable?

Unfortunately I don't see a convenient, low-invasive way to integrate aio
into Postgres. With posix_fadvise we can just issue the advice and then
forget about it (a rough sketch of that pattern is at the end of this mail).
With aio we would pretty much have to pick a target buffer, pin it, issue the
aio, and then remember the pin later when we need to read the buffer. That
would require restructuring the code significantly.

I'm quite surprised Solaris doesn't support posix_fadvise -- perhaps it's in
some other version of Solaris?

Here's a graph of results from this program for various sized arrays on a
single machine:

http://wiki.postgresql.org/images/a/a3/Results.svg

Each colour corresponds to an array with a different number of spindles,
ranging from 1 to 15 drives. The X axis is how much prefetching was done and
the Y axis is the bandwidth obtained. There is a distinct maximum followed by
a dropoff, and it would be great to get some data points for larger arrays to
understand where that maximum goes as the array gets larger.

> Conclusion: on all the systems I tested on, this approach gave excellent
> results, which makes me feel confident that I should see a corresponding
> speedup on database-level tests that use this same basic technique. I'm not
> sure whether it might make sense to bundle this test program up somehow so
> others can use it for similar compatibility tests (I'm thinking of something
> similar to contrib/test_fsync), will revisit that after the rest of the
> review.
>
> Next step: I've got two data sets (one generated, one real-world sample)
> that should demonstrate a useful heap scan prefetch speedup, and one test
> program I think will demonstrate whether the sequential scan prefetch code
> works right. Now that I've vetted all the hardware/OS combinations I hope I
> can squeeze that in this week; I don't need to test all of them now that I
> know which are the interesting systems.

I have an updated patch I'll be sending along shortly. You might want to test
with that?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
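P.S. To make the "issue the advice and forget about it" point above concrete,
here is a minimal sketch of the advise-then-read pattern. This is not the
actual patch or test program; the block size, prefetch distance, and
block-list handling are made up for illustration, and it assumes a platform
that implements posix_fadvise().

    /*
     * Minimal sketch of the advise-then-read pattern: while reading block i,
     * advise the kernel about the block we expect to need PREFETCH_DIST
     * reads from now.  Illustrative only.
     */
    #define _XOPEN_SOURCE 600       /* for posix_fadvise() */

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK_SIZE    8192      /* assumed block size */
    #define PREFETCH_DIST 64        /* how far ahead to advise; a tunable */

    static void
    scan_blocks(int fd, const off_t *blocks, int nblocks)
    {
        char    buf[BLOCK_SIZE];
        int     i;

        for (i = 0; i < nblocks; i++)
        {
            /* Fire-and-forget: advise about a block we'll need soon. */
            if (i + PREFETCH_DIST < nblocks)
                (void) posix_fadvise(fd,
                                     blocks[i + PREFETCH_DIST] * BLOCK_SIZE,
                                     BLOCK_SIZE, POSIX_FADV_WILLNEED);

            /* The actual read; with luck the kernel has already fetched it. */
            if (pread(fd, buf, BLOCK_SIZE, blocks[i] * BLOCK_SIZE) < 0)
            {
                perror("pread");
                exit(1);
            }
            /* ... process buf ... */
        }
    }

The point is simply that the advice carries no completion to track: nothing
has to be pinned or remembered for later, which is what makes it so much less
invasive than aio.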