Re: Initial prefetch performance testing - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Initial prefetch performance testing
Date
Msg-id 8763onvzyp.fsf@oxford.xeocode.com
Whole thread Raw
In response to Initial prefetch performance testing  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-hackers
[resending due to the attachment being too large for the -hackers list --
weren't we going to raise it when we killed -patches?]

Greg Smith <gsmith@gregsmith.com> writes:

> Using the maximum prefetch working set tested, 8192, here's the speedup
> multiplier on this benchmark for both sorted and unsorted requests using a 8GB
> file:
>
> OS        Spindles    Unsorted X    Sorted X
> 1:Linux        1        2.3        2.1
> 2:Linux        1        1.5        1.0
> 3:Solaris    1        2.6        3.0
> 4:Linux        3        6.3        2.8
> 5:Linux (Stark)    3        5.3        3.6
> 6:Linux        10        5.4        4.9
> 7:Solaris*    48        16.9        9.2

Incidentally I've been looking primarily at the sorted numbers because they
parallel bitmap heap scans. (Note that the heap scan is only about half the
i/o of a bitmap index scan + heap scan so even if it's infinitely faster it'll
only halve the time spent in the two nodes.)

Hm, I'm disappointed with the 48-drive array here. I wonder why it maxed out
at only 10x the bandwidth of one drive. I would expect more like 24x or more.
I wonder if Solaris's aio has an internal limit on how many pending i/o
requests it can handle. Perhaps it's a tunable?

Unfortunately I don't see a convenient low-invasive way to integrate aio into
Postgres. posix_fadvise we can just issue the advice and then forget about it.
But aio we would pretty much have to pick a target buffer, pin it, issue the
aio and then remember the pin later when we need to read the buffer. That
would require restructuring the code significantly. I'm quite surprised
Solaris doesn't support posix_fadvise -- perhaps it's in some other version of
Solaris?

Here's a graph of results from this program for various sized arrays on a
single machine:

http://wiki.postgresql.org/images/a/a3/Results.svg

Each colour corresponds to an array of a different number of spindles ranging
from 1 to 15 drives. The X axis is how much prefetching was done and the Y
axis is the bandwidth obtained.

There is a distinct maximum and then dropoff and it would be great to get some
data points for larger arrays to understand where that maximum goes as the
array gets larger.

> Conclusion:  on all the systems I tested on, this approach gave excellent
> results, which makes me feel confident that I should see a corresponding
> speedup on database-level tests that use this same basic technique.  I'm not
> sure whether it might make sense to bundle this test program up somehow so
> others can use it for similar compatibility tests (I'm thinking of something
> similar to contrib/test_fsync), will revisit that after the rest of the review.
>
> Next step:  I've got two data sets (one generated, one real-world sample) that
> should demonstrate a useful heap scan prefetch speedup, and one test program I
> think will demonstrate whether the sequential scan prefetch code works right.
> Now that I've vetted all the hardware/OS combinations I hope I can squeeze that
> in this week, I don't need to test all of them now that I know which are the
> interesting systems.

I have an updated patch I'll be sending along shortly. You might want to test
with that?




--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: WIP patch: Collation support
Next
From: Tom Lane
Date:
Subject: Re: pg_type.h regression?