Re: Initial prefetch performance testing - Mailing list pgsql-hackers
From | Gregory Stark
---|---
Subject | Re: Initial prefetch performance testing
Date |
Msg-id | 8763onvzyp.fsf@oxford.xeocode.com
In response to | Initial prefetch performance testing (Greg Smith <gsmith@gregsmith.com>)
List | pgsql-hackers
[resending due to the attachment being too large for the -hackers list --
weren't we going to raise it when we killed -patches?]

Greg Smith <gsmith@gregsmith.com> writes:

> Using the maximum prefetch working set tested, 8192, here's the speedup
> multiplier on this benchmark for both sorted and unsorted requests using
> an 8GB file:
>
>  OS               Spindles  Unsorted X  Sorted X
>  1:Linux          1         2.3         2.1
>  2:Linux          1         1.5         1.0
>  3:Solaris        1         2.6         3.0
>  4:Linux          3         6.3         2.8
>  5:Linux (Stark)  3         5.3         3.6
>  6:Linux          10        5.4         4.9
>  7:Solaris*       48        16.9        9.2

Incidentally, I've been looking primarily at the sorted numbers because they
parallel bitmap heap scans. (Note that the heap scan is only about half the
i/o of a bitmap index scan + heap scan, so even if it's infinitely faster it
will only halve the time spent in those two nodes.)

Hm, I'm disappointed with the 48-drive array here. I wonder why it maxed out
at only 10x the bandwidth of one drive; I would expect more like 24x or more.
I wonder if Solaris's aio has an internal limit on how many pending i/o
requests it can handle. Perhaps it's a tunable?

Unfortunately I don't see a convenient, low-invasive way to integrate aio
into Postgres. With posix_fadvise we can just issue the advice and then
forget about it (a rough sketch of that pattern is at the end of this mail).
With aio we would pretty much have to pick a target buffer, pin it, issue the
aio, and then remember the pin later when we need to read the buffer. That
would require restructuring the code significantly.

I'm quite surprised Solaris doesn't support posix_fadvise -- perhaps it's in
some other version of Solaris?

Here's a graph of results from this program for various sized arrays on a
single machine:

http://wiki.postgresql.org/images/a/a3/Results.svg

Each colour corresponds to an array with a different number of spindles,
ranging from 1 to 15 drives. The X axis is how much prefetching was done and
the Y axis is the bandwidth obtained. There is a distinct maximum followed by
a dropoff, and it would be great to get some data points for larger arrays to
understand where that maximum goes as the array gets larger.

> Conclusion: on all the systems I tested on, this approach gave excellent
> results, which makes me feel confident that I should see a corresponding
> speedup on database-level tests that use this same basic technique. I'm not
> sure whether it might make sense to bundle this test program up somehow so
> others can use it for similar compatibility tests (I'm thinking of something
> similar to contrib/test_fsync), will revisit that after the rest of the
> review.
>
> Next step: I've got two data sets (one generated, one real-world sample)
> that should demonstrate a useful heap scan prefetch speedup, and one test
> program I think will demonstrate whether the sequential scan prefetch code
> works right. Now that I've vetted all the hardware/OS combinations I hope I
> can squeeze that in this week; I don't need to test all of them now that I
> know which are the interesting systems.

I have an updated patch I'll be sending along shortly. You might want to test
with that?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
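P.S. To make the "issue the advice and forget about it" point above concrete,
here is a minimal sketch of the advise-then-read pattern. This is not the
actual patch or test program; the block size, prefetch distance, and
block-list handling are made up for illustration, and it assumes a platform
that implements posix_fadvise().

    /*
     * Minimal sketch of the advise-then-read pattern: while reading block i,
     * advise the kernel about the block we expect to need PREFETCH_DIST
     * reads from now.  Illustrative only.
     */
    #define _XOPEN_SOURCE 600       /* for posix_fadvise() */

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK_SIZE    8192      /* assumed block size */
    #define PREFETCH_DIST 64        /* how far ahead to advise; a tunable */

    static void
    scan_blocks(int fd, const off_t *blocks, int nblocks)
    {
        char    buf[BLOCK_SIZE];
        int     i;

        for (i = 0; i < nblocks; i++)
        {
            /* Fire-and-forget: advise about a block we'll need soon. */
            if (i + PREFETCH_DIST < nblocks)
                (void) posix_fadvise(fd,
                                     blocks[i + PREFETCH_DIST] * BLOCK_SIZE,
                                     BLOCK_SIZE, POSIX_FADV_WILLNEED);

            /* The actual read; with luck the kernel has already fetched it. */
            if (pread(fd, buf, BLOCK_SIZE, blocks[i] * BLOCK_SIZE) < 0)
            {
                perror("pread");
                exit(1);
            }
            /* ... process buf ... */
        }
    }

The point is simply that the advice carries no completion to track: nothing
has to be pinned or remembered for later, which is what makes it so much less
invasive than aio.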