On Apr 14, 2008, at 3:31 PM, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> The transition domain where performance drops dramatically as the
>> database starts to not fit in shared buffers but does still fit in
>> filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache. What's interesting is that there seems to be no
> synergy at all between shared_buffers and the filesystem cache.
> Ideally, very hot pages would stay in shared buffers and drop out of
> the kernel cache, allowing you to use a database approximating
> all-of-RAM
> before you hit the performance wall. It's clear that in this example
> that's not happening, or at least that only a small part of shared
> buffers isn't getting duplicated in filesystem cache.
I suspect that we're getting double-buffering on everything because
every time we dirty a buffer and write it out, the OS treats that
write as an access and keeps the data in its cache. It would be
interesting to try to overcome that and see how it impacts things.
With our improvements in checkpoint handling, we might be able to
just write via direct I/O... if not, maybe there's some way to tell
the OS to buffer the write for us but mark that data for eviction
from its cache as soon as it's written.
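
Something like this is what I have in mind (just a sketch, nothing
that's in the backend; assumes Linux, write_and_discard is a made-up
name, error handling left out):

#define _XOPEN_SOURCE 600       /* for posix_fadvise() and pwrite() */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

/*
 * Write one block, then hint the kernel that we won't be reading it
 * back, so it can drop those pages from the filesystem cache.
 */
static void
write_and_discard(int fd, const char *buf, off_t offset)
{
    /* write the dirty block back to its slot in the file */
    (void) pwrite(fd, buf, BLCKSZ, offset);

    /* make sure it's really on disk before we discard the cached copy */
    (void) fdatasync(fd);

    /* ask the kernel to evict these pages; this is only a hint */
    (void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_DONTNEED);
}

Going all the way to O_DIRECT would skip the kernel copy entirely, but
then we're on the hook for buffer alignment and lose the kernel's
write-behind, so the fadvise hint is probably the cheaper experiment.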
> Of course, that's because pgbench reads a randomly-chosen row of
> "accounts" in each transaction, so that there's exactly zero locality
> of access. A more realistic workload would probably have a Zipfian
> distribution of account number touches, and might look a little better
> on this type of test.
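
A skewed driver wouldn't be hard to hack up to see how much that moves
the knee. A rough sketch (zipf_init/zipf_next are made-up names here,
not anything that exists in pgbench): precompute the CDF for
P(k) ~ 1/k^s over the account ids and sample it by binary search.

#include <stdlib.h>
#include <math.h>

typedef struct
{
    int     n;
    double *cdf;        /* cdf[k] is P(account id <= k + 1) */
} zipf_gen;

/* precompute the CDF for P(k) proportional to 1/k^s, k = 1..n */
static zipf_gen *
zipf_init(int n, double s)
{
    zipf_gen   *z = malloc(sizeof(zipf_gen));
    double      norm = 0.0;
    double      running = 0.0;
    int         k;

    z->n = n;
    z->cdf = malloc(n * sizeof(double));
    for (k = 1; k <= n; k++)
        norm += 1.0 / pow((double) k, s);
    for (k = 1; k <= n; k++)
    {
        running += (1.0 / pow((double) k, s)) / norm;
        z->cdf[k - 1] = running;
    }
    return z;
}

/* draw an account id in 1..n, heavily skewed toward the low ids */
static int
zipf_next(const zipf_gen *z)
{
    double      u = (double) rand() / ((double) RAND_MAX + 1.0);
    int         lo = 0,
                hi = z->n - 1;

    while (lo < hi)
    {
        int         mid = (lo + hi) / 2;

        if (z->cdf[mid] < u)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo + 1;
}

Building the full CDF up front is fine at pgbench-ish table sizes; for
a really big accounts table you'd want a smarter sampler, but for
checking whether skew changes the shape of the curve it should do.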
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828