On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> There are many reports of improvement from lowering shared_buffers.
> The problem is that it tends to show up on complex production
> workloads and that there is no clear evidence pointing to problems
> with the clock sweep; it could be higher up in the partition locks or
> something else entirely (like the O/S). pgbench is also not the
> greatest tool for sniffing out these cases: it's too random and for
> large database optimization is generally an exercise in de-randomizing
> i/o patterns. We really, really need a broader testing suite that
> covers more usage patterns.

I find it quite dissatisfying that we know so little about this.

I'm finding that my patch helps much less when shared_buffers is sized
large enough to fit the index entirely (although there are still some
localized stalls on master, where there are none with the patch applied).
shared_buffers is still far too small to fit the entire heap. With
shared_buffers=24GB (which still leaves just under 8GB of memory for
the OS to use as cache, since this system has 32GB of main memory),
the numbers are much less impressive relative to master with the same
configuration. Both sets of numbers are still better than what you've
already seen with shared_buffers=8GB, since of course the "no more
than 8GB" recommendation is not an absolute, and as you say, its
efficacy seemingly cannot be demonstrated with pgbench.

My guess is that the patch doesn't help because once there is more
than enough room to cache the entire index (slightly over twice as
many buffers as would be required to do so), even on master it becomes
virtually impossible to evict those relatively popular index pages,
since they still have an early advantage. It doesn't matter that
master's clock sweep has what I've called an excessively short-term
perspective, because there is always enough pressure relative to the
number of leaf pages being pinned to prefer to evict heap pages. There
are still a lot of buffers that can fit some moderate proportion of all
heap pages even after buffering the entire index (something like
~13GB).

You might say that with this new shared_buffers setting, clock sweep
doesn't need to have a "good memory", because it can immediately
observe the usefulness of B-Tree leaf pages.
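
A toy sketch of the reasoning above, and not the actual bufmgr code: a
usage-count clock sweep (counts capped at 5, as with PostgreSQL's
BM_MAX_USAGE_COUNT) driven by a made-up workload in which every "query"
touches one index leaf page and one heap page. The page counts, the
uniform access pattern, and the names ClockSweepCache and index_hit_rate
are all assumptions for illustration; pins and the free list are not
modeled.

```python
import random

MAX_USAGE = 5  # PostgreSQL caps usage_count at BM_MAX_USAGE_COUNT (5)


class ClockSweepCache:
    """Simplified usage-count clock sweep (no pins, no free list)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.pages = [None] * nbuffers  # page held by each buffer
        self.usage = [0] * nbuffers     # usage count of each buffer
        self.where = {}                 # page -> buffer index
        self.hand = 0                   # clock hand position

    def access(self, page):
        """Return True on a hit, False on a miss (page is then loaded)."""
        idx = self.where.get(page)
        if idx is not None:
            self.usage[idx] = min(self.usage[idx] + 1, MAX_USAGE)
            return True
        # Miss: advance the hand, decrementing usage counts, until a
        # buffer with usage count zero is found; that buffer is the victim.
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % self.nbuffers
            if self.usage[i] == 0:
                old = self.pages[i]
                if old is not None:
                    del self.where[old]
                self.pages[i] = page
                self.where[page] = i
                self.usage[i] = 1  # a freshly loaded page starts at 1
                return False
            self.usage[i] -= 1


def index_hit_rate(nbuffers, n_index=200, n_heap=5000,
                   n_queries=50000, seed=0):
    """Fraction of index leaf accesses served from the buffer pool.

    All sizes are hypothetical; each query touches one random leaf page
    and one random heap page.
    """
    rng = random.Random(seed)
    cache = ClockSweepCache(nbuffers)
    hits = 0
    for _ in range(n_queries):
        hits += cache.access(("idx", rng.randrange(n_index)))
        cache.access(("heap", rng.randrange(n_heap)))
    return hits / n_queries


if __name__ == "__main__":
    # Pool smaller than the index vs. slightly over twice the index size.
    for nbuffers in (100, 450):
        print(nbuffers, round(index_hit_rate(nbuffers), 3))
```

With the pool at slightly over twice the size of the toy index, leaf
pages are re-referenced often enough relative to the sweep that they
should be hit far more often than with the small pool. It is only a
model of the argument and makes no attempt to reproduce the benchmark
numbers.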

There is no need to limit myself to speculation here, of course. I'll
check it out using pg_buffercache.
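
One way such a check might look, as a sketch only: the query below
groups the pg_buffercache view by relation kind and usagecount for the
current database, and the hypothetical helper summarize() reduces the
result rows to a buffer total and mean usagecount per relkind ('i' for
indexes, 'r' for tables). The helper name and the choice of grouping
are assumptions, not something taken from the thread.

```python
from collections import defaultdict

# Intended to be run through any DB-API cursor, e.g.
#   cur.execute(BUFFERCACHE_QUERY); rows = cur.fetchall()
# Requires the pg_buffercache extension to be installed.
BUFFERCACHE_QUERY = """
SELECT c.relkind, b.usagecount, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relkind, b.usagecount
ORDER BY c.relkind, b.usagecount;
"""


def summarize(rows):
    """Aggregate (relkind, usagecount, buffers) rows per relkind.

    Returns {relkind: {"buffers": total, "avg_usagecount": mean}}.
    """
    totals = defaultdict(lambda: [0, 0])  # relkind -> [buffers, weighted sum]
    for relkind, usagecount, buffers in rows:
        totals[relkind][0] += buffers
        totals[relkind][1] += usagecount * buffers
    return {
        kind: {"buffers": n, "avg_usagecount": weighted / n}
        for kind, (n, weighted) in totals.items()
    }
```

If the guess above is right, the 'i' rows should account for roughly
the whole index and carry high usage counts, with the heap ('r') pages
clustered at low usage counts.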
--
Peter Geoghegan