Re: Move unused buffers to freelist - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Move unused buffers to freelist
Date
Msg-id CA+TgmobJm0GHk58nUPRQHCGwY25n1DCkU4ku9aQeczZEjiz9mQ@mail.gmail.com
In response to Re: Move unused buffers to freelist  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> Configuration Details
> O/S - Suse-11
> RAM - 128GB
> Number of Cores - 16
> Server Conf - checkpoint_segments = 300; checkpoint_timeout = 15 min,
> synchronous_commit = off, shared_buffers = 14GB, AutoVacuum=off
> Pgbench - Select-only, Scalefactor - 1200, Time - 30 mins
>
>              8C-8T                16C-16T        32C-32T        64C-64T
> Head       62403                101810         99516          94707
> Patch      62827                101404         99109          94744
>
> On 128GB RAM, if we use scalefactor=1200 (database=approx 17GB) and 14GB
> shared buffers, there is no major difference.
> One of the reasons could be that there is not much swapping in shared
> buffers, as most data already fits in shared buffers.

I'd like to back up a minute and talk about the broader picture.  What
are we trying to accomplish with this patch?  Last
year, I did some benchmarking on a big IBM POWER7 machine (16 cores,
64 hardware threads).  Here are the results:

http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

Now, if you look at these results, you see something interesting.
When there aren't too many concurrent connections, the higher scale
factors are only modestly slower than the lower scale factors.  But as
the number of connections increases, performance continues to rise at
the lower scale factors, while at the higher scale factors it stops
rising and in fact drops off.  So in other words,
there's no huge *performance* problem for a working set larger than
shared_buffers, but there is a huge *scalability* problem.  Now why is
that?

As far as I can tell, the answer is that we've got a scalability
problem around BufFreelistLock.  Contention on the buffer mapping
locks may also be a problem, but all of my previous benchmarking (with
LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant
in the room.  My interest in having the background writer add buffers
to the free list is basically around solving that problem.  It's a
pretty dramatic problem, as the graph above shows, and this patch
doesn't solve it.  There may be corner cases where this patch improves
things (or, equally, makes them worse) but as a general point, the
difficulty I've had reproducing your test results and the specificity
of your instructions for reproducing them suggest to me that what we
have here is not a clear improvement on general workloads.  Yet such
an improvement should exist, because there are other products in the
world that have scalable buffer managers; we currently don't.  Instead
of spending a lot of time trying to figure out whether there's a small
win in narrow cases here (and there may well be), I think we should
back up and ask why this isn't a great big win, and what we'd need to
do to *get* a great big win.  I don't see much point in tinkering
around the edges here if things are broken in the middle; things that
seem like small wins or losses now may turn out otherwise in the face
of a more comprehensive solution.
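
To make the performance-versus-scalability distinction concrete, here
is a toy model, not a measurement.  It assumes each transaction does
some parallelizable work plus a short stretch inside one global lock
(standing in for BufFreelistLock).  The function name and the
microsecond costs are made up for illustration; the model only
reproduces the plateau, not the drop-off seen in the real benchmark.

```python
# Toy model with hypothetical costs: every transaction does `work_us`
# microseconds of parallelizable work and holds one global lock for
# `lock_us` microseconds.
def throughput_tps(clients, work_us, lock_us):
    """Upper bound on transactions/second for `clients` sessions."""
    # Limit imposed by how fast each session can finish a transaction.
    per_client_limit = clients * 1_000_000 / (work_us + lock_us)
    # Limit imposed by the lock: only one holder at a time.
    lock_limit = float("inf") if lock_us == 0 else 1_000_000 / lock_us
    return min(per_client_limit, lock_limit)

# Assumed numbers: a working set that fits never takes the lock, one
# that spills out of shared_buffers holds it for 5us per transaction.
for clients in (1, 8, 16, 32, 64):
    fits = throughput_tps(clients, work_us=100, lock_us=0)
    spills = throughput_tps(clients, work_us=100, lock_us=5)
    print(f"{clients:3d} clients: fits={fits:9.0f} tps  spills={spills:9.0f} tps")
```

With these assumed costs the two cases are within a few percent of
each other at low client counts, but the spilling case stops improving
once the lock is saturated, no matter how many sessions are added.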

One thing that occurred to me while writing this note is that the
background writer doesn't have any compelling reason to run on a
read-only workload.  It will still run at a certain minimum rate, so
that it cycles the buffer pool every 2 minutes, if I remember
correctly.  But it won't run anywhere near fast enough to keep up with
the buffer allocation demands of 8, or 32, or 64 sessions all reading
at top speed when not all of the data is in shared_buffers.  In fact,
we've had reports that the background writer isn't too effective even
on read-write workloads.  The point is - if the background writer
isn't waking up and running frequently enough, what it does when it
does wake up isn't going to matter very much.  I think we need to
spend some energy poking at that.
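
A rough back-of-the-envelope sketch of that minimum rate, under stated
assumptions: shared_buffers = 14GB as in the quoted configuration, the
default 8kB block size, and the 2-minute cycle recalled above.  The
function name and the demand figure are hypothetical, and visiting a
buffer is of course not the same as freeing one.

```python
BLCKSZ = 8192  # PostgreSQL's default block size, in bytes

def min_scan_rate(shared_buffers_bytes, cycle_seconds=120):
    """Buffers/second visited if the whole pool is scanned once per cycle."""
    n_buffers = shared_buffers_bytes // BLCKSZ
    return n_buffers, n_buffers / cycle_seconds

n_buffers, rate = min_scan_rate(14 * 1024**3)
print(f"{n_buffers} buffers, about {rate:.0f} buffers/s at the minimum rate")

# Hypothetical demand: 64 sessions, each missing shared_buffers
# 1000 times per second.  These numbers are assumed, not measured.
demand = 64 * 1000
print(f"assumed demand {demand} buffers/s, {demand / rate:.1f}x the minimum rate")
```

Even with a fairly modest assumed miss rate per session, demand at 64
sessions is several times what a once-per-2-minutes pass can cover.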

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


