Re: Move unused buffers to freelist - Mailing list pgsql-hackers
From:           Robert Haas
Subject:        Re: Move unused buffers to freelist
Msg-id:         CA+TgmobJm0GHk58nUPRQHCGwY25n1DCkU4ku9aQeczZEjiz9mQ@mail.gmail.com
In response to: Re: Move unused buffers to freelist (Greg Smith <greg@2ndQuadrant.com>)
Responses:      Re: Move unused buffers to freelist
                Re: Move unused buffers to freelist
List:           pgsql-hackers
On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
> Configuration Details
> O/S - Suse-11
> RAM - 128GB
> Number of Cores - 16
> Server Conf - checkpoint_segments = 300; checkpoint_timeout = 15 min,
> synchronous_commit = off, shared_buffers = 14GB, AutoVacuum = off
> Pgbench - Select-only
> Scalefactor - 1200
> Time - 30 mins
>
>           8C-8T     16C-16T    32C-32T    64C-64T
> Head      62403     101810     99516      94707
> Patch     62827     101404     99109      94744
>
> On 128GB RAM, if we use scalefactor=1200 (database = approx 17GB) and 14GB
> shared buffers, there is no major difference.
> One of the reasons could be that there is not much swapping in shared
> buffers, as most data already fits in shared buffers.

I'd like to back up a minute here and talk about the broader picture. What are we trying to accomplish with this patch? Last year, I did some benchmarking on a big IBM POWER7 machine (16 cores, 64 hardware threads). Here are the results:

http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

Now, if you look at these results, you see something interesting. When there aren't too many concurrent connections, the higher scale factors are only modestly slower than the lower scale factors. But as the number of connections increases, performance continues to rise at the lower scale factors, while at the higher scale factors it stops rising and in fact drops off. In other words, there's no huge *performance* problem for a working set larger than shared_buffers, but there is a huge *scalability* problem.

Now why is that? As far as I can tell, the answer is that we've got a scalability problem around BufFreelistLock. Contention on the buffer mapping locks may also be a problem, but all of my previous benchmarking (with LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant in the room. My interest in having the background writer add buffers to the free list is basically about solving that problem. It's a pretty dramatic problem, as the graph above shows, and this patch doesn't solve it.

There may be corner cases where this patch improves things (or, equally, makes them worse), but as a general point, the difficulty I've had reproducing your test results, and the specificity of your instructions for reproducing them, suggest to me that what we have here is not a clear improvement on general workloads. Yet such an improvement should exist, because there are other products in the world that have scalable buffer managers; we currently don't. Instead of spending a lot of time trying to figure out whether there's a small win in narrow cases here (and there may well be), I think we should back up and ask why this isn't a great big win, and what we'd need to do to *get* a great big win. I don't see much point in tinkering around the edges if things are broken in the middle; things that seem like small wins or losses now may turn out otherwise in the face of a more comprehensive solution.

One thing that occurred to me while writing this note is that the background writer doesn't have any compelling reason to run on a read-only workload. It will still run at a certain minimum rate, so that it cycles the buffer pool every 2 minutes, if I remember correctly. But it won't run anywhere near fast enough to keep up with the buffer allocation demands of 8, or 32, or 64 sessions all reading, at top speed, data that doesn't entirely fit in shared_buffers. In fact, we've had reports that the background writer isn't too effective even on read-write workloads.
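To put rough numbers on that - this is only a back-of-the-envelope sketch, assuming the stock settings of bgwriter_delay = 200ms and bgwriter_lru_maxpages = 100 with 8kB blocks, so adjust for whatever is actually configured:

    100 buffers/round / 0.2 s/round = at most 500 buffers/s written
    500 buffers/s * 8 kB/buffer     = roughly 4 MB/s

Meanwhile, the select-only numbers quoted above are on the order of 100,000 TPS; if even a small fraction of those lookups had to allocate a buffer, demand would run to thousands of allocations per second, an order of magnitude or more beyond what the background writer will supply at its default pace.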
The point is - if the background writer isn't waking up and running frequently enough, what it does when it does wake up isn't going to matter very much. I think we need to spend some energy poking at that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company