Re: [WIP PATCH] for Performance Improvement in Buffer Management - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: [WIP PATCH] for Performance Improvement in Buffer Management
Msg-id: 6C0B27F7206C9E4CA54AE035729E9C383BC76A4C@szxeml509-mbx
In response to: Re: [WIP PATCH] for Performance Improvement in Buffer Management (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Monday, November 19, 2012 5:53 AM Jeff Janes wrote:
> On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila <amit.kapila@huawei.com> wrote:
>> On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:
>>> Run the modes in reciprocating order?
>> Sorry, I didn't understand this. What do you mean by modes in reciprocating order?
>
> Sorry for the long delay. In your scripts, it looks like you always
> run the unpatched first, and then the patched second.

Yes, that's true.

> By reciprocating, I mean to run them in the reverse order, or in random order.

Today, for some configurations, I ran them in reciprocating order. Below are the readings.

Configuration: 16GB database, 7GB shared buffers.

Here I ran in the following order:
1. Run perf report with patch for 32 clients
2. Run perf report without patch for 32 clients
3. Run perf report with patch for 16 clients
4. Run perf report without patch for 16 clients

Each execution is 5 minutes.

16 clients / 16 threads  |  32 clients / 32 threads
@mv-free-lst   @9.3devl  |  @mv-free-lst   @9.3devl
-----------------------------------------------------
    3669        4056     |      5356        5258
    3987        4121     |      4625        5185
    4840        4574     |      4502        6796
    6465        6932     |      4558        8233
    6966        7222     |      4955        8237
    7551        7219     |      9115        8269
    8315        7168     |     43171        8340
    9102        7136     |     57920        8349
-----------------------------------------------------
    6362        6054     |     16775        7333   (average)

Increase 16c/16t: 5.09%
Increase 32c/32t: 128.76%

Apart from the above, I kept the test running for 1 hour. Here again the order of execution is first the run with the patch and then original.

32 clients / 32 threads, 1 hour
@mv-free-lst    @9.3devl
Single run: 9842.019229    8050.357981

Increase 32c/32t: 22%

> Also, for the select only transactions, I think that 20 minutes is
> much longer than necessary. I'd rather see many more runs, each one
> being shorter.

Taken care of; I don't know if 5 minutes is appropriate, or whether you meant it to be even shorter.
> Because you can't restart the server without wiping out the
> shared_buffers, what I would do is make a test patch which introduces
> a new guc.c setting which allows the behavior to be turned on and off
> with a SIGHUP (pg_ctl reload).

Okay, this is a good idea.

>> The reason I can think of is that when shared buffers are small, the clock sweep runs very fast and there is no bottleneck.
>> Only when shared buffers increase above some threshold does it spend a reasonable amount of time in the clock sweep.
>
> I am rather skeptical of this. When the working set doesn't fit in
> memory under a select-only workload, then about half the buffers will
> be evictable at any given time, and half will have usagecount=1, and a
> handful will have usagecount>=4 (index meta, root and branch blocks).
> This will be the case over a wide range of shared_buffers, as long as
> it is big enough to hold all index branch blocks but not big enough to
> hold everything. Given this state of affairs, the average clock sweep
> should be about 2, regardless of the exact size of shared_buffers.
>
> The one wrinkle I could think of is if all the usagecount=1 buffers
> are grouped into a contiguous chunk, and all the usagecount=0 are in
> another chunk. The average would still be 2, but the average would be
> made up of N/2 runs of length 1, followed by one run of length N/2.
> Now if 1 process is stuck in the N/2 stretch and all other processes
> are waiting on that, maybe that somehow escalates the waits so that
> they are larger when N is larger, but I still don't see how the math
> works on that.

The two problems observed in the V-Tune profiler reports for buffer management are:
a. Partition lock
b. Buf-freelist lock

Tomorrow, I will send you some of the profiler reports for the different scenarios where the above is observed. I think that as long as there is contention for the partition lock, reduced contention on the buf-freelist lock might not even show up.
The idea (move the buffers to the freelist) will improve the situation for both of these locks to an extent: after the BGWriter invalidates a buffer, a backend doesn't need to remove it from the hash table, and hence has no need for the old partition lock. Hash partition lock contention will be reduced by this only if the new and old partitions are different, which is quite probable, as the clock sweep pays no attention to partitions when it tries to find a usable buffer.

> Are you working on this just because it was on the ToDo List, or
> because you have actually run into a problem with it?

The reason behind this work is that late last year I did some benchmarking of PostgreSQL 9.1 against some other commercial databases and found that the performance of SELECT operations in PG is much below the others. One of the key observations was that for PostgreSQL, on increasing shared buffers the performance increases up to a certain level, but after a certain point this parameter doesn't increase performance any further; the situation in the other databases is better. As part of that activity, I did a design study of buffer management/checkpoint, and of some other areas like MVCC, for various databases. Some parts of the study for buffer management and checkpoint are attached with this mail.

IMO, there are certain other things which we could attempt in buffer management:

1. Have separate hot and cold lists of shared buffers, something similar to Clock-Pro.
2. Have some amount of shared buffers reserved for hot tables.
3. Instead of moving buffers to the freelist by BGWriter/checkpoint, move them to a cold list. The cold list concept is as follows:
   a. Have cold lists equal in number to the hash partitions.
   b. BGWriter/checkpoint will move a buffer to the cold list whose number equals the buffer's hash partition number.
   This addresses the point that, even after a buffer moves to the cold list, if the same page is accessed before somebody reuses the buffer, no I/O will be required.
4. Reduce the freelist lock contention by having multiple freelists.
This has been tried, but with no performance improvement.

5. Reduce the granularity of the freelist lock used to get the next buffer. Some time back Ants Aasma sent a patch along these lines, with which no performance improvement was observed.

Considering that points 4 and 5 have not given performance benefits, I think reducing contention around only the freelist lock will not yield any performance gain.

> I've never seen
> freelist lock contention be a problem on machines with less than 8
> CPUs, but both of us are testing on smaller machines. I think we
> really need to test this on something bigger.

Yeah, you are right. I shall try to do so.

With Regards,
Amit Kapila.