Re: Scaling shared buffer eviction - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Msg-id: CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=VOOLUds-A@mail.gmail.com
In response to: Re: Scaling shared buffer eviction (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Scaling shared buffer eviction
List: pgsql-hackers
On Tue, Sep 16, 2014 at 10:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 8:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> In most cases performance with patch is slightly less as compare
>> to HEAD and the difference is generally less than 1% and in a case
>> or 2 close to 2%.  I think the main reason for slight difference is that
>> when the size of shared buffers is almost same as data size, the number
>> of buffers it needs from clock sweep are very less, as an example in first
>> case (when size of shared buffers is 12286MB), it actually needs at most
>> 256 additional buffers (2MB) via clock sweep, where as bgreclaimer
>> will put 2000 (high water mark) additional buffers (0.5% of shared buffers
>> is greater than 2000) in free list, so bgreclaimer does some extra work
>> when it is not required and it also leads to condition you mentioned
>> down (freelist will contain buffers that have already been touched since
>> we added them).  Now for case 2 (12166MB), we need buffers more
>> than 2000 additional buffers, but not too many, so it can also have
>> similar effect.
>
> So there are two suboptimal things that can happen and they pull in
> opposite directions.  I think you should instrument the server how often
> each is happening.  #1 is that we can pop a buffer from the freelist and
> find that it's been touched.  That means we wasted the effort of putting
> it on the freelist in the first place.  #2 is that we can want to pop a
> buffer from the freelist and find it empty and thus be forced to run the
> clock sweep ourselves.  If we're having problem #1, we could improve
> things by reducing the water marks.  If we're having problem #2, we could
> improve things by increasing the water marks.  If we're having both
> problems, then I dunno.  But let's get some numbers on the frequency of
> these specific things, rather than just overall tps numbers.
Specific numbers for both configurations for which I posted data in the
previous mail are as follows:
Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_touched_freelist - count of buffers that backends found already
touched after popping them from the freelist.
buffers_backend_clocksweep - count of buffer allocations not satisfied from
the freelist.
buffers_alloc | 1531023 |
buffers_backend_clocksweep | 0 |
buffers_touched_freelist | 0 |
Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_alloc | 1531010 |
buffers_backend_clocksweep | 0 |
buffers_touched_freelist | 0 |
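
These two counters map directly onto the two suboptimal cases described
above.  As a rough illustration only (simplified sketch with placeholder
counter names, not the exact code in the attached stats patch), the counting
points in a freelist-first StrategyGetBuffer() would look something like:

    /* Try to get a buffer from the freelist first. */
    while (StrategyControl->firstFreeBuffer >= 0)
    {
        buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
        StrategyControl->firstFreeBuffer = buf->freeNext;
        buf->freeNext = FREENEXT_NOT_IN_LIST;

        LockBufHdr(buf);
        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;             /* allocation satisfied from freelist */
        UnlockBufHdr(buf);

        /* case #1: buffer was touched after it was put on the freelist */
        StrategyControl->numBuffersTouchedFreelist++;
    }

    /* case #2: freelist empty, backend must run the clock sweep itself */
    StrategyControl->numBuffersBackendClocksweep++;
    /* ... fall through to the existing clock sweep loop ... */
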
In both the above cases, I took the data multiple times to ensure
correctness.  From the above data, it is evident that in both
configurations all the requests are satisfied from the initial freelist.
Basically, the amount of shared buffers configured (12286MB = 1572608
buffers and 12166MB = 1557248 buffers) is sufficient to contain the entire
workload of the pgbench run.
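(For reference, with the default 8kB block size: 12286MB * 1024 / 8 =
1572608 buffers and 12166MB * 1024 / 8 = 1557248 buffers, both comfortably
above the ~1531000 buffer allocations reported above.)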
So now the question is why we see a small variation (<1%) in the data when
everything fits in shared buffers.  The reason could be that we have added
a few extra instructions (due to the increase in the StrategyControl
structure size, an additional function call, and one or two new
assignments) in the buffer allocation path (these extra instructions only
matter until all the data pages get associated with buffers; after that,
control won't even reach StrategyGetBuffer()), or it may just be variation
across different runs with different binaries.
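(To see why control stops reaching StrategyGetBuffer() once the data is
fully cached: BufferAlloc() first looks the page up in the buffer mapping
table and returns on a hit, only falling through to the strategy layer on a
miss.  A heavily simplified sketch of that existing path, not of the patch:

    INIT_BUFFERTAG(newTag, relFileNode, forkNum, blockNum);
    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);

    LWLockAcquire(newPartitionLock, LW_SHARED);
    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /* Page already has a buffer: pin it and return.
         * StrategyGetBuffer() is never reached on this path. */
    }

    /* Only on a mapping-table miss do we ask for a victim buffer. */
    buf = StrategyGetBuffer(...);
)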
I went ahead and took data for the cases where shared buffers are a tiny
bit (0.1% and 0.05%) smaller than the workload (based on the buffer
allocations done in the above cases).
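(For the arithmetic behind those percentages, again assuming 8kB blocks:
11950MB = 1529600 buffers and 11955MB = 1530240 buffers, i.e. roughly 0.09%
and 0.05% below the ~1531000 buffer allocations observed above.)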
Performance Data
-------------------------------
Scale Factor - 800
Shared_Buffers - 11950MB
Client_Count/Patch_Ver |     8 |     16 |     32 |     64 |    128
HEAD                   | 68424 | 132540 | 195496 | 279511 | 283280
sbe_v9                 | 68565 | 132709 | 194631 | 284351 | 289333
Scale Factor - 800
Shared_Buffers - 11955MB
Client_Count/Patch_Ver |     8 |     16 |     32 |     64 |    128
HEAD                   | 68331 | 127752 | 196385 | 274387 | 281753
sbe_v9                 | 68922 | 131314 | 194452 | 284292 | 287221
The above data indicates that performance is better with the patch in
almost all cases, and especially at high concurrency (64 and 128 clients).
The overall conclusion is that with the patch:
a. when the data fits in RAM but not completely in shared buffers, the
   performance/scalability is quite good even if shared buffers are just a
   tiny bit smaller than all the data.
b. when shared buffers are sufficient to contain all the data, there is a
   slight difference (<1%) in performance.
>> d. Lets not do anything as if user does such a configuration, he should
>> be educated to configure shared buffers in a better way and or the
>> performance hit doesn't seem to be justified to do any further work.
>
> At least worth entertaining.
Attached are the patch for the new stat (buffers_touched_freelist), in case
you want to run the main patch with it, and the detailed (individual-run)
performance data.