Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Scaling shared buffer eviction
Date
Msg-id CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=VOOLUds-A@mail.gmail.com
In response to Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Scaling shared buffer eviction
List pgsql-hackers
On Tue, Sep 16, 2014 at 10:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 16, 2014 at 8:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
In most cases performance with the patch is slightly lower than HEAD; the difference
is generally less than 1%, and in a case or two close to 2%. I think the main
reason for the slight difference is that when the size of shared buffers is
almost the same as the data size, very few buffers are needed from the clock
sweep. As an example, in the first case (shared buffers = 12286MB), it actually
needs at most 256 additional buffers (2MB) via clock sweep, whereas bgreclaimer
will put 2000 additional buffers in the freelist (the high water mark, since
0.5% of shared buffers is greater than 2000). So bgreclaimer does some extra
work when it is not required, and it also leads to the condition you mentioned
below (the freelist will contain buffers that have already been touched since
we added them).  Now for case 2 (12166MB), we need more than 2000 additional
buffers, but not too many, so it can have a similar effect.

So there are two suboptimal things that can happen and they pull in opposite directions.  I think you should instrument the server how often each is happening.  #1 is that we can pop a buffer from the freelist and find that it's been touched.  That means we wasted the effort of putting it on the freelist in the first place.  #2 is that we can want to pop a buffer from the freelist and find it empty and thus be forced to run the clock sweep ourselves.   If we're having problem #1, we could improve things by reducing the water marks.  If we're having problem #2, we could improve things by increasing the water marks.  If we're having both problems, then I dunno.  But let's get some numbers on the frequency of these specific things, rather than just overall tps numbers.
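The feedback rule described above could be sketched roughly as below. This is a
hypothetical illustration only; the function name, step size, and floor are
made up here and are not taken from the patch:

```c
#include <assert.h>

/*
 * Hypothetical feedback rule following the two failure modes above:
 * frequent "touched" pops (problem #1) mean the freelist is kept too
 * long, so lower the marks; frequent empty pops (problem #2) mean it
 * runs dry, so raise them.  Names, step, and floor are illustrative.
 */
static int
adjust_high_water_mark(int high_water_mark, long touched_pops, long empty_pops)
{
	const int	step = 100;		/* arbitrary adjustment step */
	const int	min_mark = 100;	/* never shrink below this */

	if (touched_pops > empty_pops)
		high_water_mark -= step;	/* problem #1 dominates: shrink list */
	else if (empty_pops > touched_pops)
		high_water_mark += step;	/* problem #2 dominates: grow list */

	return (high_water_mark < min_mark) ? min_mark : high_water_mark;
}
```

If both counters are high, as Robert notes, no single adjustment direction
helps, which is why measuring the two frequencies separately matters.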

Specific numbers for both of the configurations for which I posted
data in the previous mail are as follows:

Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_touched_freelist - count of buffers that backends found touched after
popping from freelist.
buffers_backend_clocksweep - count of buffer allocations not satisfied from freelist 
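For context, here is a toy model of where these two counters would be bumped in
the allocation path. The structure and function names are simplified stand-ins
invented for this sketch, not PostgreSQL's real ones:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: a backend pops from the freelist; a popped buffer whose
 * usage count rose since bgreclaimer added it is discarded and counted
 * as buffers_touched_freelist, and an empty freelist sends the backend
 * to the clock sweep (buffers_backend_clocksweep).  Simplified
 * stand-ins, not PostgreSQL's actual structures.
 */
typedef struct ToyBuf
{
	int			usage_count;
	struct ToyBuf *next;
} ToyBuf;

static long buffers_touched_freelist = 0;
static long buffers_backend_clocksweep = 0;

static ToyBuf *
toy_get_buffer(ToyBuf **freelist)
{
	while (*freelist != NULL)
	{
		ToyBuf	   *buf = *freelist;

		*freelist = buf->next;
		if (buf->usage_count > 0)
		{
			buffers_touched_freelist++;	/* problem #1: wasted work */
			continue;					/* try the next freelist entry */
		}
		return buf;						/* satisfied from freelist */
	}
	buffers_backend_clocksweep++;		/* problem #2: must clock sweep */
	return NULL;						/* caller would run the sweep */
}
```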

buffers_alloc                 1531023
buffers_backend_clocksweep          0
buffers_touched_freelist            0


Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)
Client and Thread Count = 64

buffers_alloc                 1531010
buffers_backend_clocksweep          0
buffers_touched_freelist            0


In both the above cases, I took the data multiple times to ensure
correctness.  From the above data, it is evident that in both
configurations all the requests are satisfied from the initial freelist.
Basically, the amount of shared buffers configured
(12286MB = 1572608 buffers and 12166MB = 1557248 buffers) is
sufficient to contain the whole workload of the pgbench run.
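As a cross-check of those buffer counts: with PostgreSQL's default 8KB block
size, 1MB of shared_buffers holds 128 buffers. A quick sketch (the function
name here is invented for illustration):

```c
#include <assert.h>

/*
 * Convert a shared_buffers setting in MB to a buffer count, assuming
 * the default 8KB block size (BLCKSZ): 1MB = 1024KB / 8KB = 128 buffers.
 */
static long
shared_buffers_mb_to_buffers(long mb)
{
	return mb * 1024 / 8;
}
```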

So now the question is why we see a small variation (<1%) in the data
when all of it fits in shared buffers.  The reason could be that we have
added a few extra instructions (due to the increase in the StrategyControl
structure size, an additional function call, and one or two new assignments)
in the buffer allocation path (these extra instructions only matter until
all the data pages get associated with buffers; after that, control won't
even reach StrategyGetBuffer()), or it may be variation across different
runs with different binaries.

I went ahead and took data for cases where shared buffers are a tiny bit
(0.1% and 0.05%) less than the workload (based on the buffer allocations
done in the above cases).

Performance Data
-------------------------------


Scale Factor - 800
Shared_Buffers - 11950MB

Client_Count/Patch_Ver      8      16      32      64     128
HEAD                    68424  132540  195496  279511  283280
sbe_v9                  68565  132709  194631  284351  289333

Scale Factor - 800
Shared_Buffers - 11955MB 

Client_Count/Patch_Ver      8      16      32      64     128
HEAD                    68331  127752  196385  274387  281753
sbe_v9                  68922  131314  194452  284292  287221

The above data indicates that performance is better with the patch
in almost all cases, and especially at high concurrency (64 and
128 client counts).

The overall conclusion is that with the patch:
a. when the data fits in RAM but not completely in shared buffers,
the performance/scalability is quite good even if shared buffers are just
a tiny bit less than all the data.
b. when shared buffers are sufficient to contain all the data, there is
a slight difference (<1%) in performance.

 
d. Let's not do anything; if a user does such a configuration, he should
be educated to configure shared buffers in a better way, and/or the
performance hit doesn't seem to be justified enough to do any further
work.

At least worth entertaining.

Based on further analysis, I think this is the way to go.

Attached, find the patch for the new stat (buffers_touched_freelist), in
case you want to run the patch with it, along with detailed (individual-run)
performance data.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

