Re: Scaling shared buffer eviction - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Msg-id: CAA4eK1LFGcvzMdcD5NZx7B2gCbP1G7vWK7w32EZk=VOOLUds-A@mail.gmail.com
In response to: Re: Scaling shared buffer eviction (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Scaling shared buffer eviction
List: pgsql-hackers
On Tue, Sep 16, 2014 at 10:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 8:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> In most cases performance with patch is slightly less as compare
>> to HEAD and the difference is generally less than 1% and in a case
>> or 2 close to 2%.  I think the main reason for slight difference is that
>> when the size of shared buffers is almost same as data size, the number
>> of buffers it needs from clock sweep are very less, as an example in first
>> case (when size of shared buffers is 12286MB), it actually needs at most
>> 256 additional buffers (2MB) via clock sweep, where as bgreclaimer
>> will put 2000 (high water mark) additional buffers (0.5% of shared buffers
>> is greater than 2000) in free list, so bgreclaimer does some extra work
>> when it is not required and it also leads to condition you mentioned
>> down (freelist will contain buffers that have already been touched since
>> we added them).  Now for case 2 (12166MB), we need buffers more
>> than 2000 additional buffers, but not too many, so it can also have
>> similar effect.
>
> So there are two suboptimal things that can happen and they pull in
> opposite directions.  I think you should instrument the server how often
> each is happening.  #1 is that we can pop a buffer from the freelist and
> find that it's been touched.  That means we wasted the effort of putting
> it on the freelist in the first place.  #2 is that we can want to pop a
> buffer from the freelist and find it empty and thus be forced to run the
> clock sweep ourselves.  If we're having problem #1, we could improve
> things by reducing the water marks.  If we're having problem #2, we could
> improve things by increasing the water marks.  If we're having both
> problems, then I dunno.  But let's get some numbers on the frequency of
> these specific things, rather than just overall tps numbers.
Specific numbers for both configurations for which I posted data in the
previous mail are as follows:
Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_touched_freelist - count of buffers that backends found already
touched after popping them from the freelist.
buffers_backend_clocksweep - count of buffer allocations not satisfied from
the freelist.
buffers_alloc | 1531023 |
buffers_backend_clocksweep | 0 |
buffers_touched_freelist | 0 |
Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_alloc | 1531010 |
buffers_backend_clocksweep | 0 |
buffers_touched_freelist | 0 |
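
These two counters map directly onto the two suboptimal cases described
above.  As a rough illustration only (simplified sketch with placeholder
counter names, not the exact code in the attached stats patch), the counting
points in a freelist-first StrategyGetBuffer() would look something like:

    /* Try to get a buffer from the freelist first. */
    while (StrategyControl->firstFreeBuffer >= 0)
    {
        buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
        StrategyControl->firstFreeBuffer = buf->freeNext;
        buf->freeNext = FREENEXT_NOT_IN_LIST;

        LockBufHdr(buf);
        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;             /* allocation satisfied from freelist */
        UnlockBufHdr(buf);

        /* case #1: buffer was touched after it was put on the freelist */
        StrategyControl->numBuffersTouchedFreelist++;
    }

    /* case #2: freelist empty, backend must run the clock sweep itself */
    StrategyControl->numBuffersBackendClocksweep++;
    /* ... fall through to the existing clock sweep loop ... */
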
In both the above cases, I took the data multiple times to ensure
correctness.  From the above data, it is evident that in both
configurations all the requests are satisfied from the initial freelist.
Basically, the amount of shared buffers configured (12286MB = 1572608
buffers and 12166MB = 1557248 buffers) is sufficient to contain the entire
workload of the pgbench run.
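(For reference, with the default 8kB block size: 12286MB * 1024 / 8 =
1572608 buffers and 12166MB * 1024 / 8 = 1557248 buffers, both comfortably
above the ~1531000 buffer allocations reported above.)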
So now the question is why we see a small variation (<1%) in the data when
everything fits in shared buffers.  The reason could be that we have added
a few extra instructions (due to the increase in the StrategyControl
structure size, an additional function call, and one or two new
assignments) in the buffer allocation path (these extra instructions only
matter until all the data pages get associated with buffers; after that,
control won't even reach StrategyGetBuffer()), or it may just be variation
across different runs with different binaries.
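(To see why control stops reaching StrategyGetBuffer() once the data is
fully cached: BufferAlloc() first looks the page up in the buffer mapping
table and returns on a hit, only falling through to the strategy layer on a
miss.  A heavily simplified sketch of that existing path, not of the patch:

    INIT_BUFFERTAG(newTag, relFileNode, forkNum, blockNum);
    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);

    LWLockAcquire(newPartitionLock, LW_SHARED);
    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /* Page already has a buffer: pin it and return.
         * StrategyGetBuffer() is never reached on this path. */
    }

    /* Only on a mapping-table miss do we ask for a victim buffer. */
    buf = StrategyGetBuffer(...);
)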
I went ahead and took data for the cases where shared buffers are a tiny
bit (0.1% and 0.05%) smaller than the workload (based on the buffer
allocations done in the above cases).
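(For the arithmetic behind those percentages, again assuming 8kB blocks:
11950MB = 1529600 buffers and 11955MB = 1530240 buffers, i.e. roughly 0.09%
and 0.05% below the ~1531000 buffer allocations observed above.)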
Performance Data
-------------------------------
Scale Factor - 800
Shared_Buffers - 11950MB
Client_Count/Patch_Ver |     8 |     16 |     32 |     64 |    128
HEAD                   | 68424 | 132540 | 195496 | 279511 | 283280
sbe_v9                 | 68565 | 132709 | 194631 | 284351 | 289333
Scale Factor - 800
Shared_Buffers - 11955MB
Client_Count/Patch_Ver |     8 |     16 |     32 |     64 |    128
HEAD                   | 68331 | 127752 | 196385 | 274387 | 281753
sbe_v9                 | 68922 | 131314 | 194452 | 284292 | 287221
The above data indicates that performance is better with the patch in
almost all cases, and especially at high concurrency (64 and 128 clients).
The overall conclusion is that with the patch:
a. when the data fits in RAM but not completely in shared buffers, the
   performance/scalability is quite good even if shared buffers are just a
   tiny bit smaller than all the data.
b. when shared buffers are sufficient to contain all the data, there is a
   slight difference (<1%) in performance.
>> d. Lets not do anything as if user does such a configuration, he should
>> be educated to configure shared buffers in a better way and or the
>> performance hit doesn't seem to be justified to do any further work.
>
> At least worth entertaining.
Attached are the patch for the new stat (buffers_touched_freelist), in case
you want to run the main patch with it, and the detailed (individual-run)
performance data.