Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Scaling shared buffer eviction
Date
Msg-id CA+Tgmob6yOedteBB461grFnoSV2MXxD0VqGq_f0xTHdbAm7Nnw@mail.gmail.com
In response to Re: Scaling shared buffer eviction  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Scaling shared buffer eviction
List pgsql-hackers
On Fri, Sep 19, 2014 at 7:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Specific numbers of both the configurations for which I have
posted data in previous mail are as follows:

Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)
Client and Thread Count = 64
buffers_touched_freelist - count of buffers that backends found touched after
popping from freelist.
buffers_backend_clocksweep - count of buffer allocations not satisfied from freelist 

buffers_alloc               1531023
buffers_backend_clocksweep  0
buffers_touched_freelist    0


I didn't believe these numbers, so I did some testing.  I used the same configuration you mention here, scale factor = 800, shared_buffers = 12286 MB, and I also saw buffers_backend_clocksweep = 0.  I didn't see buffers_touched_freelist showing up anywhere, so I don't know whether that would have been zero or not.  Then I tried reducing the high watermark for the freelist from 2000 buffers to 25 buffers, and buffers_backend_clocksweep was *still* 0.  At that point I started to smell a rat.  It turns out that, with this test configuration, there's no buffer allocation going on at all.  Everything fits in shared_buffers, or it did on my test.  I had to reduce shared_buffers down to 10491800kB before I got any significant buffer eviction.
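
To put the two shared_buffers settings on one scale, here is a small sketch of the unit arithmetic. It assumes PostgreSQL's usual convention that 1MB = 1024kB; the helper names are made up for illustration and are not from the patch.

```c
#include <stdint.h>

#define KB_PER_MB 1024			/* assumption: 1MB = 1024kB, as in PostgreSQL GUC units */

/* Convert a setting given in MB to kB. */
static uint64_t
mb_to_kb(uint64_t mb)
{
	return mb * KB_PER_MB;
}

/* Convert a setting given in kB to whole MB (rounds down). */
static uint64_t
kb_to_whole_mb(uint64_t kb)
{
	return kb / KB_PER_MB;
}

/* How many kB were removed when going from original_mb to reduced_kb. */
static uint64_t
shared_buffers_reduction_kb(uint64_t original_mb, uint64_t reduced_kb)
{
	return mb_to_kb(original_mb) - reduced_kb;
}
```

Under that assumption, 12286MB is 12580864kB, so dropping to 10491800kB (a little under 10246MB) takes away about 2040MB of buffer space.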

At that level, a 100-buffer high watermark wasn't sufficient to prevent the freelist from occasionally going empty.  A 2000-buffer high watermark was by and large sufficient, although I was able to see small numbers of buffers being allocated via clocksweep right at the very beginning of the test, I guess before the reclaimer really got cranking.  So the watermarks seem to be broadly in the right ballpark, but I think the statistics reporting needs improving.  We need an easy way to measure the amount of work that bgreclaimer is actually doing.
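
The interaction between the high watermark and the clocksweep fallback can be sketched with a toy model. This is not PostgreSQL code: the struct, the function names, and the idea of backends allocating in fixed-size bursts between reclaimer runs are all assumptions made for illustration. It only shows why a small watermark lets the freelist run dry while a larger one mostly doesn't, and why a freelist that starts empty produces some clocksweep allocations at the very start.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a freelist refilled by a background reclaimer. */
typedef struct FreelistModel
{
	size_t		nfree;				/* buffers currently on the freelist */
	size_t		high_watermark;		/* reclaimer refills up to this many */
	size_t		allocs_freelist;	/* allocations satisfied from the freelist */
	size_t		allocs_clocksweep;	/* allocations that fell back to clocksweep */
} FreelistModel;

/*
 * Backend side: take a buffer from the freelist if one is available,
 * otherwise fall back to running the clocksweep.  Returns true when the
 * freelist satisfied the request.
 */
static bool
model_alloc(FreelistModel *m)
{
	if (m->nfree > 0)
	{
		m->nfree--;
		m->allocs_freelist++;
		return true;
	}
	m->allocs_clocksweep++;
	return false;
}

/* Reclaimer side: refill to the high watermark; returns buffers added. */
static size_t
model_reclaim(FreelistModel *m)
{
	size_t		added = 0;

	if (m->nfree < m->high_watermark)
	{
		added = m->high_watermark - m->nfree;
		m->nfree = m->high_watermark;
	}
	return added;
}

/*
 * Run `rounds` rounds.  In each round the backends allocate `burst`
 * buffers, then the reclaimer gets one chance to refill the freelist.
 */
static void
model_run(FreelistModel *m, size_t rounds, size_t burst)
{
	for (size_t r = 0; r < rounds; r++)
	{
		for (size_t i = 0; i < burst; i++)
			(void) model_alloc(m);
		(void) model_reclaim(m);
	}
}
```

Starting from an empty freelist with an arbitrary burst of 500 allocations per round, a watermark of 2000 sends only the first burst to the clocksweep, while a watermark of 100 sends most allocations there in every round. The burst size is invented; the real behavior depends on how quickly the reclaimer is woken and scheduled.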

I suggest we count these things:

1. The number of buffers the reclaimer has put back on the free list.
2. The number of times a backend has run the clocksweep.
3. The number of buffers past which the reclaimer has advanced the clocksweep (i.e. the number of buffers it had to examine in order to reclaim the number counted by #1).
4. The number of buffers past which a backend has advanced the clocksweep (i.e. the number of buffers it had to examine in order to allocate the number of buffers counted by #2).
5. The number of buffers allocated from the freelist which the backend did not use because they'd been touched (what you're calling buffers_touched_freelist).

It's hard to come up with good names for all of these things that are consistent with the somewhat wonky existing names.  Here's an attempt:

1. bgreclaim_freelist
2. buffers_alloc_clocksweep (you've got buffers_backend_clocksweep, but I think we want to make it more parallel with buffers_alloc, which is the number of buffers allocated, not buffers_backend, the number of buffers *written* by a backend)
3. clocksweep_bgreclaim
4. clocksweep_backend
5. freelist_touched
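
As a rough illustration of how these five counters could be read together, here is a sketch in C. The field names are the ones proposed above; the struct, the helper functions, and the particular ratios are assumptions for illustration, not part of any patch. In particular it assumes counter 4 is meant to be read against counter 2, and it treats freelist_touched / bgreclaim_freelist as only an approximate measure of wasted reclaim work, since some reclaimed buffers may still be sitting on the freelist.

```c
#include <stdint.h>

/* Hypothetical container for the five proposed counters. */
typedef struct EvictionCounters
{
	uint64_t	bgreclaim_freelist;			/* 1: buffers reclaimer put on freelist */
	uint64_t	buffers_alloc_clocksweep;	/* 2: times a backend ran the clocksweep */
	uint64_t	clocksweep_bgreclaim;		/* 3: buffers examined by the reclaimer */
	uint64_t	clocksweep_backend;			/* 4: buffers examined by backends */
	uint64_t	freelist_touched;			/* 5: freelist buffers rejected as touched */
} EvictionCounters;

/* Divide, returning 0.0 when the denominator is zero. */
static double
safe_ratio(uint64_t num, uint64_t den)
{
	return den == 0 ? 0.0 : (double) num / (double) den;
}

/* Buffers the reclaimer examined per buffer it put on the freelist (#3 / #1). */
static double
reclaimer_examined_per_buffer(const EvictionCounters *c)
{
	return safe_ratio(c->clocksweep_bgreclaim, c->bgreclaim_freelist);
}

/* Buffers a backend examined per clocksweep allocation (#4 / #2). */
static double
backend_examined_per_sweep(const EvictionCounters *c)
{
	return safe_ratio(c->clocksweep_backend, c->buffers_alloc_clocksweep);
}

/* Approximate share of reclaimed buffers rejected as touched (#5 / #1). */
static double
freelist_touched_fraction(const EvictionCounters *c)
{
	return safe_ratio(c->freelist_touched, c->bgreclaim_freelist);
}
```

If buffers_alloc_clocksweep stays at zero, the second ratio is reported as 0.0 rather than dividing by zero, which matches the case where the freelist never runs dry.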

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
