Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Page replacement algorithm in buffer cache
Date
Msg-id CAHyXU0yKC-HWk9hhBjUCPsnPMqhskvu_MEqB2cvmRu0d0+BsCw@mail.gmail.com
In response to Re: Page replacement algorithm in buffer cache  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Page replacement algorithm in buffer cache  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Mar 22, 2013 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Fri, Mar 22, 2013 at 1:13 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>>> What is the general thinking? Is it time to start testing again and
>>> thinking about improvements to the current algorithm?
>
>> well, what problem are you trying to solve exactly?  the main problems
>> I see today are not so much in terms of page replacement but spinlock
>> and lwlock contention.
>
> Even back when we last hacked on that algorithm, the concerns were not
> so much about which pages it replaced as how much overhead and
> contention was created by the management algorithm.  I haven't seen any
> reason to think we have a problem with the quality of the replacement
> choices.  The proposal to increase the initial usage count would
> definitely lead to more overhead/contention, though, because it would
> result in having to circle around all the buffers more times (on
> average) to get a free buffer.


yup...absolutely.  I have a hunch that the occasional gripes we see
about server stalls under high read-only (or mostly read-only) load
come from spinlock contention under the lwlock hitting a critical
point.  That effectively shuts the server down until, by chance, the
backend holding the lwlock gets lucky and lands the spinlock.

I think there is some very low hanging optimization fruit in the clock
sweep loop.  First and foremost, I see no good reason why, when
scanning pages, we have to spin and wait on a buffer in order to
pedantically adjust usage_count.  Some simple refactoring there could
set it up so that a simple TAS (or even a TTAS, with the first test in
front of the cache line lock, as is done automatically on x86 IIRC)
could guard the buffer and, if the buffer is found locked, simply
move on to the next candidate without messing around with that buffer
at all.  This could be construed as a 'trylock' variant of a spinlock
and might help out in cases where an especially hot buffer is
locking up the sweep.  This exploits the fact that in
StrategyGetBuffer we don't need a *particular* buffer, just *a*
buffer.
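
To make the 'trylock' idea concrete, here is a rough sketch of a sweep
that skips any buffer whose header lock is already held.  It is not the
bufmgr code: SketchBuffer, SketchPool and the sketch_* functions are
made-up names, the lock is a plain C11 atomic_bool rather than
PostgreSQL's s_lock machinery, and the clock hand is assumed to be
protected by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer header (not PostgreSQL's BufferDesc). */
typedef struct {
    atomic_bool hdr_lock;   /* per-buffer spinlock word */
    int refcount;           /* pins; protected by hdr_lock */
    int usage_count;        /* clock-sweep counter; protected by hdr_lock */
} SketchBuffer;

typedef struct {
    SketchBuffer *buffers;
    size_t nbuffers;
    size_t next_victim;     /* clock hand; assumed serialized by the caller */
} SketchPool;

static void sketch_init(SketchBuffer *buf, int refcount, int usage_count) {
    atomic_init(&buf->hdr_lock, false);
    buf->refcount = refcount;
    buf->usage_count = usage_count;
}

/* TTAS-style trylock: a plain read first, then one atomic exchange.
 * Never spins; returns false if the lock is (or becomes) held. */
static bool sketch_trylock(SketchBuffer *buf) {
    if (atomic_load_explicit(&buf->hdr_lock, memory_order_relaxed))
        return false;
    return !atomic_exchange_explicit(&buf->hdr_lock, true,
                                     memory_order_acquire);
}

static void sketch_unlock(SketchBuffer *buf) {
    atomic_store_explicit(&buf->hdr_lock, false, memory_order_release);
}

/* Clock sweep that moves past contended buffers instead of waiting.
 * Returns the victim's index with its hdr_lock still held, or -1 if
 * nothing was found within max_probes steps. */
static long sketch_get_victim(SketchPool *pool, size_t max_probes) {
    assert(pool->nbuffers > 0);
    for (size_t probes = 0; probes < max_probes; probes++) {
        size_t idx = pool->next_victim;
        pool->next_victim = (idx + 1) % pool->nbuffers;

        SketchBuffer *buf = &pool->buffers[idx];
        if (!sketch_trylock(buf))
            continue;               /* hot buffer: leave usage_count alone */

        if (buf->refcount == 0) {
            if (buf->usage_count == 0)
                return (long) idx;  /* unpinned and cold: take it */
            buf->usage_count--;     /* unpinned but recently used */
        }
        sketch_unlock(buf);
    }
    return -1;
}
```

The tradeoff is that a skipped buffer keeps its usage_count for that
pass, so a very hot buffer ages more slowly than it does today.  The
max_probes bound is there only so the sketch cannot loop forever when
every buffer is pinned or locked.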

I also wonder if we shouldn't (perhaps in addition to the above)
resuscitate Jeff Janes's idea to get rid of the lwlock completely and
manage everything with spinlocks.

Naturally, all of this would have to be confirmed with some very robust testing.

merlin


