Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers

From	Merlin Moncure
Subject	Re: Page replacement algorithm in buffer cache
Date	March 22, 2013 20:09:14
Msg-id	CAHyXU0yKC-HWk9hhBjUCPsnPMqhskvu_MEqB2cvmRu0d0+BsCw@mail.gmail.com
In response to	Re: Page replacement algorithm in buffer cache (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Page replacement algorithm in buffer cache
List	pgsql-hackers

On Fri, Mar 22, 2013 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> On Fri, Mar 22, 2013 at 1:13 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>>> What is the general thinking? Is it time to start testing again and
>>> thinking about improvements to the current algorithm?
>
>> well, what problem are you trying to solve exactly?  the main problems
>> I see today are not so much in terms of page replacement but spinlock
>> and lwlock contention.
>
> Even back when we last hacked on that algorithm, the concerns were not
> so much about which pages it replaced as how much overhead and
> contention was created by the management algorithm.  I haven't seen any
> reason to think we have a problem with the quality of the replacement
> choices.  The proposal to increase the initial usage count would
> definitely lead to more overhead/contention, though, because it would
> result in having to circle around all the buffers more times (on
> average) to get a free buffer.

yup...absolutely.  I have a hunch that the occasional gripes we see
about server stalls under high load with read-only (or mostly
read-only) workloads come from spinlock contention under the lwlock
hitting a critical point and effectively shutting the server down
until, by chance, the backend holding the lwlock gets lucky and lands
the spinlock.

I think there is some very low-hanging optimization fruit in the clock
sweep loop.  First and foremost, I see no good reason why, when
scanning pages, we have to spin and wait on a buffer in order to
pedantically adjust usage_count.  Some simple refactoring there could
set it up so that a simple TAS (or even a TTAS with the first test in
front of the cache line lock, as is done automatically on x86 IIRC)
could guard the buffer and, in the event of any lock detected, simply
move on to the next candidate without messing around with that buffer
at all.  This could be construed as a 'trylock' variant of a spinlock
and might help out with cases where an especially hot buffer is
locking up the sweep.  This exploits the fact that from
StrategyGetBuffer we don't need a *particular* buffer, just *a*
buffer.
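
The trylock idea above could be sketched roughly as follows.  This is
a toy model, not the bufmgr code: SketchBuffer, sketch_trylock,
sketch_get_victim and NBUFFERS are made-up names, C11 atomics stand in
for the platform TAS, and the sweep position is assumed to be advanced
by one caller at a time (as it would be under the lwlock).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUFFERS 8              /* hypothetical pool size for the sketch */

/* Hypothetical, simplified buffer header; not PostgreSQL's BufferDesc. */
typedef struct
{
    atomic_bool locked;         /* stand-in for the buffer header spinlock */
    int         usage_count;
    int         refcount;       /* pins; a pinned buffer cannot be evicted */
} SketchBuffer;

static SketchBuffer buffers[NBUFFERS];
static size_t next_victim = 0;  /* assumed to be serialized by the caller */

/* TTAS trylock: one cheap read, then one atomic exchange.  Never spins. */
static bool
sketch_trylock(SketchBuffer *buf)
{
    if (atomic_load_explicit(&buf->locked, memory_order_relaxed))
        return false;           /* somebody holds it; give up right away */
    return !atomic_exchange_explicit(&buf->locked, true,
                                     memory_order_acquire);
}

static void
sketch_unlock(SketchBuffer *buf)
{
    atomic_store_explicit(&buf->locked, false, memory_order_release);
}

/*
 * Clock sweep that skips any buffer it cannot lock immediately.
 * Returns the index of a victim whose lock is still held by the caller,
 * or -1 if nothing was found within max_passes trips around the pool.
 */
static int
sketch_get_victim(int max_passes)
{
    for (long tries = (long) max_passes * NBUFFERS; tries > 0; tries--)
    {
        size_t        idx = next_victim;
        SketchBuffer *buf = &buffers[idx];

        next_victim = (next_victim + 1) % NBUFFERS;

        if (!sketch_trylock(buf))
            continue;           /* busy: leave its usage_count alone */

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return (int) idx;   /* found one; lock stays held */
            buf->usage_count--;     /* age it and keep sweeping */
        }
        sketch_unlock(buf);
    }
    return -1;
}
```

The victim comes back still locked, so the caller would recheck it and
release it with sketch_unlock().  Note that in this sketch a buffer
that is locked every time the hand passes never has its usage_count
decayed, which is one of the things testing would need to look at.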

I also wonder if we shouldn't (perhaps in addition to the above)
resuscitate Jeff Janes' idea to get rid of the lwlock completely and
manage everything with spinlocks.

Naturally, all of this would have to be confirmed with some very robust testing.

merlin
