Re: our buffer replacement strategy is kind of lame - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: our buffer replacement strategy is kind of lame
Date
Msg-id CA+U5nMJ2T5Zt844=h6AkMLTANzJ+SsDMFV8ZM0hGeSpns3wfMw@mail.gmail.com
Whole thread Raw
In response to Re: our buffer replacement strategy is kind of lame  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Aug 14, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Aug 14, 2011 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Sat, Aug 13, 2011 at 11:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Sat, Aug 13, 2011 at 4:40 PM, Greg Stark <stark@mit.edu> wrote:
>>>> On Sat, Aug 13, 2011 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> and possibly we ought to put them all in a
>>>>> linked list so that the next guy who needs a buffer can just pop one
>>>>
>>>> The whole point of the clock sweep algorithm is to approximate an LRU
>>>> without needing to maintain a linked list. The problem with a linked
>>>> list is that you need to serialize access to it so every time you
>>>> reference a buffer you need to wait on a lock for the list so you can
>>>> move that buffer around in the list.
>>>
>>> Right, but the same doesn't apply to what I'm talking about.  You just
>>> put each guy into the linked list when his reference count goes down
>>> to 0.  When you want to allocate a buffer, you pop somebody off.  If
>>> his reference count is still 0, you forget about him and pop the next
>>> guy, and keep going until you find a suitable victim.
>>>
>>> The problem with just running the clock sweep every time you need a
>>> victim buffer is that you may have to pass over a large number of
>>> unevictable buffers before you find someone you can actually kick out.
>>>  That's work that should really be getting done in the background, not
>>> at the absolute last moment when we discover, hey, we need a buffer.
>>
>> I think Greg is right here. Those suggested changes just bring back the LRU.
>
> Since the problem with LRU is that it requires moving each buffer to
> the front of the list every time it's touched, and since the idea that
> I proposed does not require that, I don't know what you mean by this.
>
>> If you keep a separate list then you need to serialize access to it.
>
> That is true.  However, there are some reasons to think it would be
> better than what we're doing right now.  Right now, we acquire the
> BufFreelistLock, and then take and release some number of spinlocks.
> It seems fairly easy to construct cases where that number is routinely
> over 100; even on my laptop, I could construct cases where it was
> routinely 50-100 range.  A linked list that we can just dequeue things
> from could be protected by a single spinlock, which would hopefully be
> somewhat faster.  (In the interest of credit where credit is due, this
> is pretty much the point Jim Nasby has been making every time this
> argument comes up, and I've been dismissing it, but now that I
> understand how much work gets done in that clock sweep code, I'm
> starting to think he might be right.)
>
> However, if it turns out that that there's still too much contention
> over that spinlock, we can probably fix that by maintaining N lists
> instead of one, distributing buffers between them in round-robin
> fashion, and having each backend choose a list based on its backend ID
> modulo N.  The testing I've done so far seems to indicate that
> spinlock contention doesn't really become noticeable until you have
> 32-40 CPUs pounding hard on the same spinlock, so even N = 4 is
> probably enough to make the problem go away.  But I don't even think
> we're going to have to go that far right at the outset, because
> there's another major source of contention in the buffer eviction
> path: the buffer mapping locks.

All of the above presumes that we have a list of best-next-victims. We
don't have that list, so describing how we would optimise access to it
means nothing.

Yes, if we had it, we could dequeue them quickly - that is not the problem.

The problem is that creating the best-next-victim list AND making it
accurate is the act that causes contention.

Yes, we can maintain an inaccurate list cheaply.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: our buffer replacement strategy is kind of lame
Next
From: Robert Haas
Date:
Subject: Re: our buffer replacement strategy is kind of lame