Re: our buffer replacement strategy is kind of lame - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: our buffer replacement strategy is kind of lame
Msg-id: CA+U5nMJ2T5Zt844=h6AkMLTANzJ+SsDMFV8ZM0hGeSpns3wfMw@mail.gmail.com
In response to: Re: our buffer replacement strategy is kind of lame (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Aug 14, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Aug 14, 2011 at 6:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Sat, Aug 13, 2011 at 11:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Sat, Aug 13, 2011 at 4:40 PM, Greg Stark <stark@mit.edu> wrote:
>>>> On Sat, Aug 13, 2011 at 8:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> and possibly we ought to put them all in a
>>>>> linked list so that the next guy who needs a buffer can just pop one
>>>>
>>>> The whole point of the clock sweep algorithm is to approximate an LRU
>>>> without needing to maintain a linked list. The problem with a linked
>>>> list is that you need to serialize access to it, so every time you
>>>> reference a buffer you need to wait on a lock for the list so you can
>>>> move that buffer around in the list.
>>>
>>> Right, but the same doesn't apply to what I'm talking about. You just
>>> put each guy into the linked list when his reference count goes down
>>> to 0. When you want to allocate a buffer, you pop somebody off. If
>>> his reference count is still 0, you forget about him and pop the next
>>> guy, and keep going until you find a suitable victim.
>>>
>>> The problem with just running the clock sweep every time you need a
>>> victim buffer is that you may have to pass over a large number of
>>> unevictable buffers before you find someone you can actually kick out.
>>> That's work that should really be getting done in the background, not
>>> at the absolute last moment when we discover, hey, we need a buffer.
>>
>> I think Greg is right here. Those suggested changes just bring back the LRU.
>
> Since the problem with LRU is that it requires moving each buffer to
> the front of the list every time it's touched, and since the idea that
> I proposed does not require that, I don't know what you mean by this.
>
>> If you keep a separate list then you need to serialize access to it.
>
> That is true.
> However, there are some reasons to think it would be
> better than what we're doing right now. Right now, we acquire the
> BufFreelistLock, and then take and release some number of spinlocks.
> It seems fairly easy to construct cases where that number is routinely
> over 100; even on my laptop, I could construct cases where it was
> routinely in the 50-100 range. A linked list that we can just dequeue
> things from could be protected by a single spinlock, which would
> hopefully be somewhat faster. (In the interest of credit where credit
> is due, this is pretty much the point Jim Nasby has been making every
> time this argument comes up, and I've been dismissing it, but now that
> I understand how much work gets done in that clock sweep code, I'm
> starting to think he might be right.)
>
> However, if it turns out that there's still too much contention
> over that spinlock, we can probably fix that by maintaining N lists
> instead of one, distributing buffers between them in round-robin
> fashion, and having each backend choose a list based on its backend ID
> modulo N. The testing I've done so far seems to indicate that
> spinlock contention doesn't really become noticeable until you have
> 32-40 CPUs pounding hard on the same spinlock, so even N = 4 is
> probably enough to make the problem go away. But I don't even think
> we're going to have to go that far right at the outset, because
> there's another major source of contention in the buffer eviction
> path: the buffer mapping locks.

All of the above presumes that we have a list of best-next-victims. We
don't have that list, so describing how we would optimise access to it
means nothing.

Yes, if we had it, we could dequeue from it quickly - that is not the
problem. The problem is that creating the best-next-victim list AND
making it accurate is the act that causes contention. Yes, we can
maintain an inaccurate list cheaply.
--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services