Re: our buffer replacement strategy is kind of lame - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: our buffer replacement strategy is kind of lame
Msg-id | CA+TgmoYtL0-uB+sOf+P-DJ_qhRFgRn=RdYRnv_8y8Y4r4vqfqg@mail.gmail.com
In response to | Re: our buffer replacement strategy is kind of lame (Simon Riggs <simon@2ndquadrant.com>)
Responses | Re: our buffer replacement strategy is kind of lame
List | pgsql-hackers
On Mon, Jan 2, 2012 at 2:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Get rid of the freelist? Once shared buffers are full, it's just about
>> useless anyway. But you'd need to think about the test cases that you
>> pay attention to, as there might be scenarios where it remains useful.
>
> Agree freelist is mostly useless, but startup and dropping objects requires it.

Not really. If someone drops an object, we must invalidate all the buffers immediately, but adding them to the free list is just an optimization to make sure we reuse the invalidated buffers in preference to evicting buffers that still have valid contents. But I suspect that in practice this is not a very important optimization, because (1) most people probably don't drop permanent tables or databases very often and (2) buffers that are being heavily used should have a positive reference count, in which case the clock sweep will pass over them and land on one of the newly-invalidated buffers anyway.

> When it's full it's just an integer > 0 test, which is cheap and it's on
> the same cacheline as the nextVictimBuffer anyway, so we have to fetch
> it.
>
> The clock sweep is where all the time goes, in its current form.

...but I agree with this. In its current form, the clock sweep has to acquire a spinlock for every buffer it touches. That's really expensive, and I think we need to either get rid of it altogether or at least find some way to move it into the background. Ideally, in the common case, a backend that wants to allocate a buffer shouldn't need to do more than acquire a spinlock, pop a buffer off some sort of linked list of allocation targets, and release the spinlock. Whatever other computation is needed should get pushed out of foreground processing.

> We have 2 problems
>
> 1. StrategySyncStart() requests exclusive lock on BufFreelistLock, so
> bgwriter has to fight with backends to find out where it should clean.
> As a result it spends lots of time waiting and insufficient time
> cleaning.

I have trouble accepting that this is really the problem we should be solving. There's only one background writer process, and that process wakes up every 200ms by default. Many people probably tune that down quite a bit, but unless I'm quite mistaken, it should still be a drop in the bucket next to what the backends are doing.

> 2. When a backend can't find a free buffer, it spins for a long time
> while holding the lock. This makes the buffer strategy O(N) in its
> worst case, which slows everything down. Notably, while this is
> happening the bgwriter sits doing nothing at all, right at the moment
> when it is most needed. The Clock algorithm is an approximation of an
> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
> worst case behaviour makes sense. How much to tweak? Well,...

I generally agree with this analysis, but I don't think the proposed patch is going to solve the problem. It may have some merit as a way of limiting the worst-case behavior. For example, if every shared buffer has a reference count of 5, the first buffer allocation that misses is going to have a lot of work to do before it can actually come up with a victim. But I don't think it's going to provide good scaling in general. Even if the background writer only spins through, on average, ten or fifteen buffers before finding one to evict, that still means we're acquiring ten or fifteen spinlocks while holding BufFreelistLock. I don't currently have the measurements to prove that's too expensive, but I bet it is.
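To make the spinlock cost concrete, here is a minimal, self-contained sketch of the pattern in question. It is not the actual freelist.c code: BufDesc, buf_hdr_lock, next_victim, clock_sweep_get_victim, and the use of pthread spinlocks are simplified stand-ins chosen for illustration; the real code works on PostgreSQL's buffer descriptors and s_lock spinlocks, and, as noted above, runs while BufFreelistLock is held.

```c
/*
 * Minimal model of the clock-sweep victim search being discussed.
 * NOT the PostgreSQL source: BufDesc, buf_hdr_lock, next_victim, and
 * NBuffers are simplified stand-ins for the real buffer descriptor,
 * per-buffer header spinlock, nextVictimBuffer, and shared_buffers count.
 */
#include <pthread.h>
#include <stdio.h>

#define NBuffers 1024

typedef struct BufDesc
{
	pthread_spinlock_t buf_hdr_lock;	/* per-buffer header spinlock */
	int			refcount;			/* pins held by backends */
	int			usage_count;		/* clock-sweep usage counter */
} BufDesc;

static BufDesc buffers[NBuffers];
static int	next_victim = 0;	/* advanced under BufFreelistLock (not modelled) */

/*
 * Sweep until an unpinned buffer with zero usage count is found.  Every
 * buffer inspected costs one spinlock acquire/release, and in the scenario
 * described above all of this happens while BufFreelistLock is held.
 */
static BufDesc *
clock_sweep_get_victim(void)
{
	for (;;)
	{
		BufDesc    *buf = &buffers[next_victim];

		next_victim = (next_victim + 1) % NBuffers;

		pthread_spin_lock(&buf->buf_hdr_lock);	/* one spinlock per buffer touched */

		if (buf->refcount == 0)
		{
			if (buf->usage_count > 0)
			{
				buf->usage_count--;	/* recently used: decrement and keep sweeping */
				pthread_spin_unlock(&buf->buf_hdr_lock);
			}
			else
				return buf;			/* victim found; returned still locked here */
		}
		else
			pthread_spin_unlock(&buf->buf_hdr_lock);	/* pinned: skip it */
	}
}

int
main(void)
{
	int			i;

	for (i = 0; i < NBuffers; i++)
	{
		pthread_spin_init(&buffers[i].buf_hdr_lock, PTHREAD_PROCESS_PRIVATE);
		buffers[i].refcount = 0;
		buffers[i].usage_count = 1;	/* pretend every buffer was touched once */
	}

	/* The sweep must visit all NBuffers once before buffer 0 becomes a victim. */
	BufDesc    *victim = clock_sweep_get_victim();

	pthread_spin_unlock(&victim->buf_hdr_lock);
	printf("victim is buffer %ld\n", (long) (victim - buffers));
	return 0;
}
```

In this model, main() starts every buffer with usage_count = 1, so the sweep takes and releases NBuffers + 1 spinlocks before it finds a victim, which is the O(N) worst case Simon is describing above.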
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company