On 03.01.2012 17:56, Simon Riggs wrote:
> On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>>> 2. When a backend can't find a free buffer, it spins for a long time
>>> while holding the lock. This makes the buffer strategy O(N) in its
>>> worst case, which slows everything down. Notably, while this is
>>> happening the bgwriter sits doing nothing at all, right at the moment
>>> when it is most needed. The Clock algorithm is an approximation of an
>>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>>> worst case behaviour makes sense. How much to tweak? Well,...
>>
>> I generally agree with this analysis, but I don't think the proposed
>> patch is going to solve the problem. It may have some merit as a way
>> of limiting the worst case behavior. For example, if every shared
>> buffer has a reference count of 5, the first buffer allocation that
>> misses is going to have a lot of work to do before it can actually
>> come up with a victim. But I don't think it's going to provide good
>> scaling in general. Even if the background writer only spins through,
>> on average, ten or fifteen buffers before finding one to evict, that
>> still means we're acquiring ten or fifteen spinlocks while holding
>> BufFreelistLock. I don't currently have the measurements to prove
>> that's too expensive, but I bet it is.
>
> I think it's worth reducing the cost of scanning, but that has little
> to do with solving the O(N) problem. I think we need both.
>
> I've left the way open for you to redesign freelist management in many
> ways. Please take the opportunity and go for it, though we must
> realise that major overhauls require significantly more testing to
> prove their worth. For this release, just reducing spinlocking sounds
> like a good way to proceed.
>
> If you don't have time in 9.2, then these two small patches are worth
> having. The bgwriter locking patch needs less formal evidence to show
> its worth. We simply don't need to have a bgwriter that just sits
> waiting doing nothing.

I'd like to see some benchmarks that show a benefit from these patches
before committing something like this that complicates the code. The
patches are fairly small, but they still add complexity. Once we have a
test case, we can argue whether the benefit we're seeing is worth the
extra code, or whether there's some better way to achieve it.
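
As a possible starting point for such a test case, here is a rough
Python sketch of the worst case Robert describes above, where every
shared buffer has a count of 5. It is a toy model, not PostgreSQL code:
the function name clock_sweep and the plain list of counts are made up
for illustration, and pinned buffers, the freelist and all locking are
ignored. Each buffer examined stands in for one buffer-header spinlock
that would be taken while BufFreelistLock is held.

```python
def clock_sweep(usage_counts, hand=0):
    """Toy clock sweep (hypothetical helper, not the PostgreSQL function).

    usage_counts: list of non-negative ints, one per buffer; modified in place.
    hand: index where the sweep starts.
    Returns (victim_index, buffers_examined, new_hand).
    """
    n = len(usage_counts)
    examined = 0
    while True:
        examined += 1  # in the real system: one spinlock acquisition
        if usage_counts[hand] == 0:
            # Found a buffer nobody has touched recently: evict it.
            victim = hand
            hand = (hand + 1) % n
            return victim, examined, hand
        # Buffer was used recently: decrement its count and move on.
        usage_counts[hand] -= 1
        hand = (hand + 1) % n


if __name__ == "__main__":
    counts = [5] * 1000  # assumed worst case: every buffer at count 5
    victim, examined, hand = clock_sweep(counts)
    print("first allocation examined", examined, "buffers")
    victim, examined, hand = clock_sweep(counts, hand)
    print("second allocation examined", examined, "buffers")
```

With 1000 buffers all at 5, the first allocation in this model examines
5 * 1000 + 1 = 5001 buffers before it finds a victim, and the next one
examines just one. This only counts work; it says nothing about actual
lock contention, which still needs a real benchmark.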
--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com