Thread: StrategyGetBuffer questions

StrategyGetBuffer questions

From
Merlin Moncure
Date:
In this sprawling thread on scaling issues [1], the topic meandered
into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
wondering:

*) Whether there should be a bound on how many candidate buffers
you're allowed to skip for having a non-zero usage count.  Whenever an
unpinned usage_count>0 buffer is found, trycounter is reset (!) so
that the code operates as if it had just entered the loop.  There is
an implicit assumption that this is rare, but how rare is it?

*) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
itself examining too many unpinned buffers per sweep?

*) Since the purpose of usage_count is to act on an advisory basis to
keep recently/frequently accessed buffers from being discarded, is it
really necessary to rigorously guard the count with a spinlock?  If a
++ or -- operation on the value gets missed here or there, how big of
a deal is it really?
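As a toy illustration of that last point, here is what an unguarded
usage_count bump might look like; a lost increment or decrement just
makes the buffer look marginally more or less popular than it really
is.  This is a sketch only: the struct, the plain (non-atomic)
accesses, and the cap constant are illustrative, not the actual
PostgreSQL implementation.

```c
#include <assert.h>

#define BM_MAX_USAGE_COUNT 5    /* cap on the popularity counter */

typedef struct BufDesc
{
    int usage_count;
} BufDesc;

/*
 * Unlocked bump: a racing ++ or -- may be lost, which only means the
 * buffer looks slightly more or less popular than it really is.
 */
static void
bump_usage_unlocked(BufDesc *buf)
{
    int count = buf->usage_count;       /* plain read, no LockBufHdr() */

    if (count < BM_MAX_USAGE_COUNT)
        buf->usage_count = count + 1;   /* plain write; not atomic */
}
```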

Right now the code is going:

    LockBufHdr(buf);
    if (buf->refcount == 0)
    {
        if (buf->usage_count > 0)
        {
            buf->usage_count--;
            trycounter = NBuffers;
        }
        else
        {
            /* Found a usable buffer */
            if (strategy != NULL)
                AddBufferToRing(strategy, buf);
            return buf;
        }
    }
 

What if the spinlock were deferred until *after* the initial
examination of the pin refcount?  Naturally, if we decided the buffer
was interesting we'd need to lock the header and retest refcount == 0
to make sure nobody got to the buffer before us.  In the unlikely
event it was set we'd immediately unlock the header and loop.

That way we get to decrement usage_count and loop without ever having
acquired a spinlock on the header.  Point being: this would tighten
the clock sweep loop considerably if it ever happens that there are a
lot of buffers with usage_count > 0 to wade through before getting a
candidate, at the cost of usage_count no longer being 100%
deterministic.  Is that harmless though?  If it is, or could
reasonably be made so, would this positively impact cases that are
bound by the free list lock?

merlin

http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html



Re: StrategyGetBuffer questions

From
Jeff Janes
Date:
On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> In this sprawling thread on scaling issues [1], the topic meandered
> into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
> wondering:
>
> *) Whether there should be a bound on how many candidate buffers
> you're allowed to skip for having a non-zero usage count.  Whenever an
> unpinned usage_count>0 buffer is found, trycounter is reset (!) so
> that the code operates as if it had just entered the loop.  There is
> an implicit assumption that this is rare, but how rare is it?


How often is it that the trycounter would hit zero if it were not reset?
I've instrumented something like that in the past, and could only get
it to fire under pathologically small shared_buffers and workloads
that caused most of them to be pinned simultaneously.

> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
> itself examining too many unpinned buffers per sweep?

It is a self-correcting problem.  If it is examining a lot of unpinned
buffers, it is also decrementing a lot of them.

>
> *) Since the purpose of usage_count is to act on an advisory basis to
> keep recently/frequently accessed buffers from being discarded, is it
> really necessary to rigorously guard the count with a spinlock?  If a
> ++ or -- operation on the value gets missed here or there, how big of
> a deal is it really?

I don't think it is all that big of a deal.

I've implemented this patch to do that.  It still applies to head.

http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php

It was very effective at removing BufFreelistLock contention on the
system I had at the time.

Cheers,

Jeff



Re: StrategyGetBuffer questions

From
Merlin Moncure
Date:
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> In this sprawling thread on scaling issues [1], the topic meandered
>> into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
>> wondering:
>>
>> *) Whether there should be a bound on how many candidate buffers
>> you're allowed to skip for having a non-zero usage count.  Whenever an
>> unpinned usage_count>0 buffer is found, trycounter is reset (!) so
>> that the code operates as if it had just entered the loop.  There is
>> an implicit assumption that this is rare, but how rare is it?
>
>
> How often is it that the trycounter would hit zero if it were not reset?
> I've instrumented something like that in the past, and could only get
> it to fire under pathologically small shared_buffers and workloads
> that caused most of them to be pinned simultaneously.
>
>> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
>> itself examining too many unpinned buffers per sweep?
>
> It is a self-correcting problem.  If it is examining a lot of unpinned
> buffers, it is also decrementing a lot of them.
>
>>
>> *) Since the purpose of usage_count is to act on an advisory basis to
>> keep recently/frequently accessed buffers from being discarded, is it
>> really necessary to rigorously guard the count with a spinlock?  If a
>> ++ or -- operation on the value gets missed here or there, how big of
>> a deal is it really?
>
> I don't think it is all that big of a deal.
>
> I've implemented this patch to do that.  It still applies to head.
>
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php
>
> It was very effective at removing BufFreelistLock contention on the
> system I had at the time.

Ah, interesting. well, that's more aggressive in that you dispense
with the lwlock completely.  hm.

merlin



Re: StrategyGetBuffer questions

From
Amit Kapila
Date:
On Wednesday, November 21, 2012 4:21 AM Jeff Janes wrote:
> On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com>
> wrote:
> > In this sprawling thread on scaling issues [1], the topic meandered
> > into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
> > wondering:
> >


> >
> > *) Since the purpose of usage_count is to act on an advisory basis to
> > keep recently/frequently accessed buffers from being discarded, is it
> > really necessary to rigorously guard the count with a spinlock?  If a
> > ++ or -- operation on the value gets missed here or there, how big of
> > a deal is it really?
> 
> I don't think it is all that big of a deal.
> 
> I've implemented this patch to do that.  It still applies to head.
> 
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php
> 
> It was very effective at removing BufFreelistLock contention on the
> system I had at the time.

In that case, why don't we work towards reducing it?
Is the generic use case a problem, or will it affect any generic
scenario in a negative way?

With Regards,
Amit Kapila.





Re: StrategyGetBuffer questions

From
Merlin Moncure
Date:
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> In this sprawling thread on scaling issues [1], the topic meandered
>> into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
>> wondering:
>>
>> *) Whether there should be a bound on how many candidate buffers
>> you're allowed to skip for having a non-zero usage count.  Whenever an
>> unpinned usage_count>0 buffer is found, trycounter is reset (!) so
>> that the code operates as if it had just entered the loop.  There is
>> an implicit assumption that this is rare, but how rare is it?
>
> How often is it that the trycounter would hit zero if it were not reset?
> I've instrumented something like that in the past, and could only get
> it to fire under pathologically small shared_buffers and workloads
> that caused most of them to be pinned simultaneously.

well, it's basically impossible -- and that's what I find odd.

>> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
>> itself examining too many unpinned buffers per sweep?
>
> It is a self-correcting problem.  If it is examining a lot of unpinned
> buffers, it is also decrementing a lot of them.

sure.  but it's entirely plausible that some backends are marking up
usage_count rapidly and not allocating buffers while others are doing
a lot of allocations.  point being: all it takes is one backend to get
scheduled out while holding the freelist lock to effectively freeze
the database for many operations.

it's been documented [1] that particular buffers can become spinlock
contention hot spots due to reference counting of the pins.   if a lot
of allocation is happening concurrently it's only a matter of time
before the clock sweep rolls around to one of them, hits the spinlock,
and (in the worst case) schedules out.  this could in turn shut down
the clock sweep for some time, and non-allocating backends might then
beat on established buffers and pump up usage counts.

The reference counting problem might be alleviated in some fashion,
for example via Robert's idea to disable reference counting under
contention [2].  Even if you do that, you're still in for a world of
hurt if you get scheduled out during a buffer allocation.  Your patch
fixes that AFAICT.  The buffer pin check is outside the wider lock,
making my suggestion to be less rigorous about usage_count a lot less
useful (but perhaps not completely useless!).

Another innovation might be to implement a 'trylock' variant of
LockBufHdr that does a TAS but doesn't spin -- if someone else has the
header locked, why bother waiting for it?  Just skip to the next
buffer and move on.

merlin


[1] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01557.php

[2] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01571.php



Re: StrategyGetBuffer questions

From
Amit Kapila
Date:
On Thursday, November 22, 2012 3:26 AM Merlin Moncure wrote:
> On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com>
> wrote:
> > On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com>
> wrote:
> >> In this sprawling thread on scaling issues [1], the topic meandered
> >> into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
> >> wondering:
> >>
> >> *) Whether there should be a bound on how many candidate buffers
> >> you're allowed to skip for having a non-zero usage count.  Whenever an
> >> unpinned usage_count>0 buffer is found, trycounter is reset (!) so
> >> that the code operates as if it had just entered the loop.  There is
> >> an implicit assumption that this is rare, but how rare is it?
> >
> > How often is it that the trycounter would hit zero if it were not reset?
> > I've instrumented something like that in the past, and could only get
> > it to fire under pathologically small shared_buffers and workloads
> > that caused most of them to be pinned simultaneously.
> 
> well, it's basically impossible -- and that's what I find odd.
> 
> >> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
> >> itself examining too many unpinned buffers per sweep?
> >
> > It is a self-correcting problem.  If it is examining a lot of unpinned
> > buffers, it is also decrementing a lot of them.
> 
> sure.  but it's entirely plausible that some backends are marking up
> usage_count rapidly and not allocating buffers while others are doing
> a lot of allocations.  point being: all it takes is one backend to get
> scheduled out while holding the freelist lock to effectively freeze
> the database for many operations.
True, I have observed this as well.  The case I tried was one where
50% of buffers are always used for some hot table (a table in high
demand) and the remaining 50% of buffers are used for normal ops.  In
such cases contention can be observed for BufFreelistLock.  The same
can be seen in the report I have attached in the mail below:

http://archives.postgresql.org/pgsql-hackers/2012-11/msg01147.php

> it's been documented [1] that particular buffers can become spinlock
> contention hot spots due to reference counting of the pins.   if a lot
> of allocation is happening concurrently it's only a matter of time
> before the clock sweep rolls around to one of them, hits the spinlock,
> and (in the worst case) schedules out.  this could in turn shut down
> the clock sweep for some time and non allocating backends might then
> beat on established buffers and pumping up usage counts.
> 
> The reference counting problem might be alleviated in some fashion for
> example via Robert's idea to disable reference counting under
> contention [2].  Even if you do that. you're still in for a world of
> hurt if you get scheduled out of a buffer allocation.   Your patch
> fixes that AFAICT.  The buffer pin check is outside the wider lock,
> making my suggestion to be less rigorous about usage_count a lot less
> useful (but perhaps not completely useless!).
> 
> Another innovation might be to implement a 'trylock' variant of
> LockBufHdr that does a TAS but doesn't spin -- if someone else has the
> header locked, why bother waiting for it? just skip to the next and
> move on.  ..

I think this is a reasonable idea.
How about having hot and cold ends in the buffer list, so that if some
of the buffers are heavily used, other backends need not pay the
penalty of traversing the hot buffers?

With Regards,
Amit Kapila.