Thread: StrategyGetBuffer questions
In this sprawling thread on scaling issues [1], the topic meandered into
StrategyGetBuffer() -- in particular the clock sweep loop. I'm wondering:

*) Shouldn't there be a bound on how many candidate buffers you're allowed
to skip for having a non-zero usage count? Whenever an unpinned
usage_count > 0 buffer is found, trycounter is reset (!) so that the code
operates as if it had just entered the loop. There is an implicit
assumption that this is rare, but how rare is it?

*) Shouldn't StrategyGetBuffer() bias down usage_count if it finds itself
examining too many unpinned buffers per sweep?

*) Since the purpose of usage_count is to act on an advisory basis to keep
recently/frequently accessed buffers from being discarded, is it really
necessary to rigorously guard the count with a spinlock? If a ++ or --
operation on the value gets missed here or there, how big of a deal is it
really? Right now the code is going:

    LockBufHdr(buf);
    if (buf->refcount == 0)
    {
        if (buf->usage_count > 0)
        {
            buf->usage_count--;
            trycounter = NBuffers;
        }
        else
        {
            /* Found a usable buffer */
            if (strategy != NULL)
                AddBufferToRing(strategy, buf);
            return buf;
        }
    }

What if the spinlock was deferred to *after* the initial examination of
the pin refcount? Naturally, if we decided the buffer was interesting,
we'd need to lock the header and retest refcount == 0 to make sure nobody
got to the buffer before us. In the unlikely event it was set, we'd
immediately unlock the header and loop. That way we get to decrement
usage_count and loop without ever having acquired a spinlock on the
header.

Point being: this would tighten the clock sweep loop considerably if it
ever happens that there are a lot of buffers with usage_count > 0 to wade
through before getting a candidate, at the cost of usage_count no longer
being 100% deterministic. Is that harmless, though? If it was, or it
could reasonably be made so, would this positively impact cases that are
bound by the free list lock?
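[Editorial note: a minimal, single-threaded C sketch of the deferred-lock
control flow described above. The Buf struct, the lock helpers, and the
sweep function are hypothetical stand-ins, not PostgreSQL's actual
BufferDesc / LockBufHdr machinery.]

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer header; the real BufferDesc and
 * LockBufHdr/UnlockBufHdr live in PostgreSQL's bufmgr, not here. */
typedef struct Buf
{
    int  refcount;      /* pin count */
    int  usage_count;   /* clock-sweep usage counter */
    bool hdr_locked;    /* stand-in for the header spinlock */
} Buf;

static void lock_hdr(Buf *b)   { b->hdr_locked = true; }
static void unlock_hdr(Buf *b) { b->hdr_locked = false; }

/* Clock sweep with the spinlock deferred until a buffer looks usable.
 * usage_count is read and decremented without the header lock, so a
 * concurrent ++ or -- could be lost -- exactly the "advisory" trade-off
 * under discussion. Returns the victim's index (header left locked for
 * the caller), or -1 if every buffer stayed pinned. */
static int
sweep_deferred_lock(Buf *bufs, int nbuffers, int *hand)
{
    int trycounter = nbuffers;

    for (;;)
    {
        Buf *buf = &bufs[*hand];

        *hand = (*hand + 1) % nbuffers;

        if (buf->refcount == 0)          /* unlocked peek at the pin */
        {
            if (buf->usage_count > 0)
            {
                buf->usage_count--;      /* unlocked, advisory decrement */
                trycounter = nbuffers;
            }
            else
            {
                lock_hdr(buf);
                if (buf->refcount == 0)  /* retest under the lock */
                    return (int) (buf - bufs);
                unlock_hdr(buf);         /* somebody pinned it; move on */
            }
        }
        else if (--trycounter == 0)
            return -1;                   /* no unpinned buffer found */
    }
}
```

Note the only spinlock acquisition happens on the final candidate; every
usage_count decrement along the way is lock-free (and correspondingly
imprecise under concurrency).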
merlin

[1] http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html
On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> In this sprawling thread on scaling issues [1], the topic meandered
> into StrategyGetBuffer() -- in particular the clock sweep loop. I'm
> wondering:
>
> *) Shouldn't there be a bound on how many candidate buffers you're
> allowed to skip for having a non-zero usage count? Whenever an
> unpinned usage_count > 0 buffer is found, trycounter is reset (!) so
> that the code operates as if it had just entered the loop. There is
> an implicit assumption that this is rare, but how rare is it?

How often is it that the trycounter would hit zero if it were not
reset? I've instrumented something like that in the past, and could
only get it to fire under pathologically small shared_buffers and
workloads that caused most of them to be pinned simultaneously.

> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
> itself examining too many unpinned buffers per sweep?

It is a self-correcting problem. If it is examining a lot of unpinned
buffers, it is also decrementing a lot of them.

> *) Since the purpose of usage_count is to act on an advisory basis to
> keep recently/frequently accessed buffers from being discarded, is it
> really necessary to rigorously guard the count with a spinlock? If a
> ++ or -- operation on the value gets missed here or there, how big of
> a deal is it really?

I don't think it is all that big of a deal.

I've implemented this patch to do that. It still applies to head.

http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php

It was very effective at removing BufFreelistLock contention on the
system I had at the time.

Cheers,

Jeff
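[Editorial note: Jeff's linked patch is not reproduced here. As one way
to picture adjusting usage_count without the header spinlock, the sketch
below uses a relaxed C11 compare-and-swap; the Buf struct and
decay_usage_count are invented for illustration and may well differ from
what the actual patch does.]

```c
#include <stdatomic.h>

/* Hypothetical header: usage_count kept in a C11 atomic so it can be
 * adjusted without taking the header spinlock.  Relaxed ordering is
 * fine here because the counter is purely advisory. */
typedef struct Buf
{
    atomic_int refcount;
    atomic_int usage_count;
} Buf;

/* Decrement usage_count once without any lock, clamping at zero.  The
 * weak CAS retries on (spurious or real) failure, so no decrement is
 * silently lost -- a stricter guarantee than the "miss a ++/-- here or
 * there" approach, at the cost of a possible retry loop. */
static void
decay_usage_count(Buf *buf)
{
    int cur = atomic_load_explicit(&buf->usage_count,
                                   memory_order_relaxed);

    while (cur > 0)
    {
        if (atomic_compare_exchange_weak_explicit(&buf->usage_count,
                                                  &cur, cur - 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return;       /* decremented exactly once */
        /* cur was reloaded by the failed CAS; retry or give up at 0 */
    }
}
```

A design note: the CAS loop and the "just tolerate lost updates" scheme
are two points on the same spectrum; both avoid the spinlock, and which
imprecision is acceptable is exactly the question in this thread.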
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> I've implemented this patch to do that. It still applies to head.
>
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php
>
> It was very effective at removing BufFreelistLock contention on the
> system I had at the time.

Ah, interesting. Well, that's more aggressive in that you dispense with
the lwlock completely. Hm.

merlin
On Wednesday, November 21, 2012 4:21 AM Jeff Janes wrote:

> I've implemented this patch to do that. It still applies to head.
>
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00305.php
>
> It was very effective at removing BufFreelistLock contention on the
> system I had at the time.

In that case, why don't we work towards reducing it? Is the generic use
case a problem, or will it affect any generic scenario in a negative way?

With Regards,
Amit Kapila.
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> How often is it that the trycounter would hit zero if it were not
> reset? I've instrumented something like that in the past, and could
> only get it to fire under pathologically small shared_buffers and
> workloads that caused most of them to be pinned simultaneously.

Well, it's basically impossible -- and that's what I find odd.

> It is a self-correcting problem. If it is examining a lot of unpinned
> buffers, it is also decrementing a lot of them.

Sure. But it's entirely plausible that some backends are marking up
usage_count rapidly without allocating buffers while others are doing a
lot of allocations. Point being: all it takes is one backend to get
scheduled out while holding the freelist lock to effectively freeze the
database for many operations.

It's been documented [1] that particular buffers can become spinlock
contention hot spots due to reference counting of the pins. If a lot of
allocation is happening concurrently, it's only a matter of time before
the clock sweep rolls around to one of them, hits the spinlock, and (in
the worst case) schedules out.
This could in turn shut down the clock sweep for some time, and
non-allocating backends might then beat on established buffers, pumping
up usage counts.

The reference counting problem might be alleviated in some fashion, for
example via Robert's idea to disable reference counting under contention
[2]. But even if you do that, you're still in for a world of hurt if you
get scheduled out during a buffer allocation. Your patch fixes that,
AFAICT: the buffer pin check is outside the wider lock, making my
suggestion to be less rigorous about usage_count a lot less useful (but
perhaps not completely useless!).

Another innovation might be to implement a 'trylock' variant of
LockBufHdr that does a TAS but doesn't spin -- if someone else has the
header locked, why bother waiting for it? Just skip to the next buffer
and move on.

merlin

[1] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01557.php
[2] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01571.php
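[Editorial note: the 'trylock' variant might be sketched as below, with a
C11 atomic_flag standing in for PostgreSQL's s_lock-based header
spinlock; the Buf struct and function names are hypothetical.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer header guarded by a spinlock. */
typedef struct Buf
{
    atomic_flag hdr_lock;   /* the header "spinlock" */
    int         refcount;
} Buf;

/* Trylock variant of LockBufHdr: a single test-and-set, no spinning.
 * Returns true if we acquired the header.  On false, a clock sweep can
 * simply advance to the next buffer instead of waiting behind a
 * contended (or scheduled-out) holder. */
static bool
try_lock_buf_hdr(Buf *buf)
{
    return !atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                              memory_order_acquire);
}

static void
unlock_buf_hdr(Buf *buf)
{
    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}
```

The key property is that the failure path costs one atomic operation and
no waiting, so a sweep that hits a contended header loses a candidate
rather than its timeslice.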
On Thursday, November 22, 2012 3:26 AM Merlin Moncure wrote:

> Sure. But it's entirely plausible that some backends are marking up
> usage_count rapidly without allocating buffers while others are doing
> a lot of allocations. Point being: all it takes is one backend to get
> scheduled out while holding the freelist lock to effectively freeze
> the database for many operations.

True, I have observed this as well. The case I tried was one in which
50% of the buffers are always used for a hot table (a table in high
demand) and the remaining 50% are used for normal operations. In such
cases, contention on BufFreelistLock can be observed.
The same can be seen in the report I attached in the mail below:

http://archives.postgresql.org/pgsql-hackers/2012-11/msg01147.php

> Another innovation might be to implement a 'trylock' variant of
> LockBufHdr that does a TAS but doesn't spin -- if someone else has
> the header locked, why bother waiting for it? Just skip to the next
> buffer and move on.

I think this is a reasonable idea.

How about having hot and cold ends in the buffer list? That way, if some
of the buffers are heavily used, other backends might not need to pay the
penalty of traversing the hot buffers.

With Regards,
Amit Kapila.
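[Editorial note: one very rough, hypothetical reading of the hot/cold
idea is sketched below: keep a list of indexes of "cold" buffers and let
the victim search walk only those, so the sweep never touches buffers
promoted to the hot region. Everything here (the index list, the decay
policy, the fallback) is invented for illustration; the thread does not
specify a design.]

```c
/* Hypothetical, simplified buffer header. */
typedef struct Buf
{
    int refcount;
    int usage_count;
} Buf;

/* Victim search over the cold region only.  `cold` is an array of
 * buffer indexes that some (unspecified) policy has classified as not
 * hot.  Returns a victim index, or -1 if the cold region is exhausted,
 * in which case the caller would fall back to a full clock sweep. */
static int
find_victim_cold_first(Buf *bufs, const int *cold, int ncold)
{
    for (int i = 0; i < ncold; i++)
    {
        Buf *b = &bufs[cold[i]];

        if (b->refcount != 0)
            continue;            /* pinned: skip */
        if (b->usage_count == 0)
            return cold[i];      /* victim found in the cold region */
        b->usage_count--;        /* still decay cold buffers */
    }
    return -1;
}
```

The intended payoff is that allocating backends stop paying the sweep
cost for buffers that are effectively pinned-hot anyway; the open
questions (how buffers migrate between regions, and what keeps the split
current) are exactly where such a design would get hard.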