Re: StrategyGetBuffer questions - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: StrategyGetBuffer questions
Msg-id CAHyXU0wcd36SAJ7ChaKZym7_vPxN1z6qxBAzR7QP-WhW8VeM4A@mail.gmail.com
In response to Re: StrategyGetBuffer questions  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: StrategyGetBuffer questions  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> In this sprawling thread on scaling issues [1], the topic meandered
>> into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
>> wondering:
>>
>> *) If there shouldn't be a bound on how many candidate
>> buffers you're allowed to skip for having a non-zero usage count.
>> Whenever an unpinned usage_count>0 buffer is found, trycounter is
>> reset (!) so that the code operates from point of view as it had just
>> entered the loop.  There is an implicit assumption that this is rare,
>> but how rare is it?
>
> How often is that the trycounter would hit zero if it were not reset?
> I've instrumented something like that in the past, and could only get
> it to fire under pathologically small shared_buffers and workloads
> that caused most of them to be pinned simultaneously.

well, it's basically impossible -- and that's what I find odd.

>> *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
>> itself examining too many unpinned buffers per sweep?
>
> It is a self correcting problem.  If it is examining a lot of unpinned
> buffers, it is also decrementing a lot of them.

sure.  but it's entirely plausible that some backends are marking up
usage_count rapidly without allocating buffers while others are doing
a lot of allocations.  point being: all it takes is one backend
getting scheduled out while holding the freelist lock to effectively
freeze the database for many operations.

it's been documented [1] that particular buffers can become spinlock
contention hot spots due to reference counting of the pins.   if a lot
of allocation is happening concurrently it's only a matter of time
before the clock sweep rolls around to one of them, hits the spinlock,
and (in the worst case) schedules out.  this could in turn shut down
the clock sweep for some time, and non-allocating backends might then
beat on established buffers, pumping up usage counts.

The reference counting problem might be alleviated in some fashion for
example via Robert's idea to disable reference counting under
contention [2].  Even if you do that, you're still in for a world of
hurt if you get scheduled out during a buffer allocation.   Your patch
fixes that AFAICT.  The buffer pin check is outside the wider lock,
making my suggestion to be less rigorous about usage_count a lot less
useful (but perhaps not completely useless!).

Another innovation might be to implement a 'trylock' variant of
LockBufHdr that does a TAS but doesn't spin -- if someone else has the
header locked, why bother waiting for it? just skip to the next buffer
and move on.

merlin


[1] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01557.php

[2] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01571.php