From: Andres Freund
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 15:09:13
Msg-id: 20140622150913.GJ30721@alap3.anarazel.de
In response to: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On 2014-06-22 12:38:04 +0100, Simon Riggs wrote:
> On 9 April 2014 15:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> >>> An orthogonal issue I noted is that we never check for overflow in the ref
> >>> count itself. While I understand overflowing int32 counter will take a
> >>> large number of pins on the same buffer, it can still happen in the worst
> >>> case, no ? Or is there a theoretical limit on the number of pins on the
> >>> same buffer by a single backend ?
> >
> >> I think we'll die much earlier, because the resource owner array keeping
> >> track of buffer pins will be larger than 1GB.
> >
> > The number of pins is bounded, more or less, by the number of scan nodes
> > in your query plan.  You'll have run out of memory trying to plan the
> > query, assuming you live that long.
> 
> ISTM that there is a strong possibility that the last buffer pinned
> will be the next buffer to be unpinned. We can use that to optimise
> this.

> If we store the last 8 buffers pinned in the fast array then we will
> be very likely to hit the right buffer just by scanning the array.
> 
> So if we treat the fast array as a circular LRU, we get
> * pinning a new buffer when array has an empty slot is O(1)
> * pinning a new buffer when array is full causes us to move the LRU
> into the hash table and then use that element
> * unpinning a buffer will most often be O(1), which then leaves an
> empty slot for next pin
> 
> Doing it that way means all usage is O(1) apart from when we use >8
> pins concurrently and that usage does not follow the regular pattern.

Even that case is O(1) in the average case since insertion into a
hashtable is O(1) on average...

I've started working on a patch that pretty much works like that. It
doesn't reorder entries within the array, because that seemed to perform
badly, which makes sense: maintaining LRU order would mean moving entries
around even in the relatively common case of only two pages being pinned.

When the array is full, it moves one array entry (chosen by
[someint++ % NUM_ENTRIES]) into the hashtable and puts the new item in the
now-free slot. The same happens when a lookup finds an entry in the
hashtable: one entry is moved from the array into the hashtable, and the
entry found in the hashtable takes the freed slot.

That seems to work nicely, but it still needs some cleanup. And benchmarks.
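
To make the scheme concrete, here is a rough, self-contained C sketch of the
pin side. The names (PinEntry, pin_locally, FAST_ENTRIES, and so on) are made
up for illustration and are not the identifiers used in the patch, and the
overflow "table" is just a linearly scanned array standing in for a real hash
table (the real thing would use something like dynahash):

#include <stdint.h>
#include <stdlib.h>

#define FAST_ENTRIES   8        /* the small per-backend array */
#define OVERFLOW_SIZE  1024     /* toy stand-in for a real hash table */

typedef int32_t Buffer;         /* stand-in for PostgreSQL's Buffer type */

typedef struct
{
    Buffer      buffer;         /* 0 = unused slot */
    int32_t     refcount;
} PinEntry;

static PinEntry fast_array[FAST_ENTRIES];
static PinEntry overflow[OVERFLOW_SIZE];
static uint32_t eviction_counter;   /* the "someint++" round-robin cursor */

/* Find buf in the overflow table, or return an empty slot if it's absent. */
static PinEntry *
overflow_slot(Buffer buf)
{
    PinEntry   *empty = NULL;
    int         i;

    for (i = 0; i < OVERFLOW_SIZE; i++)
    {
        if (overflow[i].buffer == buf)
            return &overflow[i];
        if (overflow[i].buffer == 0 && empty == NULL)
            empty = &overflow[i];
    }
    return empty;               /* NULL only if the toy table is full */
}

/* Spill one array entry, chosen round-robin, and return the freed slot. */
static PinEntry *
spill_one(void)
{
    PinEntry   *victim = &fast_array[eviction_counter++ % FAST_ENTRIES];
    PinEntry   *spill = overflow_slot(victim->buffer);

    if (spill == NULL)
        abort();                /* toy table full; real code would grow it */
    *spill = *victim;
    victim->buffer = 0;
    victim->refcount = 0;
    return victim;
}

/* Record one more local pin on buf. */
static void
pin_locally(Buffer buf)
{
    PinEntry   *free_slot = NULL;
    PinEntry   *spilled;
    int         i;

    /* Fast path: the buffer is already tracked in the small array. */
    for (i = 0; i < FAST_ENTRIES; i++)
    {
        if (fast_array[i].buffer == buf)
        {
            fast_array[i].refcount++;
            return;
        }
        if (fast_array[i].buffer == 0 && free_slot == NULL)
            free_slot = &fast_array[i];
    }

    /* Array is full: push one entry out to make room. */
    if (free_slot == NULL)
        free_slot = spill_one();

    /* If the buffer was spilled earlier, pull it back into the array. */
    spilled = overflow_slot(buf);
    if (spilled != NULL && spilled->buffer == buf)
    {
        *free_slot = *spilled;
        spilled->buffer = 0;
        spilled->refcount = 0;
    }
    else
    {
        free_slot->buffer = buf;
        free_slot->refcount = 0;
    }
    free_slot->refcount++;
}

The unpin side would be symmetric: look the buffer up in the array first, fall
back to the overflow table, and release the slot once the count drops to zero.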

> We need to do something about this. We have complaints (via Heikki)
> that we are using too much memory in idle backends and small configs,
> plus we know we are using too much memory in larger servers. Reducing
> the memory usage here will reduce CPU L2 cache churn as well as
> increasing available  RAM.

Yea, the buffer pin array currently is one of the biggest sources of
cache misses... In contrast to things like the buffer descriptors it's
not even shared between concurrent processes, so it's more wasteful,
even if small.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


