From: Andres Freund
Subject: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Date: 2014-06-22 15:09:13
Msg-id: 20140622150913.GJ30721@alap3.anarazel.de
In response to: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On 2014-06-22 12:38:04 +0100, Simon Riggs wrote:
> On 9 April 2014 15:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> >>> An orthogonal issue I noted is that we never check for overflow in the ref
> >>> count itself. While I understand overflowing int32 counter will take a
> >>> large number of pins on the same buffer, it can still happen in the worst
> >>> case, no ? Or is there a theoretical limit on the number of pins on the
> >>> same buffer by a single backend ?
> >
> >> I think we'll die much earlier, because the resource owner array keeping
> >> track of buffer pins will be larger than 1GB.
> >
> > The number of pins is bounded, more or less, by the number of scan nodes
> > in your query plan.  You'll have run out of memory trying to plan the
> > query, assuming you live that long.
> 
> ISTM that there is a strong possibility that the last buffer pinned
> will be the next buffer to be unpinned. We can use that to optimise
> this.

> If we store the last 8 buffers pinned in the fast array then we will
> be very likely to hit the right buffer just by scanning the array.
> 
> So if we treat the fast array as a circular LRU, we get
> * pinning a new buffer when array has an empty slot is O(1)
> * pinning a new buffer when array is full causes us to move the LRU
> into the hash table and then use that element
> * unpinning a buffer will most often be O(1), which then leaves an
> empty slot for next pin
> 
> Doing it that way means all usage is O(1) apart from when we use >8
> pins concurrently and that usage does not follow the regular pattern.

Even that case is O(1) in the average case since insertion into a
hashtable is O(1) on average...

I've started working on a patch that pretty much works like that. It
doesn't reorder entries within the array, because that seemed to perform
badly, which makes sense: maintaining LRU order would mean moving entries
around even in the relatively common case of only two pages being pinned.

When the array is full, it moves one array entry (chosen by
[someint++ % NUM_ENTRIES]) into the hashtable and puts the new item in the
now-free slot. The same happens when a lookup finds an entry in the
hashtable: one entry is moved from the array into the hashtable, and the
entry found in the hashtable takes the freed slot.

That seems to work nicely, but it still needs some cleanup. And benchmarks.
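
To make the scheme concrete, here is a rough, self-contained C sketch of the
pin side. The names (PinEntry, pin_locally, FAST_ENTRIES, and so on) are made
up for illustration and are not the identifiers used in the patch, and the
overflow "table" is just a linearly scanned array standing in for a real hash
table (the real thing would use something like dynahash):

#include <stdint.h>
#include <stdlib.h>

#define FAST_ENTRIES   8        /* the small per-backend array */
#define OVERFLOW_SIZE  1024     /* toy stand-in for a real hash table */

typedef int32_t Buffer;         /* stand-in for PostgreSQL's Buffer type */

typedef struct
{
    Buffer      buffer;         /* 0 = unused slot */
    int32_t     refcount;
} PinEntry;

static PinEntry fast_array[FAST_ENTRIES];
static PinEntry overflow[OVERFLOW_SIZE];
static uint32_t eviction_counter;   /* the "someint++" round-robin cursor */

/* Find buf in the overflow table, or return an empty slot if it's absent. */
static PinEntry *
overflow_slot(Buffer buf)
{
    PinEntry   *empty = NULL;
    int         i;

    for (i = 0; i < OVERFLOW_SIZE; i++)
    {
        if (overflow[i].buffer == buf)
            return &overflow[i];
        if (overflow[i].buffer == 0 && empty == NULL)
            empty = &overflow[i];
    }
    return empty;               /* NULL only if the toy table is full */
}

/* Spill one array entry, chosen round-robin, and return the freed slot. */
static PinEntry *
spill_one(void)
{
    PinEntry   *victim = &fast_array[eviction_counter++ % FAST_ENTRIES];
    PinEntry   *spill = overflow_slot(victim->buffer);

    if (spill == NULL)
        abort();                /* toy table full; real code would grow it */
    *spill = *victim;
    victim->buffer = 0;
    victim->refcount = 0;
    return victim;
}

/* Record one more local pin on buf. */
static void
pin_locally(Buffer buf)
{
    PinEntry   *free_slot = NULL;
    PinEntry   *spilled;
    int         i;

    /* Fast path: the buffer is already tracked in the small array. */
    for (i = 0; i < FAST_ENTRIES; i++)
    {
        if (fast_array[i].buffer == buf)
        {
            fast_array[i].refcount++;
            return;
        }
        if (fast_array[i].buffer == 0 && free_slot == NULL)
            free_slot = &fast_array[i];
    }

    /* Array is full: push one entry out to make room. */
    if (free_slot == NULL)
        free_slot = spill_one();

    /* If the buffer was spilled earlier, pull it back into the array. */
    spilled = overflow_slot(buf);
    if (spilled != NULL && spilled->buffer == buf)
    {
        *free_slot = *spilled;
        spilled->buffer = 0;
        spilled->refcount = 0;
    }
    else
    {
        free_slot->buffer = buf;
        free_slot->refcount = 0;
    }
    free_slot->refcount++;
}

The unpin side would be symmetric: look the buffer up in the array first, fall
back to the overflow table, and release the slot once the count drops to zero.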

> We need to do something about this. We have complaints (via Heikki)
> that we are using too much memory in idle backends and small configs,
> plus we know we are using too much memory in larger servers. Reducing
> the memory usage here will reduce CPU L2 cache churn as well as
> increasing available  RAM.

Yea, the buffer pin array currently is one of the biggest sources of
cache misses... In contrast to things like the buffer descriptors it's
not even shared between concurrent processes, so it's more wasteful,
even if small.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


