Thread: PrivateRefCount (for 8.3)

PrivateRefCount (for 8.3)

From
"Simon Riggs"
Date:
In the bufmgr code we keep track of reference counts on individual
buffers. We do this with a local array that grows in proportion to the
size of shared_buffers.

When we have a high shared_buffers, together with a very large number of
connected users then this will waste considerable RAM that would
otherwise been available to OS buffers.

Each element of the array is an int32, so with 200 connected users and a
shared_buffers = 250,000 this would use 200 MB of RAM. For 1000
connected users and 500,000 buffers this would use 2 GB of RAM. This
would clearly be a problem when attempting to use larger shared_buffers
to increase performance, which some people believe is useful. IMHO we
should support those people who find such settings beneficial.

The number of pins held by any backend at once is mostly 1-2, but could
in some circumstances go as high as the number of tables in a join. So
the vast majority of the array would be unused, yet we require random
access to it. This will cause cache churning and some delays while we
request the needed parts of the array from main memory.

Under specific conditions, I propose to replace the array with a hash
table, designed with a custom hash function that would map the pins held
onto just 16 hash buckets. That is small enough to fit completely in
cache, yet large enough to avoid overhead of collisions in most cases.
Should the hash table overflow at any time, it would be replaced by the
standard array as a fallback (with all code as now).

The gain from the additional available memory and the local cache
efficiency will at some point outweigh the cost of indirect access to
private ref counts. The suggested conditions to use the hash table would
be when NBuffers > 100000 and max_connections > 200, but those are just
gut feelings. We can measure that point if the idea is acceptable.

Comments?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: PrivateRefCount (for 8.3)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Under specific conditions, I propose to replace the array with a hash
> table, designed with a custom hash function that would map the pins held
> onto just 16 hash buckets.

> Comments?

Most likely a waste of development effort --- have you got any evidence
of a real effect here?  With 200 max_connections the size of the arrays
is still less than 10% of the space occupied by the buffers themselves,
ergo there isn't going to be all that much cache-thrashing compared to
what happens in the buffers themselves.  You're going to be hard pressed
to buy back the overhead of the hashing.

It might be interesting to see whether we could shrink the refcount
entries to int16 or int8.  We'd need some scheme to deal with overflow,
but given that the counts are now backed by ResourceOwner entries, maybe
extra state could be kept in those entries to handle it.
        regards, tom lane


Re: PrivateRefCount (for 8.3)

From
"Simon Riggs"
Date:
On Mon, 2006-11-27 at 13:44 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Under specific conditions, I propose to replace the array with a hash
> > table, designed with a custom hash function that would map the pins held
> > onto just 16 hash buckets.
> 
> > Comments?
> 
> Most likely a waste of development effort --- have you got any evidence
> of a real effect here?  With 200 max_connections the size of the arrays
> is still less than 10% of the space occupied by the buffers themselves,
> ergo there isn't going to be all that much cache-thrashing compared to
> what happens in the buffers themselves.  You're going to be hard pressed
> to buy back the overhead of the hashing.

And at 2000 connections we waste RAM the size of shared_buffers... that
isn't something to easily ignore.

> It might be interesting to see whether we could shrink the refcount
> entries to int16 or int8.  We'd need some scheme to deal with overflow,
> but given that the counts are now backed by ResourceOwner entries, maybe
> extra state could be kept in those entries to handle it.

int8 still seems like overkjll. When will the ref counts go above 2 on a
regular basis? Surely refcount=2 is just chance at the best of times.

Refcount -> 2 bits per value, plus a simple overflow list? That would
allow 0,1,2 ref counts plus 3 means look in hashtable to find real
refcount.

I'll see what test results I can arrange.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: PrivateRefCount (for 8.3)

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Mon, 2006-11-27 at 13:44 -0500, Tom Lane wrote:
>> Most likely a waste of development effort --- have you got any evidence
>> of a real effect here?

> And at 2000 connections we waste RAM the size of shared_buffers... that
> isn't something to easily ignore.

It is until someone shows me that Postgres actually works productively
at 2000 connections.  I think this is probably very far down the list
of problems you'd hit with so many backends.
        regards, tom lane


Re: PrivateRefCount (for 8.3)

From
Bruce Momjian
Date:
Simon Riggs wrote:
> > Most likely a waste of development effort --- have you got any evidence
> > of a real effect here?  With 200 max_connections the size of the arrays
> > is still less than 10% of the space occupied by the buffers themselves,
> > ergo there isn't going to be all that much cache-thrashing compared to
> > what happens in the buffers themselves.  You're going to be hard pressed
> > to buy back the overhead of the hashing.
> 
> And at 2000 connections we waste RAM the size of shared_buffers... that
> isn't something to easily ignore.
> 
> > It might be interesting to see whether we could shrink the refcount
> > entries to int16 or int8.  We'd need some scheme to deal with overflow,
> > but given that the counts are now backed by ResourceOwner entries, maybe
> > extra state could be kept in those entries to handle it.
> 
> int8 still seems like overkjll. When will the ref counts go above 2 on a
> regular basis? Surely refcount=2 is just chance at the best of times.
> 
> Refcount -> 2 bits per value, plus a simple overflow list? That would
> allow 0,1,2 ref counts plus 3 means look in hashtable to find real
> refcount.

At two bits, would we run into contention for the byte by multiple
backends?

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: PrivateRefCount (for 8.3)

From
"Simon Riggs"
Date:
On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > int8 still seems like overkjll. When will the ref counts go above 2 on a
> > regular basis? Surely refcount=2 is just chance at the best of times.
> > 
> > Refcount -> 2 bits per value, plus a simple overflow list? That would
> > allow 0,1,2 ref counts plus 3 means look in hashtable to find real
> > refcount.
> 
> At two bits, would we run into contention for the byte by multiple
> backends?

No contention, its a private per-backend data structure. That's why we
want to reduce the size of it so badly.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: PrivateRefCount (for 8.3)

From
NikhilS
Date:
Hi,

Most likely a waste of development effort --- have you got any evidence
of a real effect here?  With 200 max_connections the size of the arrays
is still less than 10% of the space occupied by the buffers themselves,
ergo there isn't going to be all that much cache-thrashing compared to
what happens in the buffers themselves.  You're going to be hard pressed
to buy back the overhead of the hashing.

It might be interesting to see whether we could shrink the refcount
entries to int16 or int8.  We'd need some scheme to deal with overflow,
but given that the counts are now backed by ResourceOwner entries, maybe
extra state could be kept in those entries to handle it.

I did some instrumentation coupled with pgbench/dbt2/views/join query runs to find out the following:

(a) Maximum number of buffers pinned simultaneously by a backend:      6-9

(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6

(a) indicates that for large shared_buffers value we will end up with space wastage due to a big PrivateRefCount array per backend (current allocation is (int32 * shared_buffers)).

(b) indicates that the refcount to be tracked per buffer is a small enough value. And Tom's suggestion of exploring int16 or int8 might be worthwhile.

Following is the Hash Table based proposal based on the above readings:

- Do away with allocating NBuffers sized PrivateRefCount array which is
  an allocation of (NBuffers * int).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value so as to be multiples
  ahead of the above observed ranges. Define Overflow_Size to be 8 or some similar small value to handle collisions.

- Define the following Hash Table entry to keep track of reference counts

    struct HashRefCntEnt
    {
        int32        BufferId;
        int32        RefCnt;
        int32        NextEnt;    /* To handle collisions */
    };

-  Define a similar Overflow Table entry as above to handle collisions.

An array HashRefCntTable of such HashRefCntEnt'ries of size Pvt_RefCnt_Size will get
initialized in the InitBufferPoolAccess function.

An OverflowTable of size Overflow_Size will be allocated. This array will be sized dynamically (2* current Overflow_Size) to accomodate more entries if it cannot accomodate further collisions in the main table.

We do not want the overhead of a costly hashing function. So we will use
(%Pvt_RefCnt_Size i.e modulo Pvt_RefCnt_Size) to get the index where the buffer
needs to go. In short our hash function is (bufid % Pvt_RefCnt_Size) which should be a cheap enough operation.

Considering that 9-10 buffers will be needed, the probability of collisions
will be less. Collisions will arise only if buffers with ids (x, x +
Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the same operation.
This should be pretty rare.

Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc. will be modified to consider the above mechanism properly. The changes will be localized in the buf_init.c and bufmgr.c files only.

Comments please.

Regards,
Nikhils

--
EnterpriseDB               http://www.enterprisedb.com

Re: PrivateRefCount (for 8.3)

From
Bruce Momjian
Date:
Added to TODO:

* Consider decreasing the amount of memory used by PrivateRefCount
 http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php


---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > int8 still seems like overkjll. When will the ref counts go above 2 on a
> > > regular basis? Surely refcount=2 is just chance at the best of times.
> > > 
> > > Refcount -> 2 bits per value, plus a simple overflow list? That would
> > > allow 0,1,2 ref counts plus 3 means look in hashtable to find real
> > > refcount.
> > 
> > At two bits, would we run into contention for the byte by multiple
> > backends?
> 
> No contention, its a private per-backend data structure. That's why we
> want to reduce the size of it so badly.
> 
> -- 
>   Simon Riggs             
>   EnterpriseDB   http://www.enterprisedb.com
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: PrivateRefCount (for 8.3)

From
NikhilS
Date:
Hi,

What is the opinion of the list as to the best way of measuring if the following implementation is ok?
 
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php

As mentioned in earlier mails, this will reduce the per-backend usage of memory by an amount which will be a fraction (single digit percentage) of (NBuffers * int) size . I have done pgbench/dbt2 runs and I do not see any negative impact because of this. Are there any other suggestions for measuring the backend memory footprint?

Regards,
Nikhils

On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote:

Added to TODO:

* Consider decreasing the amount of memory used by PrivateRefCount

  http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
  http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php


---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > int8 still seems like overkjll. When will the ref counts go above 2 on a
> > > regular basis? Surely refcount=2 is just chance at the best of times.
> > >
> > > Refcount -> 2 bits per value, plus a simple overflow list? That would
> > > allow 0,1,2 ref counts plus 3 means look in hashtable to find real
> > > refcount.
> >
> > At two bits, would we run into contention for the byte by multiple
> > backends?
>
> No contention, its a private per-backend data structure. That's why we
> want to reduce the size of it so badly.
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



--
EnterpriseDB               http://www.enterprisedb.com

Re: PrivateRefCount (for 8.3)

From
Tom Lane
Date:
NikhilS <nikkhils@gmail.com> writes:
> What is the opinion of the list as to the best way of measuring if the
> following implementation is ok?
> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
> As mentioned in earlier mails, this will reduce the per-backend usage of
> memory by an amount which will be a fraction (single digit percentage)
> of (NBuffers
> * int) size. I have done pgbench/dbt2 runs and I do not see any negative
> impact because of this.

I find it extremely telling that you don't claim to have seen any
positive impact either.

I think that the original argument
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
is basically bogus.  At 500000 buffers (4GB in shared memory) the
per-backend space for PrivateRefCount is still only 2MB, which is
simply not as significant as Simon claims; a backend needs at least
that much for catalog caches etc.  There is, furthermore, no evidence
that running shared_buffers that high is a good idea in the first
place, or that there aren't other performance bottlenecks that will
manifest before this one becomes interesting.

My inclination is to keep it simple until there's some real evidence
of a problem and benefit.
        regards, tom lane


Re: PrivateRefCount (for 8.3)

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> NikhilS <nikkhils@gmail.com> writes:
>> What is the opinion of the list as to the best way of measuring if the
>> following implementation is ok?
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
>> As mentioned in earlier mails, this will reduce the per-backend usage of
>> memory by an amount which will be a fraction (single digit percentage)
>> of (NBuffers
>> * int) size. I have done pgbench/dbt2 runs and I do not see any negative
>> impact because of this.
> 
> I find it extremely telling that you don't claim to have seen any
> positive impact either.
> 
> I think that the original argument
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
> is basically bogus.  At 500000 buffers (4GB in shared memory) the
> per-backend space for PrivateRefCount is still only 2MB, which is
> simply not as significant as Simon claims; a backend needs at least
> that much for catalog caches etc.  There is, furthermore, no evidence
> that running shared_buffers that high is a good idea in the first
> place, or that there aren't other performance bottlenecks that will
> manifest before this one becomes interesting.

hmm - we are continuily running into people with dedicated servers that
have 16GB RAM or even more available and most tuning docs recommend some20-30% of system RAM to get dedicated to
shared_buffers.So having some
 
500k buffers allocated does not sound so unrealistic in practise and
combined with the fact that people often have a few hundred backends
that could add up to some noticable overhead.
If that is actually a problem given that those people tend to have heaps
of memory is another story but if we can preserve some memory ...


Stefan


Re: PrivateRefCount (for 8.3)

From
NikhilS
Date:
Hi,

Yeah with 500K shared buffers and multiples of backends, we could achieve noticeable savings with this. And that is why it will be difficult to show the performance gains by running just pgbench/dbt2 on medium scale machines.
One way of looking at this could be that memory saved here, could lead to more critical usage elsewhere... But agreed, it is hard to show with just some performance runs.

Regards,
Nikhils

On 3/5/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
Tom Lane wrote:
> NikhilS <nikkhils@gmail.com> writes:
>> What is the opinion of the list as to the best way of measuring if the
>> following implementation is ok?
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
>> As mentioned in earlier mails, this will reduce the per-backend usage of
>> memory by an amount which will be a fraction (single digit percentage)
>> of (NBuffers
>> * int) size. I have done pgbench/dbt2 runs and I do not see any negative
>> impact because of this.
>
> I find it extremely telling that you don't claim to have seen any
> positive impact either.
>
> I think that the original argument
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
> is basically bogus.  At 500000 buffers (4GB in shared memory) the
> per-backend space for PrivateRefCount is still only 2MB, which is
> simply not as significant as Simon claims; a backend needs at least
> that much for catalog caches etc.  There is, furthermore, no evidence
> that running shared_buffers that high is a good idea in the first
> place, or that there aren't other performance bottlenecks that will
> manifest before this one becomes interesting.

hmm - we are continuily running into people with dedicated servers that
have 16GB RAM or even more available and most tuning docs recommend some
20-30% of system RAM to get dedicated to shared_buffers. So having some
500k buffers allocated does not sound so unrealistic in practise and
combined with the fact that people often have a few hundred backends
that could add up to some noticable overhead.
If that is actually a problem given that those people tend to have heaps
of memory is another story but if we can preserve some memory ...


Stefan



--
EnterpriseDB               http://www.enterprisedb.com