Thread: PrivateRefCount (for 8.3)
In the bufmgr code we keep track of reference counts on individual buffers. We do this with a local array that grows in proportion to the size of shared_buffers. When we have a high shared_buffers, together with a very large number of connected users then this will waste considerable RAM that would otherwise been available to OS buffers. Each element of the array is an int32, so with 200 connected users and a shared_buffers = 250,000 this would use 200 MB of RAM. For 1000 connected users and 500,000 buffers this would use 2 GB of RAM. This would clearly be a problem when attempting to use larger shared_buffers to increase performance, which some people believe is useful. IMHO we should support those people who find such settings beneficial. The number of pins held by any backend at once is mostly 1-2, but could in some circumstances go as high as the number of tables in a join. So the vast majority of the array would be unused, yet we require random access to it. This will cause cache churning and some delays while we request the needed parts of the array from main memory. Under specific conditions, I propose to replace the array with a hash table, designed with a custom hash function that would map the pins held onto just 16 hash buckets. That is small enough to fit completely in cache, yet large enough to avoid overhead of collisions in most cases. Should the hash table overflow at any time, it would be replaced by the standard array as a fallback (with all code as now). The gain from the additional available memory and the local cache efficiency will at some point outweigh the cost of indirect access to private ref counts. The suggested conditions to use the hash table would be when NBuffers > 100000 and max_connections > 200, but those are just gut feelings. We can measure that point if the idea is acceptable. Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > Under specific conditions, I propose to replace the array with a hash > table, designed with a custom hash function that would map the pins held > onto just 16 hash buckets. > Comments? Most likely a waste of development effort --- have you got any evidence of a real effect here? With 200 max_connections the size of the arrays is still less than 10% of the space occupied by the buffers themselves, ergo there isn't going to be all that much cache-thrashing compared to what happens in the buffers themselves. You're going to be hard pressed to buy back the overhead of the hashing. It might be interesting to see whether we could shrink the refcount entries to int16 or int8. We'd need some scheme to deal with overflow, but given that the counts are now backed by ResourceOwner entries, maybe extra state could be kept in those entries to handle it. regards, tom lane
On Mon, 2006-11-27 at 13:44 -0500, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > Under specific conditions, I propose to replace the array with a hash > > table, designed with a custom hash function that would map the pins held > > onto just 16 hash buckets. > > > Comments? > > Most likely a waste of development effort --- have you got any evidence > of a real effect here? With 200 max_connections the size of the arrays > is still less than 10% of the space occupied by the buffers themselves, > ergo there isn't going to be all that much cache-thrashing compared to > what happens in the buffers themselves. You're going to be hard pressed > to buy back the overhead of the hashing. And at 2000 connections we waste RAM the size of shared_buffers... that isn't something to easily ignore. > It might be interesting to see whether we could shrink the refcount > entries to int16 or int8. We'd need some scheme to deal with overflow, > but given that the counts are now backed by ResourceOwner entries, maybe > extra state could be kept in those entries to handle it. int8 still seems like overkjll. When will the ref counts go above 2 on a regular basis? Surely refcount=2 is just chance at the best of times. Refcount -> 2 bits per value, plus a simple overflow list? That would allow 0,1,2 ref counts plus 3 means look in hashtable to find real refcount. I'll see what test results I can arrange. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Mon, 2006-11-27 at 13:44 -0500, Tom Lane wrote: >> Most likely a waste of development effort --- have you got any evidence >> of a real effect here? > And at 2000 connections we waste RAM the size of shared_buffers... that > isn't something to easily ignore. It is until someone shows me that Postgres actually works productively at 2000 connections. I think this is probably very far down the list of problems you'd hit with so many backends. regards, tom lane
Simon Riggs wrote: > > Most likely a waste of development effort --- have you got any evidence > > of a real effect here? With 200 max_connections the size of the arrays > > is still less than 10% of the space occupied by the buffers themselves, > > ergo there isn't going to be all that much cache-thrashing compared to > > what happens in the buffers themselves. You're going to be hard pressed > > to buy back the overhead of the hashing. > > And at 2000 connections we waste RAM the size of shared_buffers... that > isn't something to easily ignore. > > > It might be interesting to see whether we could shrink the refcount > > entries to int16 or int8. We'd need some scheme to deal with overflow, > > but given that the counts are now backed by ResourceOwner entries, maybe > > extra state could be kept in those entries to handle it. > > int8 still seems like overkjll. When will the ref counts go above 2 on a > regular basis? Surely refcount=2 is just chance at the best of times. > > Refcount -> 2 bits per value, plus a simple overflow list? That would > allow 0,1,2 ref counts plus 3 means look in hashtable to find real > refcount. At two bits, would we run into contention for the byte by multiple backends? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote: > Simon Riggs wrote: > > int8 still seems like overkjll. When will the ref counts go above 2 on a > > regular basis? Surely refcount=2 is just chance at the best of times. > > > > Refcount -> 2 bits per value, plus a simple overflow list? That would > > allow 0,1,2 ref counts plus 3 means look in hashtable to find real > > refcount. > > At two bits, would we run into contention for the byte by multiple > backends? No contention, its a private per-backend data structure. That's why we want to reduce the size of it so badly. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Hi,
I did some instrumentation coupled with pgbench/dbt2/views/join query runs to find out the following:
(a) Maximum number of buffers pinned simultaneously by a backend: 6-9
(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6
(a) indicates that for large shared_buffers value we will end up with space wastage due to a big PrivateRefCount array per backend (current allocation is (int32 * shared_buffers)).
(b) indicates that the refcount to be tracked per buffer is a small enough value. And Tom's suggestion of exploring int16 or int8 might be worthwhile.
Following is the Hash Table based proposal based on the above readings:
- Do away with allocating NBuffers sized PrivateRefCount array which is
an allocation of (NBuffers * int).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value so as to be multiples
ahead of the above observed ranges. Define Overflow_Size to be 8 or some similar small value to handle collisions.
- Define the following Hash Table entry to keep track of reference counts
struct HashRefCntEnt
{
int32 BufferId;
int32 RefCnt;
int32 NextEnt; /* To handle collisions */
};
- Define a similar Overflow Table entry as above to handle collisions.
An array HashRefCntTable of such HashRefCntEnt'ries of size Pvt_RefCnt_Size will get
initialized in the InitBufferPoolAccess function.
An OverflowTable of size Overflow_Size will be allocated. This array will be sized dynamically (2* current Overflow_Size) to accomodate more entries if it cannot accomodate further collisions in the main table.
We do not want the overhead of a costly hashing function. So we will use
(%Pvt_RefCnt_Size i.e modulo Pvt_RefCnt_Size) to get the index where the buffer
needs to go. In short our hash function is (bufid % Pvt_RefCnt_Size) which should be a cheap enough operation.
Considering that 9-10 buffers will be needed, the probability of collisions
will be less. Collisions will arise only if buffers with ids (x, x +
Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the same operation.
This should be pretty rare.
Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc. will be modified to consider the above mechanism properly. The changes will be localized in the buf_init.c and bufmgr.c files only.
Comments please.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Most likely a waste of development effort --- have you got any evidence
of a real effect here? With 200 max_connections the size of the arrays
is still less than 10% of the space occupied by the buffers themselves,
ergo there isn't going to be all that much cache-thrashing compared to
what happens in the buffers themselves. You're going to be hard pressed
to buy back the overhead of the hashing.
It might be interesting to see whether we could shrink the refcount
entries to int16 or int8. We'd need some scheme to deal with overflow,
but given that the counts are now backed by ResourceOwner entries, maybe
extra state could be kept in those entries to handle it.
I did some instrumentation coupled with pgbench/dbt2/views/join query runs to find out the following:
(a) Maximum number of buffers pinned simultaneously by a backend: 6-9
(b) Maximum value of simultaneous pins on a given buffer by a backend: 4-6
(a) indicates that for large shared_buffers value we will end up with space wastage due to a big PrivateRefCount array per backend (current allocation is (int32 * shared_buffers)).
(b) indicates that the refcount to be tracked per buffer is a small enough value. And Tom's suggestion of exploring int16 or int8 might be worthwhile.
Following is the Hash Table based proposal based on the above readings:
- Do away with allocating NBuffers sized PrivateRefCount array which is
an allocation of (NBuffers * int).
- Define Pvt_RefCnt_Size to be 64 (128?) or some such value so as to be multiples
ahead of the above observed ranges. Define Overflow_Size to be 8 or some similar small value to handle collisions.
- Define the following Hash Table entry to keep track of reference counts
struct HashRefCntEnt
{
int32 BufferId;
int32 RefCnt;
int32 NextEnt; /* To handle collisions */
};
- Define a similar Overflow Table entry as above to handle collisions.
An array HashRefCntTable of such HashRefCntEnt'ries of size Pvt_RefCnt_Size will get
initialized in the InitBufferPoolAccess function.
An OverflowTable of size Overflow_Size will be allocated. This array will be sized dynamically (2* current Overflow_Size) to accomodate more entries if it cannot accomodate further collisions in the main table.
We do not want the overhead of a costly hashing function. So we will use
(%Pvt_RefCnt_Size i.e modulo Pvt_RefCnt_Size) to get the index where the buffer
needs to go. In short our hash function is (bufid % Pvt_RefCnt_Size) which should be a cheap enough operation.
Considering that 9-10 buffers will be needed, the probability of collisions
will be less. Collisions will arise only if buffers with ids (x, x +
Pvt_RefCnt_Size, x + 2*Pvt_RefCnt_Size etc.) get used in the same operation.
This should be pretty rare.
Functions PinBuffer, PinBuffer_Locked, IncrBufferRefCount, UnpinBuffer etc. will be modified to consider the above mechanism properly. The changes will be localized in the buf_init.c and bufmgr.c files only.
Comments please.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Added to TODO: * Consider decreasing the amount of memory used by PrivateRefCount http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php --------------------------------------------------------------------------- Simon Riggs wrote: > On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote: > > Simon Riggs wrote: > > > int8 still seems like overkjll. When will the ref counts go above 2 on a > > > regular basis? Surely refcount=2 is just chance at the best of times. > > > > > > Refcount -> 2 bits per value, plus a simple overflow list? That would > > > allow 0,1,2 ref counts plus 3 means look in hashtable to find real > > > refcount. > > > > At two bits, would we run into contention for the byte by multiple > > backends? > > No contention, its a private per-backend data structure. That's why we > want to reduce the size of it so badly. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi,
What is the opinion of the list as to the best way of measuring if the following implementation is ok?
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
As mentioned in earlier mails, this will reduce the per-backend usage of memory by an amount which will be a fraction (single digit percentage) of (NBuffers * int) size . I have done pgbench/dbt2 runs and I do not see any negative impact because of this. Are there any other suggestions for measuring the backend memory footprint?
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
What is the opinion of the list as to the best way of measuring if the following implementation is ok?
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
As mentioned in earlier mails, this will reduce the per-backend usage of memory by an amount which will be a fraction (single digit percentage) of (NBuffers * int) size . I have done pgbench/dbt2 runs and I do not see any negative impact because of this. Are there any other suggestions for measuring the backend memory footprint?
Regards,
Nikhils
On 2/21/07, Bruce Momjian <bruce@momjian.us> wrote:
Added to TODO:
* Consider decreasing the amount of memory used by PrivateRefCount
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
---------------------------------------------------------------------------
Simon Riggs wrote:
> On Mon, 2006-11-27 at 14:42 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > int8 still seems like overkjll. When will the ref counts go above 2 on a
> > > regular basis? Surely refcount=2 is just chance at the best of times.
> > >
> > > Refcount -> 2 bits per value, plus a simple overflow list? That would
> > > allow 0,1,2 ref counts plus 3 means look in hashtable to find real
> > > refcount.
> >
> > At two bits, would we run into contention for the byte by multiple
> > backends?
>
> No contention, its a private per-backend data structure. That's why we
> want to reduce the size of it so badly.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
EnterpriseDB http://www.enterprisedb.com
NikhilS <nikkhils@gmail.com> writes: > What is the opinion of the list as to the best way of measuring if the > following implementation is ok? > http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php > As mentioned in earlier mails, this will reduce the per-backend usage of > memory by an amount which will be a fraction (single digit percentage) > of (NBuffers > * int) size. I have done pgbench/dbt2 runs and I do not see any negative > impact because of this. I find it extremely telling that you don't claim to have seen any positive impact either. I think that the original argument http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php is basically bogus. At 500000 buffers (4GB in shared memory) the per-backend space for PrivateRefCount is still only 2MB, which is simply not as significant as Simon claims; a backend needs at least that much for catalog caches etc. There is, furthermore, no evidence that running shared_buffers that high is a good idea in the first place, or that there aren't other performance bottlenecks that will manifest before this one becomes interesting. My inclination is to keep it simple until there's some real evidence of a problem and benefit. regards, tom lane
Tom Lane wrote: > NikhilS <nikkhils@gmail.com> writes: >> What is the opinion of the list as to the best way of measuring if the >> following implementation is ok? >> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php >> As mentioned in earlier mails, this will reduce the per-backend usage of >> memory by an amount which will be a fraction (single digit percentage) >> of (NBuffers >> * int) size. I have done pgbench/dbt2 runs and I do not see any negative >> impact because of this. > > I find it extremely telling that you don't claim to have seen any > positive impact either. > > I think that the original argument > http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php > is basically bogus. At 500000 buffers (4GB in shared memory) the > per-backend space for PrivateRefCount is still only 2MB, which is > simply not as significant as Simon claims; a backend needs at least > that much for catalog caches etc. There is, furthermore, no evidence > that running shared_buffers that high is a good idea in the first > place, or that there aren't other performance bottlenecks that will > manifest before this one becomes interesting. hmm - we are continuily running into people with dedicated servers that have 16GB RAM or even more available and most tuning docs recommend some20-30% of system RAM to get dedicated to shared_buffers.So having some 500k buffers allocated does not sound so unrealistic in practise and combined with the fact that people often have a few hundred backends that could add up to some noticable overhead. If that is actually a problem given that those people tend to have heaps of memory is another story but if we can preserve some memory ... Stefan
Hi,
Yeah with 500K shared buffers and multiples of backends, we could achieve noticeable savings with this. And that is why it will be difficult to show the performance gains by running just pgbench/dbt2 on medium scale machines.
One way of looking at this could be that memory saved here, could lead to more critical usage elsewhere... But agreed, it is hard to show with just some performance runs.
Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Yeah with 500K shared buffers and multiples of backends, we could achieve noticeable savings with this. And that is why it will be difficult to show the performance gains by running just pgbench/dbt2 on medium scale machines.
One way of looking at this could be that memory saved here, could lead to more critical usage elsewhere... But agreed, it is hard to show with just some performance runs.
Regards,
Nikhils
On 3/5/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
Tom Lane wrote:
> NikhilS <nikkhils@gmail.com> writes:
>> What is the opinion of the list as to the best way of measuring if the
>> following implementation is ok?
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00752.php
>> As mentioned in earlier mails, this will reduce the per-backend usage of
>> memory by an amount which will be a fraction (single digit percentage)
>> of (NBuffers
>> * int) size. I have done pgbench/dbt2 runs and I do not see any negative
>> impact because of this.
>
> I find it extremely telling that you don't claim to have seen any
> positive impact either.
>
> I think that the original argument
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00797.php
> is basically bogus. At 500000 buffers (4GB in shared memory) the
> per-backend space for PrivateRefCount is still only 2MB, which is
> simply not as significant as Simon claims; a backend needs at least
> that much for catalog caches etc. There is, furthermore, no evidence
> that running shared_buffers that high is a good idea in the first
> place, or that there aren't other performance bottlenecks that will
> manifest before this one becomes interesting.
hmm - we are continuily running into people with dedicated servers that
have 16GB RAM or even more available and most tuning docs recommend some
20-30% of system RAM to get dedicated to shared_buffers. So having some
500k buffers allocated does not sound so unrealistic in practise and
combined with the fact that people often have a few hundred backends
that could add up to some noticable overhead.
If that is actually a problem given that those people tend to have heaps
of memory is another story but if we can preserve some memory ...
Stefan
--
EnterpriseDB http://www.enterprisedb.com