Thread: Analysis on backend-private memory usage (and a patch)

Analysis on backend-private memory usage (and a patch)

From
Heikki Linnakangas
Date:
I received a complaint that each backend consumes a lot of
backend-private memory, even if it's completely idle. "a lot" is of
course very subjective and how much memory is actually used depends
heavily on the application. In this case, the database is fairly small,
but they have 250 connections. 'top' output says that each backend is
consuming roughly 3MB of memory (RES - SHR). That's 750 MB of
backend-private memory, which is a significant chunk of total RAM.

So I spent some time analyzing backend memory usage, looking for any
low-hanging fruit. This isn't *that* big an issue, so I don't think we'd
want to do any big rearchitecting for this.

On my laptop, just starting psql, the backend uses 1632 KB of private
memory. Running a simple query like "select * from foo where i = 1"
makes no noticeable difference, but after "\d" (which I'm using to
represent a somewhat more complicated query), it goes up to 1960 KB.

The largest consumer of that memory is the relcache and syscaches. After
starting psql, without running any queries, MemoryContextStats says:

CacheMemoryContext: 817840 total in 20 blocks; 134824 free (4 chunks);
683016 used

plus there is one sub-memorycontext for each index in the relcache, each
using about 1KB. After "\d":

CacheMemoryContext: 1342128 total in 21 blocks; 517472 free (1 chunks);
824656 used


Another thing that can consume a lot of memory is the PrivateRefCount
lookup table. It's an array with one int32 for each shared buffer, i.e.
512 KB for each GB of shared_buffers. See previous discussion here:
http://www.postgresql.org/message-id/flat/1164624036.3778.107.camel@silverbirch.site.
That discussion didn't lead to anything, but I think there's some
potential in turning PrivateRefCount into a tiny hash table or simply a
linear array. Or even simpler, change it from int32 to int16, and accept
that you will get an error if you try to hold more than 2^16 pins on a
buffer in one backend.
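To make that idea concrete, here is a minimal, hypothetical C sketch of tracking pins in a small per-backend array instead of one counter per shared buffer. The names (`PrivateRef`, `get_private_refcount`, `PRIVATE_REF_SLOTS`) are invented for illustration, and the overflow path a real implementation would need (spilling to a hash table when the array fills) is reduced to an assertion:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: track this backend's buffer pins in a small
 * linear array instead of an array with one slot per shared buffer. */
#define PRIVATE_REF_SLOTS 8

typedef struct
{
    int     buffer;     /* buffer id, -1 if slot unused */
    int32_t refcount;   /* pins held by this backend */
} PrivateRef;

static PrivateRef refs[PRIVATE_REF_SLOTS];

static void
init_refs(void)
{
    for (int i = 0; i < PRIVATE_REF_SLOTS; i++)
        refs[i].buffer = -1;
}

/* Return this backend's pin count entry for the buffer, creating it
 * on demand.  A real implementation would spill to a hash table when
 * the array fills up; this sketch just asserts that it doesn't. */
static int32_t *
get_private_refcount(int buffer)
{
    int free_slot = -1;

    for (int i = 0; i < PRIVATE_REF_SLOTS; i++)
    {
        if (refs[i].buffer == buffer)
            return &refs[i].refcount;
        if (refs[i].buffer == -1 && free_slot < 0)
            free_slot = i;
    }
    assert(free_slot >= 0);     /* overflow path omitted */
    refs[free_slot].buffer = buffer;
    refs[free_slot].refcount = 0;
    return &refs[free_slot].refcount;
}
```

Since a backend rarely pins more than a handful of buffers at once, the array stays tiny regardless of shared_buffers.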

One fairly simple thing we could do is to teach catcache.c to resize the
caches. Then we could make the initial size of all the syscaches much
smaller. At the moment, we use fairly large caches for catalogs like pg_enum
(256 entries) and pg_usermapping (128), even though most databases don't
use those features at all. If they could be resized on demand, we could
easily allocate them initially with just, say, 4 entries.
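A resize-on-demand catcache could look roughly like the following toy chained hash table. This is a sketch of the general technique, not the attached patch, and all names are invented:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal sketch of resize-on-demand chained hashing, loosely modeled
 * on what a catcache.c patch might do. */
typedef struct Entry
{
    unsigned int  hash;
    struct Entry *next;
} Entry;

typedef struct
{
    Entry **buckets;
    int     nbuckets;    /* always a power of two */
    int     ntup;
} Cache;

static Cache *
make_cache(int nbuckets)
{
    Cache *c = malloc(sizeof(Cache));

    c->buckets = calloc(nbuckets, sizeof(Entry *));
    c->nbuckets = nbuckets;
    c->ntup = 0;
    return c;
}

static void
rehash(Cache *c)
{
    int     newsize = c->nbuckets * 2;
    Entry **newb = calloc(newsize, sizeof(Entry *));

    /* move every entry to its slot in the doubled bucket array */
    for (int i = 0; i < c->nbuckets; i++)
    {
        Entry *e = c->buckets[i];

        while (e)
        {
            Entry *next = e->next;
            int    slot = e->hash & (newsize - 1);

            e->next = newb[slot];
            newb[slot] = e;
            e = next;
        }
    }
    free(c->buckets);
    c->buckets = newb;
    c->nbuckets = newsize;
}

static void
insert(Cache *c, unsigned int hash)
{
    Entry *e = malloc(sizeof(Entry));
    int    slot;

    /* grow when the average chain length (fillfactor) reaches 2 */
    if (c->ntup >= c->nbuckets * 2)
        rehash(c);
    slot = hash & (c->nbuckets - 1);
    e->hash = hash;
    e->next = c->buckets[slot];
    c->buckets[slot] = e;
    c->ntup++;
}
```

Growing when the average chain length reaches 2 keeps lookups short while only doubling the bucket array logarithmically often, so a cache that starts with 4 buckets costs almost nothing for unused catalogs.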

Attached is a patch for that. That saves about 300 KB, for a backend
that does nothing. Resizing the caches on demand also has the benefit
that if you have a lot more objects of some type than usual, lookups
won't be bogged down by a too small cache. I haven't tried to measure
that, though.

- Heikki

Attachment

Re: Analysis on backend-private memory usage (and a patch)

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> One fairly simple thing we could do is to teach catcache.c to resize the 
> caches. Then we could make the initial size of all the syscaches much 
> smaller.

I think this is attractive for the *other* reason you mention, namely
preserving reasonable performance when a catcache grows larger than
expected; but I'm pretty skeptical of nickel-and-diming caches that are
already really small.  Is it really worth cutting the TSPARSER caches
from 4 pointers to 2 for instance?

What concerns me about initially-undersized caches is that we'll waste
space and time in the enlargement process.  I'd suggest trying to get some
numbers about the typical size of each cache in a backend that's done a
few things (not merely started up --- we should not be optimizing for the
case of connections that get abandoned without running any queries).
Then set the initial size to the next larger power of 2.
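That sizing rule can be sketched as a one-liner; `next_pow2` is an illustrative name, not an existing PostgreSQL function:

```c
#include <assert.h>

/* Round a typical cache population up to the next power of 2,
 * to use as the initial bucket count. */
static unsigned int
next_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}
```

For example, a cache that typically ends up holding 11 entries would start with 16 buckets.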
        regards, tom lane



Re: Analysis on backend-private memory usage (and a patch)

From
Heikki Linnakangas
Date:
On 04.09.2013 23:56, Tom Lane wrote:
> Heikki Linnakangas<hlinnakangas@vmware.com>  writes:
>> One fairly simple thing we could do is to teach catcache.c to resize the
>> caches. Then we could make the initial size of all the syscaches much
>> smaller.
>
> I think this is attractive for the *other* reason you mention, namely
> preserving reasonable performance when a catcache grows larger than
> expected; but I'm pretty skeptical of nickel-and-diming caches that are
> already really small.  Is it really worth cutting the TSPARSER caches
> from 4 pointers to 2 for instance?

Yeah, that may be overdoing it. Then again, enlarging a hash table from 
2 to 4 entries when needed is also very cheap.

> What concerns me about initially-undersized caches is that we'll waste
> space and time in the enlargement process.

Enlarging a small hash table is very cheap because, well, it's small. 
Enlarging a large hash table is more expensive, but if you have a lot of 
entries in the hash, then you also get the benefit of a larger hash when 
doing lookups. It does require some memory to hold the old and the new 
hash table while rehashing, but again, with a small hash table that's 
not significant, and with a large one the actual cached tuples take a 
lot more space than the buckets array anyway.

Per Wikipedia [1]:

> If the table size increases or decreases by a fixed percentage at each
> expansion, the total cost of these resizings, amortized over all insert
> and delete operations, is still a constant, independent of the number of
> entries n and of the number m of operations performed.
>
> For example, consider a table that was created with the minimum possible
> size and is doubled each time the load ratio exceeds some threshold. If m
> elements are inserted into that table, the total number of extra
> re-insertions that occur in all dynamic resizings of the table is at most
> m − 1. In other words, dynamic resizing roughly doubles the cost of each
> insert or delete operation.
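The doubling argument is easy to verify with a small simulation. A sketch (the exact m − 1 bound holds when m is a power of two; in general the total stays below 2m):

```c
#include <assert.h>

/* Count how many entry copies (re-insertions) a doubling hash table
 * performs while m elements are inserted, starting from capacity 1. */
static long
reinsertions_for(long m)
{
    long capacity = 1, n = 0, copies = 0;

    for (long i = 0; i < m; i++)
    {
        if (n == capacity)
        {
            copies += n;        /* rehash re-inserts every existing entry */
            capacity *= 2;
        }
        n++;
    }
    return copies;
}
```

For m = 1024 insertions this performs exactly 1023 re-insertions, i.e. the extra amortized cost per insert is bounded by a constant.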

Considering the amount of work involved in adding an entry to a catalog 
cache, I wouldn't worry about adding a tiny constant to the insertion time.

I did some quick testing by creating 100000 tables, and running a 
pgbench script that selects randomly from them:

\setrandom tableno 1 100000
select * from foo:tableno where i = 1;

I timed the rehash operations with gettimeofday calls before and after 
the rehash:

> LOG:  rehashed catalog cache id 44 for pg_class from 256 to 512 buckets: 27 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 128 to 256 buckets: 14 us
> LOG:  rehashed catalog cache id 44 for pg_class from 512 to 1024 buckets: 54 us
> LOG:  rehashed catalog cache id 45 for pg_class from 256 to 512 buckets: 29 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 256 to 512 buckets: 30 us
> LOG:  rehashed catalog cache id 44 for pg_class from 1024 to 2048 buckets: 147 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 512 to 1024 buckets: 87 us
> LOG:  rehashed catalog cache id 45 for pg_class from 512 to 1024 buckets: 80 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 512 to 1024 buckets: 88 us
> LOG:  rehashed catalog cache id 44 for pg_class from 2048 to 4096 buckets: 342 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 1024 to 2048 buckets: 197 us
> LOG:  rehashed catalog cache id 45 for pg_class from 1024 to 2048 buckets: 183 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 1024 to 2048 buckets: 194 us
> LOG:  rehashed catalog cache id 44 for pg_class from 4096 to 8192 buckets: 764 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 2048 to 4096 buckets: 401 us
> LOG:  rehashed catalog cache id 45 for pg_class from 2048 to 4096 buckets: 383 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 2048 to 4096 buckets: 406 us
> LOG:  rehashed catalog cache id 44 for pg_class from 8192 to 16384 buckets: 1758 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 4096 to 8192 buckets: 833 us
> LOG:  rehashed catalog cache id 45 for pg_class from 4096 to 8192 buckets: 842 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 4096 to 8192 buckets: 859 us
> LOG:  rehashed catalog cache id 44 for pg_class from 16384 to 32768 buckets: 3564 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 8192 to 16384 buckets: 1769 us
> LOG:  rehashed catalog cache id 45 for pg_class from 8192 to 16384 buckets: 1752 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 8192 to 16384 buckets: 1719 us
> LOG:  rehashed catalog cache id 44 for pg_class from 32768 to 65536 buckets: 7538 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 16384 to 32768 buckets: 3644 us
> LOG:  rehashed catalog cache id 45 for pg_class from 16384 to 32768 buckets: 3609 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 16384 to 32768 buckets: 3508 us
> LOG:  rehashed catalog cache id 44 for pg_class from 65536 to 131072 buckets: 16457 us
> LOG:  rehashed catalog cache id 7 for pg_attribute from 32768 to 65536 buckets: 7978 us
> LOG:  rehashed catalog cache id 45 for pg_class from 32768 to 65536 buckets: 8281 us
> LOG:  rehashed catalog cache id 47 for pg_statistic from 32768 to 65536 buckets: 7724 us

The time spent in rehashing seems to be about 60 ns per catcache entry 
(the patch rehashes when the fillfactor reaches 2, so when rehashing e.g. 
from 256 to 512 buckets, there are 1024 entries in the hash), at the 
larger hash sizes.
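As a back-of-envelope check of that figure, using the last pg_class rehash quoted above: 65536 to 131072 buckets took 16457 us, and at fillfactor 2 that table held about 2 * 131072 = 262144 entries. A trivial sketch:

```c
#include <assert.h>

/* Convert a rehash duration in microseconds to nanoseconds per entry. */
static double
ns_per_entry(double rehash_us, long nentries)
{
    return rehash_us * 1000.0 / nentries;
}
```

That works out to roughly 63 ns per entry, consistent with the ~60 ns estimate.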

>  I'd suggest trying to get some
> numbers about the typical size of each cache in a backend that's done a
> few things (not merely started up --- we should not be optimizing for the
> case of connections that get abandoned without running any queries).
> Then set the initial size to the next larger power of 2.

Makes sense.

I ran pgbench for ten seconds, and printed the number of tuples in each 
catcache after that:

LOG:  cache id 61 on (not known yet): 0 tups
LOG:  cache id 60 on (not known yet): 0 tups
LOG:  cache id 59 on pg_type: 6 tups
LOG:  cache id 58 on (not known yet): 0 tups
LOG:  cache id 57 on (not known yet): 0 tups
LOG:  cache id 56 on (not known yet): 0 tups
LOG:  cache id 55 on (not known yet): 0 tups
LOG:  cache id 54 on (not known yet): 0 tups
LOG:  cache id 53 on (not known yet): 0 tups
LOG:  cache id 52 on (not known yet): 0 tups
LOG:  cache id 51 on (not known yet): 0 tups
LOG:  cache id 50 on (not known yet): 0 tups
LOG:  cache id 49 on (not known yet): 0 tups
LOG:  cache id 48 on pg_tablespace: 1 tups
LOG:  cache id 47 on pg_statistic: 11 tups
LOG:  cache id 46 on (not known yet): 0 tups
LOG:  cache id 45 on pg_class: 4 tups
LOG:  cache id 44 on pg_class: 8 tups
LOG:  cache id 43 on (not known yet): 0 tups
LOG:  cache id 42 on pg_proc: 4 tups
LOG:  cache id 41 on pg_proc: 1 tups
LOG:  cache id 40 on (not known yet): 0 tups
LOG:  cache id 39 on (not known yet): 0 tups
LOG:  cache id 38 on pg_operator: 2 tups
LOG:  cache id 37 on pg_operator: 2 tups
LOG:  cache id 36 on (not known yet): 0 tups
LOG:  cache id 35 on pg_namespace: 3 tups
LOG:  cache id 34 on (not known yet): 0 tups
LOG:  cache id 33 on (not known yet): 0 tups
LOG:  cache id 32 on pg_index: 5 tups
LOG:  cache id 31 on (not known yet): 0 tups
LOG:  cache id 30 on (not known yet): 0 tups
LOG:  cache id 29 on (not known yet): 0 tups
LOG:  cache id 28 on (not known yet): 0 tups
LOG:  cache id 27 on (not known yet): 0 tups
LOG:  cache id 26 on (not known yet): 0 tups
LOG:  cache id 25 on (not known yet): 0 tups
LOG:  cache id 24 on (not known yet): 0 tups
LOG:  cache id 23 on (not known yet): 0 tups
LOG:  cache id 22 on (not known yet): 0 tups
LOG:  cache id 21 on pg_database: 1 tups
LOG:  cache id 20 on (not known yet): 0 tups
LOG:  cache id 19 on (not known yet): 0 tups
LOG:  cache id 18 on (not known yet): 0 tups
LOG:  cache id 17 on (not known yet): 0 tups
LOG:  cache id 16 on (not known yet): 0 tups
LOG:  cache id 15 on (not known yet): 0 tups
LOG:  cache id 14 on (not known yet): 0 tups
LOG:  cache id 13 on (not known yet): 0 tups
LOG:  cache id 12 on pg_cast: 1 tups
LOG:  cache id 11 on pg_authid: 1 tups
LOG:  cache id 10 on pg_authid: 1 tups
LOG:  cache id 9 on (not known yet): 0 tups
LOG:  cache id 8 on (not known yet): 0 tups
LOG:  cache id 7 on pg_attribute: 6 tups
LOG:  cache id 6 on (not known yet): 0 tups
LOG:  cache id 5 on (not known yet): 0 tups
LOG:  cache id 4 on pg_amop: 1 tups
LOG:  cache id 3 on pg_amop: 2 tups
LOG:  cache id 2 on pg_am: 1 tups
LOG:  cache id 1 on (not known yet): 0 tups
LOG:  cache id 0 on (not known yet): 0 tups
CacheMemoryContext: 516096 total in 6 blocks; 81192 free (0 chunks); 
434904 used

I'm surprised how few rows there are in the caches after that. Actually, 
given that, and the above timings, I'm starting to think that we should 
just get rid of the hand-tuned initial sizes altogether. Start all 
caches small, say with only one entry, and just let the automatic rehashing 
enlarge them as required. Of course, a more complicated query will touch 
many more catalogs, but resizing is cheap enough that it doesn't really 
matter.


PS. Once the hashes are resized on demand, perhaps we should get rid of 
the move-to-the-head behavior in SearchCatCache. If all the buckets 
contain < 2 entries on average, moving the least-recently-used one to 
the front hardly makes any difference in lookup times. But it does add a 
couple of cycles to every cache hit.
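For reference, the move-to-front behavior under discussion amounts to the following (a simplified sketch with invented structures; the real code uses PostgreSQL's dlist primitives, e.g. dlist_move_head):

```c
#include <assert.h>
#include <stddef.h>

/* Toy doubly-linked bucket chain with move-to-front on a cache hit. */
typedef struct Node
{
    int          key;
    struct Node *prev;
    struct Node *next;
} Node;

typedef struct
{
    Node *head;
} Bucket;

static Node *
bucket_lookup(Bucket *b, int key)
{
    for (Node *n = b->head; n != NULL; n = n->next)
    {
        if (n->key != key)
            continue;
        if (n != b->head)       /* move-to-front on a hit */
        {
            /* unlink from current position */
            n->prev->next = n->next;
            if (n->next)
                n->next->prev = n->prev;
            /* relink at the head of the chain */
            n->prev = NULL;
            n->next = b->head;
            b->head->prev = n;
            b->head = n;
        }
        return n;
    }
    return NULL;
}
```

Note that when the matched entry is already at the head (the common case with short chains), the move reduces to the single `n != b->head` comparison.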

- Heikki



Re: Analysis on backend-private memory usage (and a patch)

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> I ran pgbench for ten seconds, and printed the number of tuples in each 
> catcache after that:
> [ very tiny numbers ]

I find these numbers a bit suspicious.  For example, we must have hit at
least 13 different system catalogs, and more than that many indexes, in
the course of populating the syscaches you show as initialized.  How is
it there are only 4 entries in the RELOID cache?  I wonder if there were
cache resets going on.

A larger issue is that pgbench might not be too representative.  In
a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
and contains 1 entry after "select 2=2", which is expected since 
the operator-lookup code will start by looking for int4 = int4 and
will get an exact match.  But after "select 2=2::numeric" there are
61 entries, as a byproduct of having thumbed through every binary
operator named "=" to resolve the ambiguous match.  We went so far
as to install another level of caching in front of OPERNAMENSP because
it was getting too expensive to deal with heavily-overloaded operators
like that one.  In general, we've had to spend enough sweat on optimizing
catcache searches to make me highly dubious of any claim that the caches
are usually almost empty.

I understand your argument that resizing is so cheap that it might not
matter, but nonetheless reducing these caches as far as you're suggesting
sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
them on the small side rather than on the large side as they are now, but
not with choosing sizes that are guaranteed to result in resizing cycles
during startup of any real app.

> PS. Once the hashes are resized on demand, perhaps we should get rid of 
> the move-to-the-head behavior in SearchCatCache. If all the buckets 

-1.  If the bucket in fact has just one member, dlist_move_head reduces to
just one comparison.  And again I argue that you're optimizing for the
wrong case.  Pure luck will result in some hash chains being (much) longer
than the average, and if we don't do move-to-front we'll get hurt there.
        regards, tom lane



Re: Analysis on backend-private memory usage (and a patch)

From
Heikki Linnakangas
Date:
On 05.09.2013 17:22, Tom Lane wrote:
> Heikki Linnakangas<hlinnakangas@vmware.com>  writes:
>> I ran pgbench for ten seconds, and printed the number of tuples in each
>> catcache after that:
>> [ very tiny numbers ]
>
> I find these numbers a bit suspicious.  For example, we must have hit at
> least 13 different system catalogs, and more than that many indexes, in
> the course of populating the syscaches you show as initialized.  How is
> it there are only 4 entries in the RELOID cache?  I wonder if there were
> cache resets going on.

Relcache is loaded from the init file. The lookups of those system
catalogs and indexes never hit the syscache, because the entries are
found in relcache. When I delete the init file and launch psql, without
running any queries, I get this (caches with 0 tups left out):

LOG:  cache id 45 on pg_class: 7 tups
LOG:  cache id 32 on pg_index: 63 tups
LOG:  cache id 21 on pg_database: 1 tups
LOG:  cache id 11 on pg_authid: 1 tups
LOG:  cache id 10 on pg_authid: 1 tups
LOG:  cache id 2 on pg_am: 1 tups

> A larger issue is that pgbench might not be too representative.  In
> a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
> and contains 1 entry after "select 2=2", which is expected since
> the operator-lookup code will start by looking for int4 = int4 and
> will get an exact match.  But after "select 2=2::numeric" there are
> 61 entries, as a byproduct of having thumbed through every binary
> operator named "=" to resolve the ambiguous match.  We went so far
> as to install another level of caching in front of OPERNAMENSP because
> it was getting too expensive to deal with heavily-overloaded operators
> like that one.  In general, we've had to spend enough sweat on optimizing
> catcache searches to make me highly dubious of any claim that the caches
> are usually almost empty.
>
> I understand your argument that resizing is so cheap that it might not
> matter, but nonetheless reducing these caches as far as you're suggesting
> sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
> them on the small side rather than on the large side as they are now, but
> not with choosing sizes that are guaranteed to result in resizing cycles
> during startup of any real app.

Ok, committed the attached.

To choose the initial sizes, I put a WARNING into the rehash function,
ran the regression suite, and adjusted the sizes so that most regression
tests run without rehashing. With the attached patch, 18 regression
tests cause rehashing (see regression.diffs). The ones that do are
because they are exercising some parts of the system more than a typical
application would do: the enum regression test for example causes
rehashes of the pg_enum catalog cache, and the aggregate regression test
causes rehashing of pg_aggregate, and so on. A few regression tests do a
database-wide VACUUM or ANALYZE; those touch all relations, and cause a
rehash of pg_class and pg_index.

- Heikki

Attachment

Re: Analysis on backend-private memory usage (and a patch)

From
Greg Stark
Date:
On 4 Sep 2013 20:46, "Heikki Linnakangas" <hlinnakangas@vmware.com> wrote:

> One fairly simple thing we could do is to teach catcache.c to resize the
> caches. Then we could make the initial size of all the syscaches much
> smaller. At the moment, we use fairly large caches for catalogs like
> pg_enum (256 entries) and pg_usermapping (128), even though most
> databases don't use those features at all. If they could be resized on
> demand, we could easily allocate them initially with just, say, 4 entries.

If most databases don't use the feature at all, tsparser, enums, etc., why
not start out with *no* cache and only build one when it's first needed?
This would also mean there's less overhead for implementing new features
that aren't universally used.