Thread: Increasing catcache size

From: Tom Lane

A while back, there was a discussion about psql \d display being really
slow in a database with 4000 tables:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01085.php

I looked into this some, and it seems that part of the problem is that
the catalog caches are limited to hold no more than 5000 tuples in total
(MAXCCTUPLES in catcache.c).  When you're looking at thousands of
tables, the tuples you want age out of cache immediately and so every
SearchSysCache call has to actually go to disk (or at least shared
buffers).
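
To illustrate the shape of the mechanism, here is a toy, self-contained
sketch (not the actual catcache.c code): all the caches share a single
LRU list capped at MAXCCTUPLES, and once the working set exceeds the
cap, every insertion evicts an entry you are about to need again:

    #include <stdlib.h>

    #define MAXCCTUPLES 5000            /* Berkeley-era system-wide limit */

    typedef struct CacheEnt
    {
        struct CacheEnt *prev, *next;   /* global LRU links */
        /* ... the cached catalog tuple would live here ... */
    } CacheEnt;

    static CacheEnt lru = { &lru, &lru };   /* circular list header */
    static int cc_ntup = 0;     /* tuples cached, all caches together */

    static void
    lru_unlink(CacheEnt *e)
    {
        e->prev->next = e->next;
        e->next->prev = e->prev;
    }

    static void
    lru_push_head(CacheEnt *e)
    {
        e->prev = &lru;
        e->next = lru.next;
        lru.next->prev = e;
        lru.next = e;
    }

    /* Insert a new entry, evicting from the LRU tail to honor the cap. */
    static void
    cache_insert(CacheEnt *e)
    {
        while (cc_ntup >= MAXCCTUPLES)
        {
            CacheEnt *victim = lru.prev;    /* least recently used */
            lru_unlink(victim);
            free(victim);
            cc_ntup--;
        }
        lru_push_head(e);
        cc_ntup++;
    }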

The 5000 setting is basically Berkeley-era (when we got the code from
Berkeley, the limit was 300 tuples per cache, which I changed to a
system-wide limit some years ago).  A reasonable guess is that we're
talking about entries of 100 to 200 bytes each, making the maximum
catcache space usage a megabyte or less (per backend, that is).  Seems
like this is mighty small on modern machines.

We could make a GUC variable to control this, or just arbitrarily bump
up the constant to 50000 or so.  Is it worth another GUC?  Not sure.

There's another hardwired setting right next to this one, which is the
number of hash buckets per catcache.   Since the caches are likely to
have significantly different sizes, that is looking a bit obsolete too.
We could put per-cache numbers into the setup lists in syscache.c
easily.  Making the cache sizes dynamic a la dynahash.c is another
possibility, but I'm unsure if it's worth the trouble.
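
Concretely, the setup list could grow a per-cache bucket count along
these lines (a sketch only, not a patch: the nbuckets field is the
proposed addition, the struct layout and field names are approximate,
and the bucket values are invented):

    struct cachedesc
    {
        Oid         reloid;     /* OID of the catalog being cached */
        Oid         indoid;     /* OID of the index for cache lookups */
        int         nkeys;      /* # of keys needed for a cache lookup */
        int         key[4];     /* attribute numbers of the key columns */
        int         nbuckets;   /* proposed: per-cache hash bucket count */
    };

    static const struct cachedesc cacheinfo[] = {
        /* big caches get many buckets, small ones few */
        {RelationRelationId, ClassOidIndexId, 1,
         {ObjectIdAttributeNumber, 0, 0, 0}, 1024},
        {TypeRelationId, TypeOidIndexId, 1,
         {ObjectIdAttributeNumber, 0, 0, 0}, 256},
        /* ... one entry per syscache ... */
    };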

Thoughts?
        regards, tom lane


Re: Increasing catcache size

From: Bruce Momjian

I am thinking we should scale it based on max_fsm_relations.
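
Something like the following, perhaps (purely illustrative: the
multiplier is a made-up number, and MaxFSMRelations is the backend
variable behind the max_fsm_relations GUC):

    /* Derive the cap from max_fsm_relations instead of hardwiring it,
     * with the old default as a floor so small settings don't regress. */
    extern int MaxFSMRelations;     /* set from the max_fsm_relations GUC */

    static int
    compute_max_cctuples(void)
    {
        int     cap = MaxFSMRelations * 5;

        return (cap > 5000) ? cap : 5000;
    }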

---------------------------------------------------------------------------

Tom Lane wrote:
> A while back, there was a discussion about psql \d display being really
> slow in a database with 4000 tables:
> http://archives.postgresql.org/pgsql-hackers/2005-09/msg01085.php
> 
> I looked into this some, and it seems that part of the problem is that
> the catalog caches are limited to hold no more than 5000 tuples in total
> (MAXCCTUPLES in catcache.c).  When you're looking at thousands of
> tables, the tuples you want age out of cache immediately and so every
> SearchSysCache call has to actually go to disk (or at least shared
> buffers).
> 
> The 5000 setting is basically Berkeley-era (when we got the code from
> Berkeley, the limit was 300 tuples per cache, which I changed to a
> system-wide limit some years ago).  A reasonable guess is that we're
> talking about entries of 100 to 200 bytes each, making the maximum
> catcache space usage a megabyte or less (per backend, that is).  Seems
> like this is mighty small on modern machines.
> 
> We could make a GUC variable to control this, or just arbitrarily bump
> up the constant to 50000 or so.  Is it worth another GUC?  Not sure.
> 
> There's another hardwired setting right next to this one, which is the
> number of hash buckets per catcache.   Since the caches are likely to
> have significantly different sizes, that is looking a bit obsolete too.
> We could put per-cache numbers into the setup lists in syscache.c
> easily.  Making the cache sizes dynamic a la dynahash.c is another
> possibility, but I'm unsure if it's worth the trouble.
> 
> Thoughts?
> 
>             regards, tom lane
> 

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +


Re: Increasing catcache size

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am thinking we should scale it based on max_fsm_relations.

Hmm ... tables are not the only factor in the required catcache size,
and max_fsm_relations tells more about the total installation size
than the number of tables in your particular database.  But it's one
possible approach.

I just thought of a more radical idea: do we need a limit on catcache
size at all?  On "normal size" databases I believe that we never hit
5000 entries at all (at least, last time I ran the CATCACHE_STATS code
on the regression tests, we didn't get close to that).  We don't have
any comparable limit in the relcache and it doesn't seem to hurt us,
even though a relcache entry is a pretty heavyweight object.

If we didn't try to enforce a limit on catcache size, we could get rid
of the catcache LRU lists entirely, which'd make for a nice savings in
lookup overhead (the MoveToFront operations in catcache.c are a
nontrivial part of SearchSysCache according to profiling I've done,
so getting rid of one of the two would be nice).
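
For concreteness, the hit path currently looks roughly like this
(simplified, not the real code, though the Dlelem fields and
DLMoveToFront are the actual lib/dllist.h machinery):

    static CatCTup *
    cache_hit(CatCTup *ct)
    {
        /* Move-to-front within the hash bucket chain, so hot entries
         * stay near the front.  This one we would keep. */
        DLMoveToFront(&ct->cache_elem);

        /* Move-to-front on the global LRU list.  It exists solely so
         * eviction can find the oldest entry; with no size limit, this
         * call and the LRU list itself go away. */
        DLMoveToFront(&ct->lrulist_elem);

        return ct;
    }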
        regards, tom lane


Re: Increasing catcache size

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am thinking we should scale it based on max_fsm_relations.
> 
> Hmm ... tables are not the only factor in the required catcache size,
> and max_fsm_relations tells more about the total installation size
> than the number of tables in your particular database.  But it's one
> possible approach.
> 
> I just thought of a more radical idea: do we need a limit on catcache
> size at all?  On "normal size" databases I believe that we never hit
> 5000 entries at all (at least, last time I ran the CATCACHE_STATS code
> on the regression tests, we didn't get close to that).  We don't have
> any comparable limit in the relcache and it doesn't seem to hurt us,
> even though a relcache entry is a pretty heavyweight object.
> 
> If we didn't try to enforce a limit on catcache size, we could get rid
> of the catcache LRU lists entirely, which'd make for a nice savings in
> lookup overhead (the MoveToFront operations in catcache.c are a
> nontrivial part of SearchSysCache according to profiling I've done,
> so getting rid of one of the two would be nice).

Well, if you never access all those tables, you don't use much memory;
but if you are accessing a lot of them, the memory to cache them all
seems like minimal overhead.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +


Re: Increasing catcache size

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I just thought of a more radical idea: do we need a limit on catcache
>> size at all?  On "normal size" databases I believe that we never hit
>> 5000 entries at all (at least, last time I ran the CATCACHE_STATS code
>> on the regression tests, we didn't get close to that).  We don't have
>> any comparable limit in the relcache and it doesn't seem to hurt us,
>> even though a relcache entry is a pretty heavyweight object.

> Well, if you never access all those tables, you don't use much memory;
> but if you are accessing a lot of them, the memory to cache them all
> seems like minimal overhead.

I re-did the test of running the regression tests with CATCACHE_STATS
enabled.  The largest catcache population in any test was 1238 tuples,
and most backends had 500 or less.  I'm not sure whether you'd really
want to consider the regression database as representative of small
production databases, but granted that assumption, the current limit of
5000 tuples isn't limiting anything on small-to-middling databases.
(Note we are counting tables and other cataloged objects, *not* volume
of data stored --- so the regression database could easily be much
bigger than many production DBs by this measure.)
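
For anyone who wants to reproduce the measurement: the numbers come
from building with CATCACHE_STATS defined, for example

    /* Add this, or -DCATCACHE_STATS in CPPFLAGS, and each backend will
     * print per-catcache population and hit/miss statistics at exit. */
    #define CATCACHE_STATS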

So I'm pretty strongly inclined to just dike out the limit.  If you're
running a database big enough to hit the existing limit, you can well
afford to put more memory into the catcache.
        regards, tom lane


Re: Increasing catcache size

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I just thought of a more radical idea: do we need a limit on catcache
> >> size at all?  On "normal size" databases I believe that we never hit
> >> 5000 entries at all (at least, last time I ran the CATCACHE_STATS code
> >> on the regression tests, we didn't get close to that).  We don't have
> >> any comparable limit in the relcache and it doesn't seem to hurt us,
> >> even though a relcache entry is a pretty heavyweight object.
> 
> > Well, if you never access all those tables, you don't use much memory;
> > but if you are accessing a lot of them, the memory to cache them all
> > seems like minimal overhead.
> 
> I re-did the test of running the regression tests with CATCACHE_STATS
> enabled.  The largest catcache population in any test was 1238 tuples,
> and most backends had 500 or less.  I'm not sure whether you'd really
> want to consider the regression database as representative of small
> production databases, but granted that assumption, the current limit of
> 5000 tuples isn't limiting anything on small-to-middling databases.
> (Note we are counting tables and other cataloged objects, *not* volume
> of data stored --- so the regression database could easily be much
> bigger than many production DBs by this measure.)
> 
> So I'm pretty strongly inclined to just dike out the limit.  If you're
> running a database big enough to hit the existing limit, you can well
> afford to put more memory into the catcache.

And if we get problem reports, we can fix it.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +


Re: Increasing catcache size

From: Martijn van Oosterhout

On Wed, Jun 14, 2006 at 08:04:31PM -0400, Tom Lane wrote:
> So I'm pretty strongly inclined to just dike out the limit.  If you're
> running a database big enough to hit the existing limit, you can well
> afford to put more memory into the catcache.

If you do a \d, does that load every tuple from pg_class into the
catcache?

Your suggestion sounds like a good one...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: Increasing catcache size

From: Tom Lane

Martijn van Oosterhout <kleptog@svana.org> writes:
> If you do a \d, does that load every tuple from pg_class into the
> catcache?

Many of 'em, not sure about "all".
        regards, tom lane