Thread: caching indexes and pages?

From: Thomas Finneid
Hi, I am developing a database and have a couple of questions I haven't
found an answer to yet.

1) How do I find the size of an index, i.e. its size on disk?

2) I have a query that is taking a long time to complete because the
table is about 120GB in size. It only returns 2000 rows, so in
principle it should be fast. But because the data is spread across the
table, I am assuming it needs to do a lot of disk access to fetch the
appropriate pages. Since the amount of data is so large, I am also
assuming that whenever I run a query all memory caches have to be
replaced to make room for the new pages. What I am wondering is: which
config settings can I use to increase the amount of memory postgres
keeps to cache pages and indexes?

I tried work_mem and maintenance_work_mem but they do not seem to make
much difference yet. Admittedly I had set them to 100MB and 80MB, and
after reading a little more I found that I could easily set them to
several GBs. But I am not sure those are the correct config parameters
for this, and I haven't found any others that seem relevant so far.

regards

thomas

Re: caching indexes and pages?

From: "A. Kretschmer"
In response to Thomas Finneid:
>
> Hi, I am developing a database and have a couple of questions I haven't
> found an answer to yet.
>
> 1) How do I find the size of an index, i.e. its size on disk?

http://www.postgresql.org/docs/8.3/interactive/functions-admin.html
http://andreas.scherbaum.la/blog/archives/282-table-size,-database-size.html
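
For instance, with the functions from that first page (relation names
here are hypothetical):

-- size on disk of a single index, raw bytes and human-readable
select pg_relation_size('my_table_pkey');
select pg_size_pretty(pg_relation_size('my_table_pkey'));
-- a table plus all of its indexes and toast data
select pg_size_pretty(pg_total_relation_size('my_table'));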


>
> 2) I have a query that is taking a long time to complete because the

How long is a long time? *g*


> table is about 120GB in size. It only returns 2000 rows, so in
> principle it should be fast. But because the data is spread across the
> table, I am assuming it needs to do a lot of disk access to fetch the
> appropriate pages. Since the amount of data is so large, I am also

Please run your query again prefixed with 'explain analyse' ('explain
analyse select ...') and look at the output. Maybe there is no proper
index for your select.
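
For example (table and column names here are made up):

-- prefix the real query with explain analyse to get the actual plan and timings
explain analyse
select * from my_big_table where some_col = 42;

A "Seq Scan on my_big_table" in that output suggests a missing or unused
index.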


Regards, Andreas
--
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> header)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: caching indexes and pages?

From: Thomas Markus
Hi,

Thomas Finneid wrote:
>
> Hi, I am developing a database and have a couple of questions I haven't
> found an answer to yet.
>
> 1) How do I find the size of an index, i.e. its size on disk?
I use this query:

-- list every relation of at least 1 MB, with its total on-disk size
select
      t.spcname as "tablespace"
    , pg_get_userbyid(c.relowner) as "owner"
    , n.nspname as "schema"
    , relname::text as "name"
    , pg_size_pretty(pg_total_relation_size(c.oid))::text as "total size"
    , case
        when c.relkind='i' then 'index'
        when c.relkind='t' then 'toast'
        when c.relkind='r' then 'table'
        when c.relkind='v' then 'view'
        when c.relkind='c' then 'composite type'
        when c.relkind='S' then 'sequence'
        else c.relkind::text
      end as "type"
from
    pg_class c
    left join pg_namespace n on n.oid = c.relnamespace
    left join pg_tablespace t on t.oid = c.reltablespace
where
    -- >>20 converts bytes to MB: keep relations >= 1 MB, skip toast tables
    (pg_total_relation_size(c.oid)>>20)>0 and c.relkind!='t'
order by
    c.relkind desc, pg_total_relation_size(c.oid) desc
>
> 2) I have a query that is taking a long time to complete because the
> table is about 120GB in size. It only returns 2000 rows, so in
> principle it should be fast. But because the data is spread across the
> table, I am assuming it needs to do a lot of disk access to fetch the
> appropriate pages. Since the amount of data is so large, I am also
> assuming that whenever I run a query all memory caches have to be
> replaced to make room for the new pages. What I am wondering is: which
> config settings can I use to increase the amount of memory postgres
> keeps to cache pages and indexes?
Try to reorganize your data with CLUSTER and create appropriate indexes
(don't forget to check statistics). There are several threads about
memory configuration; look for shared_buffers.
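
A minimal sketch of the CLUSTER step, assuming an index on the column
your slow query filters on (table and index names here are hypothetical;
the 8.3 syntax is CLUSTER table USING index):

-- physically rewrite the table in the order of the chosen index
cluster my_big_table using my_big_table_ts_idx;
-- refresh planner statistics afterwards
analyze my_big_table;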
>
> I tried work_mem and maintenance_work_mem but they do not seem to make
> much difference yet. Admittedly I had set them to 100MB and 80MB, and
> after reading a little more I found that I could easily set them to
> several GBs. But I am not sure those are the correct config parameters
> for this, and I haven't found any others that seem relevant so far.
>
> regards
>
> thomas
regards
thomas



Re: caching indexes and pages?

From: Robert Haas
> I tried work_mem and maintenance_work_mem but they do not seem to make much
> difference yet. Admittedly I had set them to 100MB and 80MB, and after
> reading a little more I found that I could easily set them to several GBs.
> But I am not sure those are the correct config parameters for this, and I
> haven't found any others that seem relevant so far.

You probably want to increase shared_buffers by quite a large amount
and maybe make work_mem not quite so big.

If you have 2GB of memory, you might set shared_buffers to 1GB,
work_mem to 16MB, and maintenance_work_mem to 64MB.
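
As a rough postgresql.conf sketch of those numbers (a starting point
only, assuming roughly that 2GB machine; tune for your own workload):

# pages/indexes postgres caches itself; changing it needs a restart
shared_buffers = 1GB
# per sort/hash operation, so several can be in use at once
work_mem = 16MB
# used by VACUUM, CREATE INDEX and the like
maintenance_work_mem = 64MB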

...Robert

Re: caching indexes and pages?

From: Thomas Finneid
Thomas Markus wrote:

> Try to reorganize your data with CLUSTER and create appropriate indexes
> (don't forget to check statistics).

One question. Assume I have clustered, and new data has been added after
that; according to the docs, that data is added "outside" of the
clustered data. What happens when I run CLUSTER again? I would assume
it's smart enough to only cluster the new data, i.e. add it to the
already created clusters as appropriate, so the execution time would be
a lot lower, right? Or would it run through and recluster everything
from scratch again?

thomas


Re: caching indexes and pages?

From: Kenneth Marshall
On Thu, Jan 22, 2009 at 10:58:25PM +0100, Thomas Finneid wrote:
> Thomas Markus wrote:
>
>> Try to reorganize your data with CLUSTER and create appropriate indexes
>> (don't forget to check statistics).
>
> One question. Assume I have clustered, and new data has been added after
> that; according to the docs, that data is added "outside" of the clustered
> data. What happens when I run CLUSTER again? I would assume it's smart
> enough to only cluster the new data, i.e. add it to the already created
> clusters as appropriate, so the execution time would be a lot lower,
> right? Or would it run through and recluster everything from scratch again?
>
> thomas
>
It reclusters everything from scratch again. You do, however, get better
read performance on the data that is already clustered.

Cheers,
Ken

Re: caching indexes and pages?

From: Craig Ringer
Thomas Finneid wrote:
> Thomas Markus wrote:
>
>> Try to reorganize your data with CLUSTER and create appropriate
>> indexes (don't forget to check statistics).
>
> One question. Assume I have clustered, and new data has been added after
> that; according to the docs, that data is added "outside" of the
> clustered data.

Check into FILLFACTOR (on both tables and indexes).
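
A sketch of what that might look like (relation names are hypothetical;
fillfactor is a storage parameter on both tables and indexes):

-- leave 30% free space in each page so new rows can land near their
-- cluster position instead of at the end of the table
alter table my_big_table set (fillfactor = 70);
create index my_big_table_ts_idx on my_big_table (ts) with (fillfactor = 70);
-- rewrite existing pages so the new fillfactor takes effect
cluster my_big_table using my_big_table_ts_idx;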

--
Craig Ringer