relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?) - Mailing list pgsql-hackers

From Jim C. Nasby
Subject relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?)
Date
Msg-id 20060918040816.GT38854@enterprisedb.com
Whole thread Raw
In response to Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: relation cache statistics (was: -HEAD planner issue wrt hash_joins on dbt3 ?)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Sep 17, 2006 at 04:18:36PM -0400, Tom Lane wrote:
>  * table and index.  (Ideally other_pages should include all the other
>  * tables and indexes used by the query too; but we don't have a good way
>  * to get that number here.)
> 
> A first-order approximation to this would be to add up the total sizes
> of all the other tables used in the query.  I am thinking of leaving out
> other indexes, mainly because we can't tell at this level which other
> indexes are actually gonna get used.  This would tend to underestimate
> by leaving out indexes, but not by a lot if you assume indexes are much
> smaller than their tables.  It would also be an overestimate because
> tables that are not indexscanned concurrently with the one under
> consideration probably shouldn't be counted anyway.  So one might hope
> these effects would more or less cancel out.  Anyway it seems to be a
> better idea than what we have now.

I think it'd be better to attack this problem from the "other side";
namely looking at what's actually cached. Sadly, I don't think there's
any way to actually query the OS for info about what it has buffered,
but we can look at what's in shared_buffers and assume that it's a
reasonable proxy for the OS's cache. Something like...

relBufPages / shared_buffers * effective_cache_size

should give us a decent idea of what percentage of a relation will be in
cache somewhere. (relBufPages is the number of pages the relation in
question has in the buffer).

Of course, that raises the question of how to track how many pages are
in shared buffers for a relation. Given the criticality of locking
there, we probably don't want to update that info in real-time, but for
this application it's probably OK to just scan through the buffer every
X period of time (maybe after X number of pages read into the buffers).
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: Unique index: update error
Next
From: Tom Lane
Date:
Subject: Re: Unique index: update error