Re: Per-table random_page_cost for tables that we know are always cached - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: Per-table random_page_cost for tables that we know are always cached
Date
Msg-id 65937bea0804221158p8312949gaa0b7f8ceb758144@mail.gmail.com
In response to Re: Per-table random_page_cost for tables that we know are always cached  (Decibel! <decibel@decibel.org>)
List pgsql-hackers
On Wed, Apr 23, 2008 at 12:11 AM, Decibel! <decibel@decibel.org> wrote:
On Apr 22, 2008, at 5:13 AM, PFC wrote:
In order to have it use the fast plan I must set random_page_cost to 1, which I absolutely don't want to do.
Setting effective_cache_size to huge values has no effect.
If I select a value of parent_id that has far fewer children, the index will be used, but in this case I think the threshold is misplaced; it should be slightly higher. Here we have about 5% of values selected. Hash join becomes better at about 15% because the table is cached.
This is 8.3.
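(For illustration, a minimal sketch of the kind of experiment PFC describes; the table and value names are invented, and the point is only that random_page_cost can currently be changed per session, never per table:)

    -- Hypothetical table/column names, for illustration only.
    EXPLAIN ANALYZE
    SELECT * FROM child WHERE parent_id = 12345;   -- planner prefers the "slow" plan

    SET random_page_cost = 1;                      -- session-wide: affects every table
    EXPLAIN ANALYZE
    SELECT * FROM child WHERE parent_id = 12345;   -- now the index plan is chosen

    RESET random_page_cost;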

Perhaps there would be a need for a per-object setting (object = table, index, partition) to alter the aggressiveness/laziness of page flushing and how long the pages for this object are kept in shared_buffers; this would be used to modify random_page_cost on a per-table/index/partition basis.

Example: let's imagine a "cache priority" setting.

- "cache priority" set to the minimum means this table is mostly write-only
- "cache priority" set to default would give current behaviour (which is correct in most cases)
- "cache priority" set to a high value would tell Postgres "I know this table/index/partition is small and often accessed rather randomly, so I want you to keep it in shared_buffers, purge it if you must but otherwise keep it in memory, flush something else instead which has lower cache_priority".

The optimizer could then use a different (much lower) value of random_page_cost for tables for which "cache priority" is set highest since it would know.
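(A purely hypothetical sketch of PFC's proposal written out as a storage parameter; no cache_priority reloption exists, this is only the idea spelled in familiar syntax:)

    -- Hypothetical syntax; "cache_priority" is not an existing storage parameter.
    ALTER TABLE hot_lookup SET (cache_priority = high);  -- try hard to keep in shared_buffers
    ALTER TABLE audit_log  SET (cache_priority = low);   -- mostly write-only, evict freely
    -- The planner would then derive a lower effective random_page_cost
    -- for objects marked with a high cache_priority.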

"cache priority" to me sounds like we're trying to influence caching behavior, which isn't what's happening. I do agree that we need a better way to tell the planner what tables are in memory.


An alternative would be for the background writer to keep some stats and do this for us:

- begin bgwriter scan
- set up a hashtable of [relid => page count]
- for each page that is scanned, increment the "page count" for its relation (uses very little CPU)
- end bgwriter scan
- for each relation, compare the number of pages found in shared_buffers with the number of pages in the relation and draw conclusions about how well cached the relation is
- update random_page_cost accordingly for this relation

This would not examine whatever is in the OS' cache, though.
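(As a rough approximation of the accounting PFC sketches above, and assuming the contrib module pg_buffercache is installed, something like the following can already be run by hand; like the bgwriter idea, it only sees shared_buffers, not the OS cache:)

    -- Count each relation's pages currently in shared_buffers and compare with
    -- its total page count (pg_class.relpages, an estimate kept up by VACUUM/ANALYZE).
    SELECT c.relname,
           count(*)   AS buffered_pages,
           c.relpages AS total_pages,
           round(100.0 * count(*) / nullif(c.relpages, 0), 1) AS pct_in_shared_buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname, c.relpages
    ORDER BY pct_in_shared_buffers DESC NULLS LAST;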

Actually, there's no need for bgwriter to do that; we can just look at the hit rate for the object. But we'd also need stats for how often we find pages for a relation in the OS cache, which no one has come up with a good method for.
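(For reference, a sketch of the per-object hit rate Decibel! refers to, as already exposed by the statistics views; note that OS-cache hits still show up as "reads" here, which is exactly the missing piece:)

    -- Per-table heap hit ratio (counts shared_buffers hits only).
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC;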

Something related... a per-relation cost setting would also allow users to tune based on the kind of storage those objects live on. Using tablespaces, users can choose to place some objects on really expensive/really fast storage, and other (not so hot) objects on slower/cheaper storage.
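(A sketch of what such a storage-tier knob could look like; the tablespace names and paths are invented, and while the syntax is hypothetical for the 8.3-era discussion here, later PostgreSQL releases did add per-tablespace cost settings along these lines:)

    -- Invented names/paths; per-tablespace settings of this form exist in PostgreSQL 9.0+.
    CREATE TABLESPACE fast_ssd  LOCATION '/mnt/ssd/pgdata';
    CREATE TABLESPACE slow_sata LOCATION '/mnt/sata/pgdata';

    ALTER TABLESPACE fast_ssd  SET (random_page_cost = 1.5, seq_page_cost = 1.0);
    ALTER TABLESPACE slow_sata SET (random_page_cost = 6.0);

    -- Hot objects go on the fast tier, colder ones on the cheap tier.
    ALTER TABLE hot_lookup   SET TABLESPACE fast_ssd;
    ALTER TABLE archive_2007 SET TABLESPACE slow_sata;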

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device
