Re: cache control? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: cache control?
Date
Msg-id 000f01c3e0d7$6b387b40$5e00030a@LaptopDellXP
Whole thread Raw
In response to Re: cache control?  ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses Re: cache control?
Re: cache control?
List pgsql-hackers
This discussion seems likely to have a major effect on DBT-3 (DSS-type)
performance from PostgreSQL...

> On Fri, 16 Jan 2004, Michael Brusser wrote:
> 
> > Is there a way to force database to load
> > a frequently-accessed table into cache and keep it there?
> 
> Scott Marlow replied...
>
> Nope.  But there is a new cache buffer handler that may make it into
7.5
> that would make that happen automagically.

The important question here is "what forces blocks out of cache?" rather
than thinking about how to directly keep them there.

> Jeroen T. Vermeulen wrote:
> Sent: Friday, January 16, 2004 23:02
> Subject: [HACKERS] Reverse scans?
> 
> Would it be doable, and would it be useful, to try to alternate the
> directions of table and index scans every time each table/index was
> fully scanned?
> 
> I was thinking that it could help cache performance at various levels
> in cases where data at the end of a large table, say, that remained in
> memory after a scan, would otherwise be flushed out by a new scan of
the
> same table.  If the next scan of the same table was to go in the other
> direction, any remains of the last time around that were still in the
> filesystem cache, buffer pool, hard disk cache etc. would stand a
greater
> chance of being reused.

Jereon's idea is a good one when we consider the current behaviour,
which is 
that large scans are placed into block buffer cache, which then forces
other data out. I would like to question the latter behaviour, so we can
address the cause and not just the symptom.

Earlier versions of Oracle had a parameter called something like
SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold
was NOT placed into buffer cache, but was consumed directly by the
shadow process (the backend). Teradata also uses a similar buffer
control technique for large table scans.

If a table is too large to fit into buffer, it clearly wasn't going to
be cached properly in the first place; Jereon's idea only works well for
tables near to the size of the cache. If the table is MUCH bigger then
it will have very little gain. Good original thinking, though I'm not
sure its worth it.

Oracle 9i now offers some variety for buffer cache management (as does
DB2). You can specify at the tablespace and object level whether to use
one of two buffer pools, KEEP or RECYCLE. The idea is to separate out
the two types of blocks - ones that are there because they're well used
and other blocks that are there at the moment, but unlikely to stay.

My suggestion would be to:
- split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
This could default to KEEP=66% of total memory available, but could also
be settable by init parameter.
[changes to the memory management routines]
- if we do a scan on a table whose size in blocks is more than some
fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
bufferpool. This can be decided immediately following optimization,
rather than including it within the optimizer decision process since we
aren't going to change the way the statement executes, we're just going
to stop it from having an adverse effect on other current or future
statements.
[additional test to set parameter, then work out where to note it]

Notice that I haven't suggested that the KEEP/RECYCLE option could be
specified at table level. That optionality sounds like a lot of extra
work, when what is needed is the automatic avoidance of cache-spoiling
behaviour. (This would still mean that very large indexes with random
request patterns would still spoil cache...maybe implement that later?)

This would remove most reasons for spoiling the cache and blocks would
then leave the cache only when they were genuinely no longer wanted.

Any comments?? Takers?



pgsql-hackers by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Singnals code (not just win32 specific)
Next
From: "Thomas Hallgren"
Date:
Subject: Dynamic modules and standard naming practice