Re: cache control? - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: cache control? |
Date | |
Msg-id | 400FDC4A.1050308@Yahoo.com Whole thread Raw |
In response to | Re: cache control? ("Simon Riggs" <simon@2ndquadrant.com>) |
Responses |
Re: cache control?
|
List | pgsql-hackers |
Simon, have you read src/backend/storage/buffer/README of current CVS tip? The algorithm in the new replacement strategy is an attempt to figure that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can be improved in that algorithm? Jan Simon Riggs wrote: > This discussion seems likely to have a major effect on DBT-3 (DSS-type) > performance from PostgreSQL... > >> On Fri, 16 Jan 2004, Michael Brusser wrote: >> >> > Is there a way to force database to load >> > a frequently-accessed table into cache and keep it there? >> >> Scott Marlow replied... >> >> Nope. But there is a new cache buffer handler that may make it into > 7.5 >> that would make that happen automagically. > > The important question here is "what forces blocks out of cache?" rather > than thinking about how to directly keep them there. > >> Jeroen T. Vermeulen wrote: >> Sent: Friday, January 16, 2004 23:02 >> Subject: [HACKERS] Reverse scans? >> >> Would it be doable, and would it be useful, to try to alternate the >> directions of table and index scans every time each table/index was >> fully scanned? >> >> I was thinking that it could help cache performance at various levels >> in cases where data at the end of a large table, say, that remained in >> memory after a scan, would otherwise be flushed out by a new scan of > the >> same table. If the next scan of the same table was to go in the other >> direction, any remains of the last time around that were still in the >> filesystem cache, buffer pool, hard disk cache etc. would stand a > greater >> chance of being reused. > > Jereon's idea is a good one when we consider the current behaviour, > which is > that large scans are placed into block buffer cache, which then forces > other data out. I would like to question the latter behaviour, so we can > address the cause and not just the symptom. > > Earlier versions of Oracle had a parameter called something like > SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold > was NOT placed into buffer cache, but was consumed directly by the > shadow process (the backend). Teradata also uses a similar buffer > control technique for large table scans. > > If a table is too large to fit into buffer, it clearly wasn't going to > be cached properly in the first place; Jereon's idea only works well for > tables near to the size of the cache. If the table is MUCH bigger then > it will have very little gain. Good original thinking, though I'm not > sure its worth it. > > Oracle 9i now offers some variety for buffer cache management (as does > DB2). You can specify at the tablespace and object level whether to use > one of two buffer pools, KEEP or RECYCLE. The idea is to separate out > the two types of blocks - ones that are there because they're well used > and other blocks that are there at the moment, but unlikely to stay. > > My suggestion would be to: > - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE. > This could default to KEEP=66% of total memory available, but could also > be settable by init parameter. > [changes to the memory management routines] > - if we do a scan on a table whose size in blocks is more than some > fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE > bufferpool. This can be decided immediately following optimization, > rather than including it within the optimizer decision process since we > aren't going to change the way the statement executes, we're just going > to stop it from having an adverse effect on other current or future > statements. > [additional test to set parameter, then work out where to note it] > > Notice that I haven't suggested that the KEEP/RECYCLE option could be > specified at table level. That optionality sounds like a lot of extra > work, when what is needed is the automatic avoidance of cache-spoiling > behaviour. (This would still mean that very large indexes with random > request patterns would still spoil cache...maybe implement that later?) > > This would remove most reasons for spoiling the cache and blocks would > then leave the cache only when they were genuinely no longer wanted. > > Any comments?? Takers? > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
pgsql-hackers by date: