Re: cache control? - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: cache control?
Date
Msg-id 40110F1F.4050605@Yahoo.com
In response to Re: cache control?  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: cache control?
List pgsql-hackers
Simon,

thanks for taking the time to give this further thought.


Simon Riggs wrote:
> If we know ahead of time that a large scan is going to have this effect,
> why wait for the ARC to play its course, why not take exactly the same
> action?
> Have large scans call StrategyHint also. (Maybe rename it...?)...of
> course, some extra code to establish it IS a large scan...
> ...large table lookup should wait until a shared catalog cache is
> implemented

The problem with this is a) how to detect that something will be a large 
scan, and b) how to decide what is a large scan in the first place.

Large sequential scans in warehousing are often part of more complex
join operations. And just because something returns a large number of
result rows doesn't mean that the input data was that large.

As for the definition of "large" itself, this depends on the size of the
buffer cache and the access pattern of the application. As you have
surely noticed, the usual sizes in the algorithm are B1+T1 = T2+B2 = C.
Buffers evicted from T1 are remembered in B1, and because of that even
repeated sequential scans of the same large relation will only cycle
through T1 blocks and never cause any turbulence in T2 or B2.

The only thing that will dramatically affect T2 and B2, by adjusting the
cache split point, is repeated scanning of more than one table that is
significantly large but smaller than C. Scanning the same large but
smaller-than-C table over and over will have it in T2 after the second
scan, where it belongs. But having two tables A and B that are both just
smaller than C and an access pattern like A, A, B, B, A, A, ... will
cause many B1 hits and thereby increase the target T1 size. And it must
be exactly that access pattern, because A, A, A, B, B, B, A, A, A, ...
produces a complete MISS on the first, a B1 hit on the second and a B2
hit on the third scan, so it will move the split point up and down
evenly.
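
For anyone who wants to try these access patterns, here is a minimal
sketch of a simulator. It assumes the textbook ARC algorithm as published
by Megiddo and Modha, not the actual 7.5 buffer manager code, so details
such as when an evicted T1 buffer is remembered in B1 may differ from the
backend. The names ARC, scan and the stats keys are made up for this
illustration.

```python
from collections import OrderedDict


class ARC:
    """Textbook ARC with integer deltas (an assumption, not backend code)."""

    def __init__(self, c):
        self.c = c                # cache size C in pages
        self.p = 0                # target size of T1 (the split point)
        self.t1 = OrderedDict()   # seen once recently; LRU first
        self.t2 = OrderedDict()   # seen at least twice; LRU first
        self.b1 = OrderedDict()   # ghosts of pages evicted from T1
        self.b2 = OrderedDict()   # ghosts of pages evicted from T2
        self.stats = {"t_hit": 0, "b1_hit": 0, "b2_hit": 0, "miss": 0}

    def _replace(self, in_b2):
        # Evict from T1 if it is above target, otherwise from T2.
        if self.t1 and (len(self.t1) > self.p
                        or (in_b2 and len(self.t1) == self.p)):
            page, _ = self.t1.popitem(last=False)
            self.b1[page] = None
        else:
            page, _ = self.t2.popitem(last=False)
            self.b2[page] = None

    def access(self, page):
        if page in self.t1 or page in self.t2:    # real cache hit
            self.t1.pop(page, None)
            self.t2.pop(page, None)
            self.t2[page] = None
            self.stats["t_hit"] += 1
        elif page in self.b1:                     # ghost hit: grow T1 target
            self.p = min(self.c,
                         self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(False)
            del self.b1[page]
            self.t2[page] = None
            self.stats["b1_hit"] += 1
        elif page in self.b2:                     # ghost hit: shrink T1 target
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(True)
            del self.b2[page]
            self.t2[page] = None
            self.stats["b2_hit"] += 1
        else:                                     # complete miss
            self.stats["miss"] += 1
            l1 = len(self.t1) + len(self.b1)
            total = l1 + len(self.t2) + len(self.b2)
            if l1 == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(False)
                else:
                    self.t1.popitem(last=False)   # dropped, no ghost kept
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(False)
            self.t1[page] = None


def scan(arc, table, n_pages):
    """Sequentially touch every page of a (hypothetical) table."""
    for i in range(n_pages):
        arc.access((table, i))


if __name__ == "__main__":
    # Two tables just smaller than C, scanned in the two patterns above.
    for pattern in ("AABBAABB", "AAABBBAAABBB"):
        arc = ARC(100)
        for table in pattern:
            scan(arc, table, 90)
        print(pattern, "p =", arc.p, arc.stats)
```

Running it prints the final split point and the hit counters for both
patterns, so the B1 and B2 hit counts can be compared directly. The cache
size of 100 and table size of 90 pages are arbitrary choices.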

Honestly, I don't even know what type of application could possibly
produce such a screwed-up access pattern. And I am absolutely confident
one can find corner cases to bring down Oracle's complicated
configuration harness more easily.

>  
> Anyway, this idea can wait at least until we have extensive performance
> tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm.

Everyone is always welcome to try and show that something can be 
improved. And we are in the middle of the 7.5 development cycle, so feel 
free to hack around.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


