Re: The science of optimization in practical terms? - Mailing list pgsql-hackers

From Greg Smith
Subject Re: The science of optimization in practical terms?
Date
Msg-id Pine.GSO.4.64.0902151238070.1312@westnet.com
Whole thread Raw
In response to Re: The science of optimization in practical terms?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: The science of optimization in practical terms?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: The science of optimization in practical terms?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, 13 Feb 2009, Robert Haas wrote:

> Gather statistics on relation access patterns and use that to estimate 
> the fraction of a relation likely to be in cache.

At one point I had a hacked background writer that collected statistics 
about the contents of the buffer cache.  Since it's obtaining a lock on 
the buffer header anyway, it's a perfectly good place to note what 
relfileid the buffer is associated with.  If you set aside some fixed 
amount of space to hold information about the most popular relations 
(perhaps using a continuous top-k model, see 
http://www.mysmu.edu/faculty/kyriakos/topk-SIGMOD06.pdf ), you can end up 
with enough data to estimate how much data in shared_buffers exists for 
the most cached relations in there.

In a typical recommended tuning nowadays, we can only expect that to 
sample about 1/3 of the total caching happening (presuming 
shared_buffers=1/4 RAM and effective_cache_size~=3/4 RAM).  While in 
general it's nice to think that shared_buffers has a similar makeup to 
what the OS is caching, it's not hard to discover common cases where this 
would not be the case.  Particularly given the VACUUM/seq scan ring-buffer 
improvements in 8.3, it's easy to imagine scanning a table that's 
2*shared_buffers in size showing only 256KB in shared_buffers, while the 
whole thing is available in the OS cache.

I had a eureka moment where I realized I could hook the buffer eviction 
code to model that.  Decrement the count for that relation in the main 
top-k count, then have a second count that assumes the last 
2*shared_buffers evicted are also still cached.  That would accurately 
model the ring-buffer case and improve the quality of the model in 
general.  Updating those stats on every eviction would add some overhead, 
but if the background writer is doing enough of them for you that should 
at least be asynchronous from when most backends are blocked waiting for 
an eviction.

And that's as far as I got before I had to return to real work again.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.
Next
From: "Kevin Grittner"
Date:
Subject: Re: The science of optimization in practical terms?