Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Page replacement algorithm in buffer cache
Date
Msg-id 515A169C.7070406@nasby.net
In response to Re: Page replacement algorithm in buffer cache  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On 3/24/13 8:11 AM, Greg Smith wrote:
> On 3/22/13 8:45 AM, Ants Aasma wrote:
>> However, I think the main issue isn't finding new algorithms that are
>> better in some specific circumstances. The hard part is figuring out
>> whether their performance is better in general.
>
> Right.  The current page replacement method works as expected.  Many frequently accessed pages accumulate a usage
> count of 5 before the clock sweep hits them.  Pages that are accessed once and not again before the clock sweep are
> evicted.  There are several theoretically better ways to approach this.  Anyone who hasn't already been working on this
> for a few years is very unlikely to come up with a brand new idea, one that hasn't already been tested in the academic
> research.
>
> But the real blocker here isn't ideas, it's creating benchmark workloads to validate any change.  Right now I see the
> most promising work that could lead toward the "performance farm" idea as all of the Jenkins based testing that's been
> going on recently.  Craig Ringer has been using that for 2ndQuadrant work here, Peter Eisentraut has been working with it:
> http://petereisentraut.blogspot.com/2013/01/postgresql-and-jenkins.html and the PostGIS project uses it too.  There's
> some good momentum brewing there.
>
> When we have regular performance testing with a mix of workloads--I have about 10 in mind to start--at that point we
> could start testing performance changes to the buffer replacement.  Until then this whole area is hard to touch
> usefully.  You have to assume that any tuning you do for one type of workload might accidentally slow another.  Starting
> with a lot of baseline workloads is the only way to move usefully forward when facing that problem.
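For readers who want to experiment with the behavior described in the quoted text, here is a minimal toy sketch of a clock sweep with a usage count capped at 5. It is not the PostgreSQL implementation: the class and method names are hypothetical, there is no pinning, locking, or dirty-page handling, and starting a newly loaded page at a usage count of 1 is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_USAGE_COUNT = 5  # cap taken from the quoted description


@dataclass
class Buffer:
    page: Optional[int] = None  # page id held by this slot, None if empty
    usage_count: int = 0


class ClockSweepCache:
    """Toy model: fixed buffer array plus a circular 'clock hand'."""

    def __init__(self, n_buffers: int) -> None:
        self.buffers = [Buffer() for _ in range(n_buffers)]
        self.page_to_slot: Dict[int, int] = {}
        self.hand = 0
        self.hits = 0
        self.misses = 0

    def _find_victim(self) -> int:
        # Advance the hand, decrementing usage counts, until a slot
        # with usage_count == 0 is found. That slot is the victim.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.buffers)
            buf = self.buffers[slot]
            if buf.usage_count == 0:
                return slot
            buf.usage_count -= 1

    def access(self, page: int) -> bool:
        """Touch a page; return True on a hit, False on a miss."""
        slot = self.page_to_slot.get(page)
        if slot is not None:
            buf = self.buffers[slot]
            buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)
            self.hits += 1
            return True

        self.misses += 1
        slot = self._find_victim()
        victim = self.buffers[slot]
        if victim.page is not None:
            del self.page_to_slot[victim.page]  # evict the old page
        victim.page = page
        victim.usage_count = 1  # assumption: new pages start at 1
        self.page_to_slot[page] = slot
        return False
```

For instance, with four buffers, touching page 1 six times and then reading pages 100 through 103 once each leaves page 1 resident and evicts page 100. A toy like this says nothing about real workloads, which is exactly why the baseline benchmarks are needed.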
 

The other thing I think would be tremendously useful would be the ability to get performance data from systems in the
field *without having to install extra stuff or do a special build*. The last point is critical because there are so
many places where deviating from a standard package takes an act of Congress.
 

In this case, if I could run some queries to get stats about clock sweep waits and what-not then I could get our shared
buffer size changed on some hosts and see how those changes affect the numbers. But doing this with a non-standard build
is pretty much a non-starter.
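As a rough illustration of the before/after comparison described above: a stock build does expose cumulative counters in the pg_stat_bgwriter view (buffers_alloc, for example), although the column set varies by release and none of them measures clock sweep waits directly. The sketch below assumes two snapshots of such counters have already been fetched into plain dicts by whatever client is available; the function name and the sample numbers are hypothetical.

```python
from typing import Dict


def counter_rates(before: Dict[str, float],
                  after: Dict[str, float],
                  elapsed_s: float) -> Dict[str, float]:
    """Per-second rate of change for every counter in both snapshots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return {
        name: (after[name] - before[name]) / elapsed_s
        for name in before
        if name in after
    }


# Hypothetical snapshots taken 60 seconds apart on one host.
snap_before = {"buffers_alloc": 1000, "buffers_backend": 200}
snap_after = {"buffers_alloc": 4000, "buffers_backend": 500}
rates = counter_rates(snap_before, snap_after, 60.0)
```

Comparing such rates before and after a shared buffer size change gives only a coarse proxy, which is the gap being pointed out here.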
 

I know there's been some improvement in this area, but I suspect there's still more to go.
-- 
Jim C. Nasby, Data Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


