Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Clock sweep not caching enough B-Tree leaf pages? |
Date | |
Msg-id | CA+TgmoZs6p3WZan8XisdL9s7pu6DxpX6cyHJBBeCVAKsM1FW7w@mail.gmail.com |
In response to | Re: Clock sweep not caching enough B-Tree leaf pages? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Clock sweep not caching enough B-Tree leaf pages?; Re: Clock sweep not caching enough B-Tree leaf pages? |
List | pgsql-hackers |
On Wed, Apr 16, 2014 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Anyways, I'm still curious if you can post similar numbers basing the
>> throttling on gross allocation counts instead of time. Meaning: some
>> number of buffer allocations has to have occurred before you consider
>> eviction. Besides being faster I think it's a better implementation:
>> an intermittently loaded server will give more consistent behavior.
>
> Yeah --- I think wall-clock-based throttling is fundamentally the wrong
> thing anyway. Are we going to start needing a CPU speed measurement to
> tune the algorithm with? Not the place to be going. But driving it off
> the number of allocations that've been done could be sensible. (OTOH,
> that means you need a central counter, which itself would be a
> bottleneck.)

So, I was thinking about this a little bit more today, prodded by my
coworker John Gorman. I'm wondering if we could drive this off of the
clock sweep; that is, every time the clock sweep touches a buffer, its
usage count goes down by one, but we also set two flag bits. Every time
the buffer gets touched thereafter, we check whether any flag bits are
set; if so, we clear one and increase the usage count, else we do
nothing. So the usage count can increase at most twice per clock sweep.

The advantage of that is that, as with Peter's approach, it is harder
for the usage count of a buffer to max out - to get there, you need
sustained access over a longer period of time. But it's not
time-dependent, so replaying the same workload at twice the speed or
half the speed on faster or slower hardware doesn't change the choice
of which buffer to evict, which seems good. And it will cause us to
prefer buffers which are accessed repeatedly over a period of time
rather than buffers that are accessed a bunch of times in quick
succession and then not touched again for a while, which seems like a
good bet.

I can't convince myself that this fully solves the (currently existing)
problem of usage counts increasing too fast. In theory, every time the
clock sweep hand advances, it decrements the usage count of one buffer
by one, but also makes it possible for the usage count of that buffer
to increase by up to two (or by only one if the buffer previously had a
usage count of five). So with the right access pattern, it seems like
you could still get to a place where a typical buffer eviction has to
do a lot of scanning before it finds a victim buffer. As with the
present algorithm, we'd be most likely to have that problem when the
buffer hit ratio is very high, so that there are many opportunities for
usage counts to go up between each opportunity to push usage counts
back down. But it seems like it would at least limit the damage.

I haven't tried this or anything, so this is just random brainstorming.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
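A minimal standalone sketch of the flag-bit idea described in the post above, for illustration only: it is not PostgreSQL's bufmgr code, and the names (BufferSketch, touch_credits, NBUFFERS, MAX_USAGE_COUNT) are invented here. The clock hand decrements a buffer's usage count and re-arms two "touch credits"; subsequent touches consume a credit to bump the usage count, so the count can rise at most twice per sweep pass.

```c
/*
 * Hypothetical simulation of the proposed clock-sweep variant.
 * Not PostgreSQL source; names and structure are assumptions.
 */
#include <stdio.h>

#define NBUFFERS 8
#define MAX_USAGE_COUNT 5

typedef struct
{
    int usage_count;   /* 0..MAX_USAGE_COUNT, as in the current algorithm */
    int touch_credits; /* the two flag bits: 0, 1, or 2 credits remaining */
} BufferSketch;

static BufferSketch buffers[NBUFFERS];
static int clock_hand = 0;

/* Called whenever a backend touches (hits) a buffer. */
static void
touch_buffer(int buf)
{
    if (buffers[buf].touch_credits > 0)
    {
        buffers[buf].touch_credits--;     /* clear one flag bit */
        if (buffers[buf].usage_count < MAX_USAGE_COUNT)
            buffers[buf].usage_count++;   /* at most twice per sweep pass */
    }
    /* else: no credits left, so the touch does not bump the count */
}

/* Advance the clock hand until a victim with usage_count == 0 is found. */
static int
clock_sweep(void)
{
    for (;;)
    {
        BufferSketch *b = &buffers[clock_hand];
        int           victim = clock_hand;

        clock_hand = (clock_hand + 1) % NBUFFERS;

        if (b->usage_count == 0)
            return victim;

        b->usage_count--;       /* decrement as the hand passes */
        b->touch_credits = 2;   /* re-arm the two flag bits */
    }
}

int
main(void)
{
    /* A burst of hits on buffer 3 raises its count only while credits last. */
    buffers[3].touch_credits = 2;
    for (int i = 0; i < 10; i++)
        touch_buffer(3);
    printf("buffer 3 usage_count after burst: %d\n", buffers[3].usage_count);

    printf("victim chosen by sweep: %d\n", clock_sweep());
    return 0;
}
```

In this toy run, ten rapid hits on buffer 3 raise its usage count by only two, which is the property the post argues for: sustained access across sweep passes, not a quick burst, is what drives a count toward its maximum.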