Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Clock sweep not caching enough B-Tree leaf pages? |
Date | |
Msg-id | CA+TgmobcJKTGb=XgwH90Wz9Jc4Sa+F4mknCKEcyHjevcNQfHew@mail.gmail.com |
In response to | Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>) |
Responses | Re: Clock sweep not caching enough B-Tree leaf pages? |
List | pgsql-hackers |
On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan <pg@heroku.com> wrote:
> In the past, various hackers have noted problems they've observed with
> this scheme. A common pathology is to see frantic searching for a
> victim buffer only to find all buffer usage_count values at 5. It may
> take multiple revolutions of the clock hand before a victim buffer is
> found, as usage_count is decremented for each and every buffer. Also,
> BufFreelistLock contention is considered a serious bottleneck [1],
> which is related.

I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, usage counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into "hot" and "cold" lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property, that not all buffers can be hot, must somehow be preserved, and our algorithm doesn't preserve it.

This isn't a fundamental property of the usage-count idea; it's an artifact of the fact that usage count decreases are tied to eviction pressure rather than access pressure. For example, suppose we made a rule that if the total usage counts of all buffers exceed 3 * NBuffers, then every time you bump the usage count of a buffer from N to N+1, you're required to advance the clock sweep far enough to decrease the usage count of some other buffer by one. When you want to reclaim a buffer, you advance a separate clock sweep until you find a buffer with a zero usage count; if you circle the whole ring without finding one, then you reclaim the buffer you saw with the lowest usage count. There are obvious scalability problems here (everyone fighting over the right to advance the clock sweep), but ignore that for the sake of the thought experiment: now you have an algorithm where not all buffers can be hot. If some buffers are hotter than others, then whenever their usage counts are decreased they will immediately get pushed back up again, but some other buffer then has to absorb the decrease. Only the buffers that are really hot can maintain high usage counts, because *somebody* has to have a low usage count.

Even ignoring scalability concerns, this might not be (and probably isn't) exactly what we want to implement, but I think it illustrates an important control principle all the same: buffer "cooling" needs to be driven by the same underlying phenomenon - probably buffer access - as buffer "heating". If they're driven by unrelated phenomena, then the rates may be wildly incomparable, and you'll end up with everything hot or everything cold. If that happens, you lose, because with everything the same, there's no principled way to decide which things are actually best to evict.

If we come up with some good solution for shared buffers, we should also consider applying it to SLRU eviction. I believe that the current situation around CLOG eviction is none too pretty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
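A minimal, single-threaded sketch of the thought experiment described in the message above. This is not PostgreSQL code: NBUFFERS, MAX_USAGE, bump_buffer(), and reclaim_buffer() are hypothetical names, and pins, locks, and the freelist are ignored. Once the sum of all usage counts exceeds 3 * NBuffers, each bump is paid for by advancing a separate "cooling" clock hand far enough to decrement one buffer's count; reclaim takes the first zero-usage buffer it finds, or the coldest buffer seen after a full revolution.

```c
#include <stdio.h>

#define NBUFFERS   8
#define MAX_USAGE  5

static int usage[NBUFFERS];     /* per-buffer usage counts */
static int total_usage;         /* sum of all usage counts */
static int cool_hand;           /* clock hand that decrements counts */
static int evict_hand;          /* separate clock hand used for reclaim */

/* Called whenever a buffer is referenced. */
static void
bump_buffer(int buf)
{
    if (usage[buf] >= MAX_USAGE)
        return;
    usage[buf]++;
    total_usage++;

    /*
     * The proposed rule: once total usage exceeds 3 * NBuffers, every bump
     * from N to N+1 must be paid for by advancing the cooling hand far
     * enough to decrement some buffer's count by one.  Cooling is thus
     * driven by access pressure, so not every buffer can stay hot.
     */
    if (total_usage > 3 * NBUFFERS)
    {
        while (usage[cool_hand] == 0)
            cool_hand = (cool_hand + 1) % NBUFFERS;
        usage[cool_hand]--;
        total_usage--;
        cool_hand = (cool_hand + 1) % NBUFFERS;
    }
}

/*
 * Pick a victim: the first zero-usage buffer found by a separate clock
 * sweep, or, after a full revolution with no zero, the coldest buffer seen.
 */
static int
reclaim_buffer(void)
{
    int best = evict_hand;
    int i;

    for (i = 0; i < NBUFFERS; i++)
    {
        int buf = (evict_hand + i) % NBUFFERS;

        if (usage[buf] == 0)
        {
            evict_hand = (buf + 1) % NBUFFERS;
            return buf;
        }
        if (usage[buf] < usage[best])
            best = buf;
    }
    evict_hand = (best + 1) % NBUFFERS;
    return best;
}

int
main(void)
{
    int i;

    /* Buffers 0-2 are "hot": reference them far more often than the rest. */
    for (i = 0; i < 200; i++)
        bump_buffer(i % 10 < 7 ? i % 3 : 3 + i % 5);

    for (i = 0; i < NBUFFERS; i++)
        printf("buffer %d: usage_count %d\n", i, usage[i]);
    printf("victim: buffer %d\n", reclaim_buffer());
    return 0;
}
```

In this toy run, only the frequently referenced buffers can hold a high usage_count: every bump past the 3 * NBuffers ceiling forces a decrement somewhere else, so the colder buffers settle at lower counts and remain the natural eviction candidates.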