Re: [HACKERS] Clock with Adaptive Replacement - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Msg-id: CAH2-Wz=1ejgxyqn1PoJ=A2NvUaL+1h+Y7F7NLSev=YuBTFMHtQ@mail.gmail.com
In response to: Re: [HACKERS] Clock with Adaptive Replacement (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Wed, Apr 25, 2018 at 5:26 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I think pgbench isn't a very interesting workload though. I don't
> have time right now but it would be fun to dig further and try to
> construct workloads that generate pathological searches for buffers (I
> believe that such workloads exist, from anecdotal reports).

While pgbench might not be the most interesting workload, I think even its super-synthetic, uniformly distributed point lookups can benefit from better caching decisions. The LRU-K paper's example 1.1 explains very cogently why this might be: point lookups from TPC-A (or TPC-C) consist of descending the index and then accessing the record in the main table from a TID, which implies quite a non-uniform access pattern for individual pages, despite accessing table records uniformly.

There are a negligible number of internal pages involved with pgbench (and they should always be cached), so they can almost be ignored. It's really just a matter of what proportion of shared_buffers should be leaf pages versus heap pages. The pgbench accounts table is approximately 6x the size of its primary key, so the leaf blocks have 6x the utility of heap blocks for this workload. It actually is pretty much that simple. Or it would be, if we didn't have the OS cache as a confounding factor.

The optimal strategy for the pgbench workload, as far as maximizing hits in shared_buffers goes, is to *only* cache leaf blocks, at least until all leaf blocks are cached; we should evict table blocks *immediately*. This isn't even remotely close to the behavior of an LRU, of course. Actually, as the LRU-K paper also points out, an LRU has a tendency to perversely somewhat *favor* the table blocks, since presumably each point lookup's table block is accessed after its leaf block is pinned/accessed.
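The leaf-versus-heap arithmetic can be illustrated with a toy cache simulation. This is only a sketch under assumed sizes (100 leaf blocks, 600 heap blocks, a cache just big enough for the leaves), not pgbench itself; it compares a plain LRU against a hypothetical "leaf-only" admission policy that never caches heap blocks:

```python
import random
from collections import OrderedDict

# Hypothetical sizes, loosely modeled on pgbench_accounts: the table is
# ~6x the size of its primary key, so ~6 heap blocks per leaf block.
LEAF_BLOCKS = 100
HEAP_BLOCKS = 600
CACHE_SIZE = 100       # just enough to hold every leaf block, nothing more
ACCESSES = 200_000     # uniform point lookups

def run(policy):
    """Return the overall hit ratio for a uniform point-lookup workload.

    Each lookup touches one leaf block, then the heap block it points to.
    policy 'lru' caches everything; 'leaf_only' never admits heap blocks.
    """
    cache = OrderedDict()          # OrderedDict as an LRU: oldest entry first
    hits = total = 0
    rng = random.Random(42)
    for _ in range(ACCESSES):
        leaf = ('leaf', rng.randrange(LEAF_BLOCKS))
        # each leaf "covers" 6 consecutive heap blocks' worth of records
        heap = ('heap', leaf[1] * 6 + rng.randrange(6))
        for block in (leaf, heap):
            total += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)   # mark as most recently used
                continue
            if policy == 'leaf_only' and block[0] == 'heap':
                continue                   # "evict immediately": never admit
            cache[block] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)  # evict least recently used
    return hits / total

print('LRU       hit ratio:', round(run('lru'), 3))
print('leaf-only hit ratio:', round(run('leaf_only'), 3))
```

With these made-up numbers, the leaf-only policy approaches the 50% ceiling (every leaf access hits once the leaves are warm; every heap access necessarily misses), while the LRU lands well below it because heap blocks keep flushing leaf blocks out of the cache.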
Before some of the big shared_buffers bottlenecks were alleviated several years ago, it was possible to observe shared_buffers evictions occurring essentially at random. I have no idea if that's still true, but it could be. A random replacement strategy is actually not as bad as it sounds, but it is still pretty bad, especially given that we could pay a high price in contention in return for such a bad strategy. This convinces me that we will not be able to assess any algorithm well in isolation, through simulations, because lock contention is too important overall, and can actually affect the cache hit ratio of a clock-based approach.

I believe that it's generally true that our clock sweep algorithm approximates an LRU, but I have also seen it totally fail to do that, which it sounds like you've heard about, too. Andrey is probably correct in his suspicion that we'll end up prototyping a number of approaches.

I'm glad that you're thinking about this, in any case. Seems like a promising area to work on.

--
Peter Geoghegan