Re: [HACKERS] Clock with Adaptive Replacement - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Msg-id: CAEepm=1hFjHs=pzFwvAFP4wfiF8+33LxuXwigugXDY3kpVy_Sg@mail.gmail.com
In response to: Re: [HACKERS] Clock with Adaptive Replacement (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: [HACKERS] Clock with Adaptive Replacement
List: pgsql-hackers
On Tue, Apr 24, 2018 at 5:04 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here's what we have about the size of shared buffers now [1] (taken from [2]). I believe the right hill must be big enough to reduce the central pit to zero and make the function monotonic: the OS page cache knows less about data blocks and is expected to be less efficient.
> [1] https://4.bp.blogspot.com/-_Zz6X-e9_ok/WlaIvpStBmI/AAAAAAAAAA4/E1NwV-_82-oS5KfmyjoOff_IxUXiO96WwCLcBGAs/s1600/20180110-PTI.png

Interesting curve.  Can you explain it?  Here is my guess, assuming that pgbench is generating uniformly distributed random access, and assuming a 26GB dataset and 32GB of physical RAM:

1.  In the range 128MB to 6GB, only a small fraction of the dataset fits into shared_buffers, but it doesn't matter much: you get a 100% hit ratio from the kernel's page cache.

2.  In the range 6GB to 16GB, the dataset doesn't fit in shared_buffers OR the kernel's page cache, so you begin to risk misses in both caches.  Every byte you give to one you take away from the other, but it might be worse than zero sum: at the lowest point in your graph, x = 16GB, I suppose you have a ~61% hit ratio in shared buffers (16GB/26GB), but when you miss you're also likely to miss in the OS page cache too, because that probably holds the most recently read 16GB (= the other half of physical RAM) of data... in other words the same data, assuming uniform random access.  (If it were not random, then some pages would have a greater chance of staying in PG's cache and falling out of the kernel's cache, but essentially random replacement prevents that (?))

3.  In the range 16GB to 26GB, the shared_buffers hit ratio increases from ~61% to 100%.

I guess if you had 48GB of RAM the dip between 6GB and 26GB wouldn't happen?

Am I close?

> I'm not sure CART is the best possibility, though.

I don't know either, but LRU and ARC don't seem promising for various reasons.  Aside from the scan resistance claim, it sounds like CART should avoid long searches for a buffer.  In my primitive attempts to understand that a bit better, I tried instrumenting freelist.c to tell dtrace how far it had to move the clock hand to find a buffer, and I got histograms like this (this example is from a 1GB dataset, 512MB shared_buffers, pgbench select-only; the most frequent distances were in the 4-7 bucket, but there were 3 freak cases in the 16384-32767 bucket):

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@                                     65535
               1 |@@@@@                                    89190
               2 |@@@@@@@@@                                139114
               4 |@@@@@@@@@@@                              170881
               8 |@@@@@@@@                                 132579
              16 |@@@                                      47731
              32 |                                         5530
              64 |                                         137
             128 |                                         1
             256 |                                         0
             512 |                                         0
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |                                         1
           16384 |                                         3
           32768 |                                         0

You can see just a few outliers there.  With smaller shared buffers the counts are bigger but the average hand movement distance gets smaller (here 128MB shared_buffers):

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |                                         16383
               1 |@@@@@@@@@@@@@@@                          1289265
               2 |@@@@@@@@@@@@@@@                          1270531
               4 |@@@@@@@@                                 659389
               8 |@                                        111091
              16 |                                         2815
              32 |                                         19
              64 |                                         1
             128 |                                         0
             256 |                                         0
             512 |                                         0
            1024 |                                         0
            2048 |                                         2
            4096 |                                         2
            8192 |                                         0

I think pgbench isn't a very interesting workload, though.  I don't have time right now, but it would be fun to dig further and try to construct workloads that generate pathological searches for buffers (I believe that such workloads exist, from anecdotal reports).

--
Thomas Munro
http://www.enterprisedb.com
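A back-of-envelope sketch of the hit-ratio reasoning in points 1-3 above, assuming a 26GB dataset, 32GB of physical RAM, uniform random access, and full overlap between shared_buffers and the kernel's page cache (illustrative numbers only, not measurements):

/*
 * Sketch only: combined cache hit ratio under the assumptions above.
 * The kernel's page cache is modelled as (RAM - shared_buffers), and
 * because uniform random access makes both caches converge on the same
 * pages, the effective cache is max(shared_buffers, kernel cache).
 */
#include <stdio.h>

int
main(void)
{
    const double dataset_gb = 26.0;
    const double ram_gb = 32.0;          /* try 48.0 to remove the dip */

    for (double sb = 1.0; sb <= ram_gb; sb += 1.0)
    {
        double kernel = ram_gb - sb;     /* page cache left after shared_buffers */
        double cached = sb > kernel ? sb : kernel;
        double hit = cached >= dataset_gb ? 1.0 : cached / dataset_gb;

        printf("shared_buffers = %4.1fGB  combined hit ratio ~ %3.0f%%\n",
               sb, hit * 100.0);
    }
    return 0;
}

The printed curve is flat at 100% up to 6GB, bottoms out near 62% at 16GB, and climbs back to 100% at 26GB; with ram_gb set to 48.0 the effective cache never drops below the dataset size, matching the guess above that the dip would disappear with 48GB of RAM.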
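And a toy clock-sweep (GCLOCK) simulation, not the real freelist.c code, that counts how far the hand moves per eviction under uniform random references and buckets the distances the same way as the dtrace histograms above; the pool and dataset sizes are made up for illustration:

/*
 * Toy GCLOCK simulation: uniformly random page references over a dataset
 * twice the size of the buffer pool, counting how far the clock hand has
 * to move for each eviction.  Distances go into power-of-two buckets like
 * the dtrace histograms above.
 */
#include <stdio.h>
#include <stdlib.h>

#define NBUFFERS   16384           /* buffers in the pool */
#define NPAGES     (NBUFFERS * 2)  /* pages in the dataset */
#define NREFS      2000000L        /* page references to simulate */
#define MAX_USAGE  5               /* like BM_MAX_USAGE_COUNT */

static int  buf_page[NBUFFERS];    /* page held by each buffer, -1 if empty */
static int  buf_usage[NBUFFERS];   /* usage count of each buffer */
static int  page_buf[NPAGES];      /* buffer holding each page, -1 if none */
static long hist[32];              /* power-of-two buckets of hand distance */

int
main(void)
{
    int     hand = 0;

    for (int i = 0; i < NBUFFERS; i++)
        buf_page[i] = -1;
    for (int i = 0; i < NPAGES; i++)
        page_buf[i] = -1;

    for (long r = 0; r < NREFS; r++)
    {
        int     page = rand() % NPAGES;
        int     b = page_buf[page];
        long    distance = 0;
        int     bucket = 0;

        if (b >= 0)
        {
            /* hit: bump the usage count, capped */
            if (buf_usage[b] < MAX_USAGE)
                buf_usage[b]++;
            continue;
        }

        /* miss: sweep until a buffer with usage count zero turns up */
        while (buf_usage[hand] > 0)
        {
            buf_usage[hand]--;
            hand = (hand + 1) % NBUFFERS;
            distance++;
        }

        /* evict whatever was there and install the new page */
        if (buf_page[hand] >= 0)
            page_buf[buf_page[hand]] = -1;
        buf_page[hand] = page;
        page_buf[page] = hand;
        buf_usage[hand] = 1;
        hand = (hand + 1) % NBUFFERS;

        /* record the distance in a power-of-two bucket */
        while ((1L << bucket) <= distance)
            bucket++;
        hist[bucket]++;
    }

    for (int i = 0; i < 20; i++)
        printf("%8ld  %ld\n", i == 0 ? 0L : 1L << (i - 1), hist[i]);
    return 0;
}

Under this uniform workload most sweeps stop within a handful of buffers, because each pass leaves plenty of zero-count victims near the hand; presumably it takes a skewed workload that keeps most usage counts pinned near the maximum to force the long, pathological searches mentioned above.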