On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote:
> On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
> >> I also believe this to be the case on first principles and my own
> >> experiments. Suppose you have a workload that fits inside
> >> shared_buffers. All of the usage counts will converge to 5. Then,
> >> somebody accesses a table that is not cached, so something's got to be
> >> evicted. Because all the usage counts are the same, the eviction at
> >> this point is completely indiscriminate. We're just as likely to kick
> >> out a btree root page or a visibility map page as we are to kick out a
> >> random heap page, even though the former have probably been accessed
> >> several orders of magnitude more often. That's clearly bad. On
> >> systems that are not too heavily loaded it doesn't matter too much
> >> because we just fault the page right back in from the OS pagecache.
> >> But I've done pgbench runs where such decisions lead to long stalls,
> >> because the page has to be brought back in from disk, and there's a
> >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is
> >> issuing too many I/O requests and makes some of them wait to cool
> >> things down.
> >
> > I understand now. If there is no memory pressure, every buffer gets the
> > max usage count, and when a new buffer comes in, it isn't the max so it
> > is swiftly removed until the clock sweep has time to decrement the old
> > buffers. Decaying buffers when there is no memory pressure creates
> > additional overhead and gets into timing issues of when to decay.
>
> That can happen, but the real problem I was trying to get at is that
> when all the buffers get up to max usage count, they all appear
> equally important. But in reality they're not. So when we do start
> evicting those long-resident buffers, it's essentially random which
> one we kick out.
True. Ideally we would have some way to know that _all_ the buffers had
reached the maximum and kick off a sweep to decrement them all. I am
unclear how we would do that. One odd idea would be to have a global
counter that is incremented every time a buffer goes from 4 to 5 (max)
--- when the counter equals 50% of all buffers, do a clock sweep. Of
course, then the counter becomes a bottleneck.
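
To make that concrete, here is a rough single-process sketch (illustrative
names, not the actual bufmgr.c code, and ignoring all locking):

    #include <stdlib.h>

    #define MAX_USAGE_COUNT 5

    typedef struct
    {
        int     usage_count;
        /* ... the rest of the buffer header ... */
    } SketchBufferDesc;

    static SketchBufferDesc *buffers;       /* NBuffers entries, zeroed at startup */
    static int  NBuffers;
    static int  buffers_at_max;             /* the proposed global counter */

    /*
     * Decrement every buffer's usage count once, restoring an ordering
     * among long-resident buffers before any eviction decision is needed.
     */
    static void
    sweep_decrement_all(void)
    {
        for (int i = 0; i < NBuffers; i++)
        {
            if (buffers[i].usage_count == MAX_USAGE_COUNT)
                buffers_at_max--;
            if (buffers[i].usage_count > 0)
                buffers[i].usage_count--;
        }
    }

    /*
     * Called on every buffer access (pin).  The increment of
     * buffers_at_max is the contention point: with real concurrency it
     * would have to be an atomic or partitioned counter, which is the
     * bottleneck mentioned above.
     */
    static void
    bump_usage_count(SketchBufferDesc *buf)
    {
        if (buf->usage_count < MAX_USAGE_COUNT)
        {
            buf->usage_count++;
            if (buf->usage_count == MAX_USAGE_COUNT &&
                ++buffers_at_max >= NBuffers / 2)
                sweep_decrement_all();
        }
    }

With every backend bumping that one counter on hot buffers, it would have
to become atomic or split per buffer partition, at which point the 50%
trigger is only approximate.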
--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +