Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From: Jim Nasby
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: 534C6916.7090205@nasby.net
In response to: Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers

On 4/14/14, 12:11 PM, Peter Geoghegan wrote:
> I have some theories about the PostgreSQL buffer manager/clock sweep.
> To motivate the reader to get through the material presented here, I
> present up-front a benchmark of a proof-of-concept patch of mine:
>
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay/
>
> Test Set 4 represents the patch's performance here.
>
> This shows some considerable improvements for a tpc-b workload, with
> 15 minute runs, where the buffer manager struggles with moderately
> intense cache pressure. shared_buffers is 8GiB, with 32GiB of system
> memory in total. The scale factor is 5,000 here, so that puts the
> primary index of the accounts table at a size that makes it impossible
> to cache entirely within shared_buffers, by a margin of a couple of
> GiB. pgbench_accounts_pkey is ~"10 GB", and pgbench_accounts is ~"63
> GB". Obviously the heap is much larger, since for that table heap
> tuples are several times the size of index tuples (the ratio here is
> probably well below the mean, if I can be permitted to make a vast
> generalization).
>
> PostgreSQL implements a clock sweep algorithm, which gets us something
> approaching an LRU for the buffer manager in trade-off for less
> contention on core structures. Buffers have a usage_count/"popularity"
> that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK
> algorithm only has one bit for what approximates our "usage_count" (so
> it's either 0 or 1). I think that at its core CLOCK is an algorithm
> that has some very desirable properties that I am sure must be
> preserved. Actually, I think it's more accurate to say we use a
> variant of CLOCK-Pro, a refinement of the original CLOCK.
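For anyone who hasn't read freelist.c recently, here's a rough standalone sketch of the loop being described. It's deliberately simplified (no locking, no freelist, and invented names like pin_buffer and clock_sweep_victim), so treat it as an illustration of the algorithm rather than the actual server code; BM_MAX_USAGE_COUNT is the real constant, though:

#include <stdio.h>
#include <stdint.h>

#define NBUFFERS           16
#define BM_MAX_USAGE_COUNT 5    /* usage_count saturates here */

typedef struct
{
    int     refcount;       /* pins currently held by backends */
    uint8_t usage_count;    /* "popularity": 0 .. BM_MAX_USAGE_COUNT */
} Buffer;

static Buffer buffers[NBUFFERS];
static int    clock_hand;   /* next buffer the sweep will inspect */

/* Every access pins the buffer and bumps its popularity (saturating). */
static void
pin_buffer(Buffer *buf)
{
    buf->refcount++;
    if (buf->usage_count < BM_MAX_USAGE_COUNT)
        buf->usage_count++;
}

/*
 * Run the clock: as the hand passes each unpinned buffer, decay its
 * popularity; evict the first unpinned buffer found at usage_count 0.
 */
static Buffer *
clock_sweep_victim(void)
{
    for (;;)
    {
        Buffer *buf = &buffers[clock_hand];

        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return buf;         /* unpopular and unpinned: victim */
            buf->usage_count--;     /* second chance; popularity decays */
        }
    }
}

int
main(void)
{
    pin_buffer(&buffers[0]);        /* buffer 0 gets touched once... */
    buffers[0].refcount = 0;        /* ...and is later unpinned */

    Buffer *victim = clock_sweep_victim();
    printf("victim is buffer %ld\n", (long) (victim - buffers));
    return 0;
}

The property worth noticing is that popularity only decays as the hand passes, so a buffer touched a few times survives several full rotations of the clock.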

I think it's important to mention that OS implementations (at least all I know of) have multiple page pools, each of which has its own clock. IIRC, one of the arguments for our supporting a usage_count > 1 was that we could get the benefits of multiple page pools without the overhead. In reality I believe that argument is false, because the clocks for each page pool in an OS *run at different rates* based on system demands.
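To make the rate argument concrete, here's a toy model (entirely hypothetical, not taken from any real kernel): each pool keeps its own hand, and a reclaim pass advances each hand in proportion to the demand on that pool:

#include <stdio.h>

#define NPOOLS    3
#define POOL_SIZE 256

typedef struct
{
    int hand;              /* this pool's own clock hand */
    int pressure;          /* demand signal, e.g. recent allocation misses */
} PagePool;

static PagePool pools[NPOOLS];

/*
 * One reclaim pass: each pool's clock advances in proportion to the
 * demand on that pool, so pages in a busy pool age much faster than
 * pages in an idle one.
 */
static void
reclaim_pass(void)
{
    for (int i = 0; i < NPOOLS; i++)
        pools[i].hand = (pools[i].hand + pools[i].pressure) % POOL_SIZE;
}

int
main(void)
{
    pools[0].pressure = 50;    /* hot pool: clock runs fast */
    pools[1].pressure = 5;     /* warm pool */
    pools[2].pressure = 0;     /* idle pool: clock barely moves */

    for (int pass = 0; pass < 3; pass++)
        reclaim_pass();

    for (int i = 0; i < NPOOLS; i++)
        printf("pool %d: hand at %d\n", i, pools[i].hand);
    return 0;
}

With one clock over one big pool, every page ages at the same global rate regardless of where the pressure is; a higher usage_count ceiling doesn't change that.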
 

I don't know if multiple buffer pools would be good or bad for Postgres, but I do think it's important to remember this difference any time we look at what OSes do.
 

> If you look at the test sets that this patch covers (with all the
> tricks applied), there are pretty good figures throughout. You can
> kind of see the pain towards the end, but there are no dramatic falls
> in responsiveness for minutes at a time. There are latency spikes, but
> they're *far* shorter, and much better hidden. Without looking at
> individual multiple minute spikes, at the macro level (all client
> counts for all runs) average latency is about half of what is seen on
> master.

My guess would be that those latency spikes are caused by a need to run the clock for an extended period. IIRC there's code floating around that makes it possible to measure that.
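Something along these lines (a standalone sketch using clock_gettime(); in the real thing the timer would sit inside StrategyGetBuffer() and the numbers would go to the stats infrastructure, and all the names here are invented):

#include <stdio.h>
#include <time.h>

#define SLOW_SWEEP_THRESHOLD_US 1000L   /* report sweeps longer than 1 ms */

/* Stand-in for the real sweep loop; imagine StrategyGetBuffer() here. */
static void
run_clock_sweep(void)
{
    /* ... walk the clock until a victim is found ... */
}

static long
elapsed_us(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000000L +
           (end.tv_nsec - start.tv_nsec) / 1000L;
}

/* Wrap the sweep in a timer and complain when it runs long. */
static void
timed_clock_sweep(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    run_clock_sweep();
    clock_gettime(CLOCK_MONOTONIC, &end);

    long us = elapsed_us(start, end);

    if (us > SLOW_SWEEP_THRESHOLD_US)
        fprintf(stderr, "slow clock sweep: %ld us\n", us);
}

int
main(void)
{
    timed_clock_sweep();
    return 0;
}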
 

I suspect it would be very interesting to see what happens if your patch is combined with the work that (Greg?) did to reduce the odds of individual backends needing to run the clock. (I know part of that work looked at proactively keeping pages on the free list, but I think there was more to it than that.)
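The freelist piece, roughly, amounts to something like this (a hypothetical sketch, not the actual bgwriter code):

#include <stdio.h>

#define NBUFFERS        256
#define FREELIST_TARGET 16

static int freelist[NBUFFERS];
static int freelist_len;
static int usage_count[NBUFFERS];
static int hand;

/* Run the clock until an unpopular buffer turns up (as upthread). */
static int
clock_sweep_victim(void)
{
    for (;;)
    {
        int b = hand;

        hand = (hand + 1) % NBUFFERS;
        if (usage_count[b] == 0)
            return b;
        usage_count[b]--;       /* popularity decays as the hand passes */
    }
}

/*
 * One background-writer cycle: run the clock ahead of demand and park
 * victims on the freelist, so a foreground backend can usually just
 * pop a buffer instead of paying the sweep cost itself.
 */
static void
bgwriter_refill_freelist(void)
{
    while (freelist_len < FREELIST_TARGET)
        freelist[freelist_len++] = clock_sweep_victim();
}

int
main(void)
{
    usage_count[0] = 5;         /* one recently hot buffer */
    bgwriter_refill_freelist();
    printf("freelist holds %d buffers; first victim was %d\n",
           freelist_len, freelist[0]);
    return 0;
}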
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


