Re: Design notes for BufMgrLock rewrite - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Design notes for BufMgrLock rewrite
Date
Msg-id 20050217053925.GV52357@decibel.org
Whole thread Raw
In response to Re: Design notes for BufMgrLock rewrite  (Kenneth Marshall <ktm@it.is.rice.edu>)
List pgsql-hackers
On Wed, Feb 16, 2005 at 11:42:11AM -0600, Kenneth Marshall wrote:
> I have seen this algorithm described as a more generalized clock type
> algorithm. As the size of the counter increases, up to the number of
> buffers, the clock algorithm becomes LRU. One bit is the lightest
> weight approximation. Increasing the number of bits or a count makes
> the clock algorithm more closely approximate LRU. You need to balance
> how long it takes to find a free buffer. That time increases as the
> count size increases.

Yeah, the trick to this seems to be how you tweak the rate at which
stuff 'falls out' of the active list. I can think of 3 ways to
accomplish this:

1) Change the maximum value to count to (or the value at which a buffer
is considered no longer in use).

This has the disadvantage of changing how effective the count is. In an
extreme situation, it would retuce back to a single bit. It also won't
affect buffers that already have higher counts, meaning older data is
more likely to stay in buffer than newer data.

2) Change the amount the counter is incremented by on each use (and/or
the amount it's decremented by).

An example of this might be having the clock decrement by 10. Under a
light to medium load, the system might increment by 10 on each use, but
if the system starts getting pressed for free buffers, that could be
reduced.

A downside of this would be that it potentially requires more space in
each header to store a larger value. An advatage is that it allows more
flexability than #1. For example, if the decrement value is increased in
order to speed up reclaiming of buffers, it won't create a difference in
how buffers are weighted based on when they were accessed like #1 will.

3) Change the speed of the clock.

This is what BSD effectively does. The OS periodically checks to see how
many pages are available on the free list, as well as how many were
removed since the last check. This information is used to decide how
many pages the clock algorithm should attempt to free in the next time
period (which can be 0).

If a two-hand algorithm is used, the distance between the two hands can
also be varied.

I think #3 probably means you'd need a seperate process to handle the
clock and moving buffers to a free list. Or perhaps this gets tied in
with the background writer. This might mean more overhead, but it could
improve contention if it means only one process needs to aquire some of
these locks.

So much for a simple design discussion. :) Fortunately, #1 and #2 should
be easy to test. #3 will certainly require more code, but it would
probably be simpler to implement than having multiple backends running
the clock algorithm (which I think is the current plan).

Something else I thought of; by using a counter instead of a bit, you
can also 'pre-seed' buffers based on why they were populated. For
example, pages brought in from an index might start with a value of 4;
heap pages 3, heap pages from a seqscan 2, and pages from vacuum, 1, or
maybe even 0.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Help me recovering data
Next
From: Dennis Bjorklund
Date:
Subject: Re: Help me recovering data