Re: Design notes for BufMgrLock rewrite - Mailing list pgsql-hackers

From	Jim C. Nasby
Subject	Re: Design notes for BufMgrLock rewrite
Date	February 16, 2005 20:21:06
Msg-id	20050216172009.GR52357@decibel.org
In response to	Re: Design notes for BufMgrLock rewrite (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Design notes for BufMgrLock rewrite
List	pgsql-hackers

On Sun, Feb 13, 2005 at 06:56:47PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> One thing I realized quickly is that there is no natural way in a clock
> >> algorithm to discourage VACUUM from blowing out the cache.  I came up
> >> with a slightly ugly idea that's described below.  Can anyone do better?
> 
> > Uh, is the clock algorithm also sequential-scan proof?  Is that
> > something that needs to be done too?
> 
> If you can think of a way.  I don't see any way to make the algorithm
> itself scan-proof, but if we modified the bufmgr API to tell ReadBuffer
> (or better ReleaseBuffer) that a request came from a seqscan, we could
> do the same thing as for VACUUM.  Whether that's good enough isn't
> clear --- for one thing it would kick up the contention for the
> BufFreelistLock, and for another it might mean *too* short a lifetime
> for blocks fetched by seqscan.

Is there anything (in the buffer headers?) that keeps track of buffer
access frequency? *BSD uses a mechanism to track roughly how often a page
in memory has been accessed, and uses that to determine which pages to
free. In 4.3BSD, a simple two-handed clock sweep is used: the first hand
clears a referenced bit on each page; the second hand (which sweeps a
fixed distance behind the first) checks this bit and, if it's still
clear, moves the page either to the inactive list if it's dirty, or to
the cache list if it's clean. There is also a free list, which is
generally fed by the cache and inactive lists.

PostgreSQL has a big advantage over an OS though, in that it can
tolerate much more overhead in buffer access code than an OS can in its
VM system. If I understand correctly, any use of a buffer at all means a
lock needs to be acquired on its buffer header. As part of this access,
a counter could be incremented with very little additional cost. A
background process would then sweep through 'active' buffers,
decrementing this counter by some amount. Any buffer that was
decremented below 0 would be considered inactive, and a candidate for
being freed. The advantage of using a counter instead of a simple active
bit is that buffers that are (or have been) used heavily will be able to
go through several sweeps of the clock before being freed. Infrequently
used buffers (such as those from a vacuum or seq. scan) would get
marked as inactive the first time they were hit by the clock hand.
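
To make the counter idea concrete, here is a rough simulation, not
actual bufmgr code. Everything in it is an assumption for illustration:
the BufferHeader class, the touch() and sweep() names, the cap of 5 on
the counter, and the decrement of 2 per sweep. The decrement is chosen
larger than the per-access increment so that a buffer touched only once
drops below 0 on its first sweep.

```python
from dataclasses import dataclass

# All names and constants here are hypothetical, chosen for illustration;
# none of them come from the PostgreSQL source.
MAX_USAGE = 5        # assumed cap, so a once-hot buffer cannot linger forever
SWEEP_DECREMENT = 2  # assumed "some amount" subtracted on each sweep


@dataclass
class BufferHeader:
    buf_id: int
    usage_count: int = 0
    active: bool = True


def touch(buf):
    """Record one access; imagined to run while the header lock is held."""
    buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)
    buf.active = True


def sweep(buffers):
    """One pass of the background sweep over the active buffers.

    Returns the ids of buffers that dropped below 0 on this pass and are
    therefore candidates for being freed.
    """
    newly_inactive = []
    for buf in buffers:
        if not buf.active:
            continue
        buf.usage_count -= SWEEP_DECREMENT
        if buf.usage_count < 0:
            buf.active = False
            newly_inactive.append(buf.buf_id)
    return newly_inactive


if __name__ == "__main__":
    hot = BufferHeader(buf_id=0)
    cold = BufferHeader(buf_id=1)
    for _ in range(5):
        touch(hot)       # heavily used buffer
    touch(cold)          # touched once, as by a seq. scan
    for n in range(1, 4):
        print("sweep", n, "inactive:", sweep([hot, cold]))
```

With these particular numbers the once-touched buffer goes inactive on
the first sweep, while the heavily used one survives two sweeps and only
goes inactive on the third. A real implementation would have to pick the
cap and decrement by measurement.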
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
