Re: Thinking about breaking up the BufMgrLock - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: Thinking about breaking up the BufMgrLock
Date
Msg-id 20050207141428.GA2580@it.is.rice.edu
In response to Thinking about breaking up the BufMgrLock  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Feb 06, 2005 at 07:30:37PM -0500, Tom Lane wrote:
> 
> ReadBuffer needs to do a lookup to map the page ID to a buffer ID,
> which in principle requires only a shared lock on the page-to-buffer
> mapping (embodied in the buf_table hash table).  Assuming success, it
> also needs to mark the buffer pinned and update the LRU-list position
> of the buffer.  Marking pinned is certainly a buffer-local change,
> so we could imagine splitting up the BufMgrLock like this:
> 
> 1. A global LWLock for the page-to-buffer mapping, call it say
> BufMappingLock.  Share lock on this is sufficient to allow reading the
> hash table; must get exclusive lock when reassigning any buffer's page
> identity.
> 
> 2. A global LWLock for the LRU strategy information, say BufLRULock
> or BufStrategyLock.
> 
> 3. Per-buffer LWLocks (or maybe just spinlocks) protecting the state of
> each buffer header; you need this lock to examine or change a buffer
> header.
> 
> ReleaseBuffer has no locking problems in this formulation: it just grabs
> the per-buffer lock, decrements its refcount, releases the lock.
> 
For the per-buffer locks, a latch would provide a lightweight way to update
the contents of the buffer header without hampering read-only access. A latch
consists of a latch bit and a sequence number that can be set in a single
atomic action. The flow for the two cases is simple:

Write:
1. Get latch.
2. Update the buffer.
3. Increment the sequence number.
4. Release the latch.

Read:
1. Read the sequence number.
2. Read the buffer.
3. Check the latch. If latched, go to 1.
4. If the sequence number has changed, go to 1.
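The flow above can be sketched with C11 atomics; an odd sequence value can
double as the latch bit, so the latch and the version are one word set in a
single atomic action. The BufHeader fields here are hypothetical stand-ins for
illustration, not PostgreSQL's actual buffer header:

```c
#include <stdatomic.h>

/* Hypothetical buffer header protected by a latch: an odd value of
 * seq means "latched" (a writer is in progress); even means free. */
typedef struct {
    atomic_uint seq;     /* latch bit (low bit) + sequence number   */
    int refcount;        /* example field protected by the latch    */
} BufHeader;

/* Write: 1. get latch (CAS even -> odd), 2. update the buffer,
 * 3+4. increment the sequence number and release in one store.    */
void buf_update(BufHeader *b, int new_ref)
{
    unsigned s;
    for (;;) {
        s = atomic_load(&b->seq);
        if ((s & 1) == 0 &&
            atomic_compare_exchange_weak(&b->seq, &s, s + 1))
            break;                  /* seq is now odd: latched      */
    }
    b->refcount = new_ref;          /* update under the latch       */
    atomic_store(&b->seq, s + 2);   /* new even version: released   */
}

/* Read: 1. read sequence number, 2. read buffer, 3. if latched
 * retry, 4. if the sequence number changed retry.                 */
int buf_read_refcount(BufHeader *b)
{
    unsigned s1, s2;
    int val;
    do {
        s1 = atomic_load(&b->seq);
        val = b->refcount;
        s2 = atomic_load(&b->seq);
    } while ((s1 & 1) != 0 || s1 != s2);
    return val;
}
```

Note that a reader performs no stores at all, which is the point: it never
dirties the cache line holding the header.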

By using this process, readers only ever see a consistent state of
the buffer. Also, since a read does not entail any write operation,
it does not cause a cache line invalidation and so does not contribute
to a cache update "storm". The "get latch" operation can be implemented
with an atomic instruction such as TAS (test-and-set) or CAS
(compare-and-swap). This gives readers extremely lightweight access to
the buffer, with no cache line update hit. If you need sequenced access
to the buffer, then you would still need to use LWLocks, but in many
cases, such as item 3 in Tom's list, a latch would work well.

> ReadBuffer looks like:
> 
>     * Acquire share lock on BufMappingLock.
>     * Search hash table for desired ID.  (we assume success)
>     * acquire per-buffer lock.
>     * increment buffer refcount.
>     * release per-buffer lock.
>     * release share lock on BufMappingLock.
>     * update the LRU state.
> 
> (We have to bump the pin count on the target buffer before releasing the
> BufMappingLock, otherwise someone could reassign the buffer as soon as
> we release BufMappingLock.)
> 
> This would be pretty good from a locking point of view, except that
> "update the LRU state" seems to require taking an exclusive lock on a
> global data structure, which puts us about back where we were.
> Contention for a global BufLRULock might be a bit better than for the
> existing overall BufMgrLock, but it'll still cripple the performance
> of ReadBuffer.
> 
> Perhaps someone knows of a neat data structure that can maintain an LRU
> list with only local updates?  I don't though.
> 
The clock algorithm comes pretty close to this: it approximates LRU
with a per-buffer reference bit, which eliminates the need to move
buffers to the MRU position on every access.
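A minimal sketch of the clock sweep (the structures here are illustrative,
not PostgreSQL's): touching a buffer is a purely local one-bit update, and
only eviction sweeps the shared state.

```c
#include <stdbool.h>

#define NBUFFERS 8

/* Hypothetical per-buffer state for the clock algorithm. */
typedef struct {
    bool referenced;   /* set on every access; cleared by the sweep */
    bool pinned;       /* cannot be evicted while pinned            */
} ClockBuf;

static ClockBuf buffers[NBUFFERS];
static int clock_hand = 0;

/* On access: just set the reference bit. No global list to reorder,
 * so no global lock is needed on the read path. */
void clock_touch(int buf_id)
{
    buffers[buf_id].referenced = true;
}

/* Find a victim: sweep the hand, clearing reference bits. A buffer
 * whose bit is already clear (and which is unpinned) has not been
 * used since the last sweep and is evicted ("second chance"). */
int clock_evict(void)
{
    for (;;) {
        ClockBuf *b = &buffers[clock_hand];
        int victim = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (b->pinned)
            continue;
        if (b->referenced)
            b->referenced = false;   /* spare it this time around */
        else
            return victim;
    }
}
```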

> This would convert the existing "strict LRU" behavior into an
> "approximate LRU".  I'm worried that the change might be penny-wise and
> pound-foolish: if a poorer cache management algorithm causes us to have
> to do more I/O, it's probably a net loss to save some lock contention.
> But without a smart idea about data structures I don't see how to do
> better.
> 

One alternative to an approximate LRU such as the clock algorithm would
be to have multiple buffer pools, as we discussed in the previous thread.
Contention on each pool's lock would be reduced to roughly 1/N of the
current level, where N is the number of pools. Of course, buffers should
be allocated in a fashion that maximizes locality and minimizes the
effect of scan cache pollution.
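One way to sketch the multiple-pool idea: hash each page number to one of N
pools, each with its own lock, so lookups for pages in different pools never
contend. The structures, names, and hash below are illustrative assumptions,
not PostgreSQL code:

```c
#include <pthread.h>
#include <stdint.h>

#define NPOOLS 16   /* number of pools; a power of two for the mask */

/* Hypothetical per-pool state; a real implementation would keep a
 * separate page-to-buffer hash table inside each pool. */
typedef struct {
    pthread_mutex_t lock;   /* stands in for a per-pool LWLock */
    long lookups;           /* toy counter for illustration    */
} BufferPool;

static BufferPool pools[NPOOLS];

static void pools_init(void)
{
    for (int i = 0; i < NPOOLS; i++)
        pthread_mutex_init(&pools[i].lock, NULL);
}

/* Fibonacci hashing spreads consecutive page numbers across pools,
 * which helps when a scan touches a run of adjacent pages. */
static int pool_for_page(uint32_t page_id)
{
    return (int)((page_id * 2654435761u) >> 16) & (NPOOLS - 1);
}

/* A lookup takes only its own pool's lock, so two backends working
 * on pages in different pools proceed in parallel. */
static void pin_page(uint32_t page_id)
{
    BufferPool *p = &pools[pool_for_page(page_id)];
    pthread_mutex_lock(&p->lock);
    p->lookups++;   /* real code would probe the pool's hash table */
    pthread_mutex_unlock(&p->lock);
}
```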

More food for thought.

Ken

