Re: Further reduction of bufmgr lock contention - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: Further reduction of bufmgr lock contention
Date:
Msg-id: 200608260323.k7Q3NZZ02977@momjian.us
In response to: Re: Further reduction of bufmgr lock contention (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Further reduction of bufmgr lock contention
List: pgsql-hackers
Is this being kept for 8.3?

---------------------------------------------------------------------------

Tom Lane wrote:
> A couple months ago we were discussing partitioning the buffer mapping
> table so as to reduce contention for the BufMappingLock.  The discussion
> stalled after I complained about the uncertainty of shared memory usage
> arising from having a separate hashtable for each partition (since
> different numbers of buffers might fall into each partition at different
> times).  I've been thinking about it more after seeing an example from
> Tatsuo that seems to be exactly the same problem as Gavin saw.
> 
> We could fix the uncertain-usage objection if we stick with a single
> hashtable and ensure that concurrent access to different partitions of
> it is safe.  I believe this is easily doable, if we make hashtables
> used in this mode allocate the maximum allowed number of buckets
> immediately during hashtable initialization.  (Since shared hashtables
> already have a fixed maximum directory size, they already have an upper
> bound on the number of buckets, so this loses no flexibility.)  Then
> there will be no on-the-fly bucket splits, and that means that accesses
> to different hash buckets operate completely independently.  Therefore,
> we can regard the hashtable as logically partitioned on the basis of any
> classification of entries that will uniquely assign hash buckets to
> classes --- taking the low order bits of entry hash codes will do fine.
> 
> The only changeable state that is shared across all buckets is the entry
> freelist and the "nentries" counter.  We could protect these with a
> spinlock (one spinlock is sufficient since changes to nentries go along
> with addition or removal of freelist entries).
> 
> Usage of a partitioned hash table would then be like
> 
>     compute hashtable lookup key;
>     entryhashcode = calc_hash(lookup key);
>     partitionnumber = entryhashcode % NumPartitions;
>     LWLockAcquire(PartitionLock[partitionnumber], ...);
>     manipulate hashtable;
>     LWLockRelease(PartitionLock[partitionnumber]);
> 
> We could do this without changing the API of hash_search, but then we'd
> be computing the entry hashcode twice, so I'm inclined to provide an
> additional entry point that takes a precalculated hashcode.
> 
> Potential downsides of applying this idea to the buffer mapping table:
> 
> 1. Reassigning a buffer to a new page will (usually) require two cycles
> of LWLockAcquire/Release for the two different partitions involved.
> Since this path also requires at least a read() kernel call (maybe a
> write() too), I don't think there'll be any meaningful slowdown.
> 
> 2. The current logic for reassigning a buffer attempts to make a
> hashtable entry for its new page number (to detect collisions) before
> releasing the old hashtable entry.  This would only be safe if we held
> both partition LWLocks concurrently; which seems bad for concurrency,
> plus avoiding deadlock would complicate the code significantly.  I'm
> inclined to release the old entry and then try to insert the new one,
> holding only one lock at a time.  If the insertion fails (because
> someone was concurrently loading the same page), we'd have to throw the
> buffer onto the freelist rather than allowing it to retain its previous
> valid data.  Which is slightly annoying, but in practice the case
> probably doesn't happen often enough to be worth worrying about.
> 
> 3. Taking the freelist spinlock is new computation that wasn't there
> before.  But, again, it's only needed in code paths that will also be
> doing a kernel call.
> 
> If we do this we should probably also handle the lmgr lock tables the
> same way (partially reverting my partition-the-LockMgrLock patch of a
> couple months ago).  However, downside #3 might be a stronger objection
> for lmgr, since it can create or destroy lock objects without necessarily
> doing any I/O.
> 
> Thoughts, objections?
> 
>             regards, tom lane
> 

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

