Re: Further reduction of bufmgr lock contention - Mailing list pgsql-hackers
From | Bruce Momjian
Subject | Re: Further reduction of bufmgr lock contention
Date |
Msg-id | 200608260323.k7Q3NZZ02977@momjian.us
In response to | Re: Further reduction of bufmgr lock contention (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Further reduction of bufmgr lock contention
List | pgsql-hackers
Is this being kept for 8.3?

---------------------------------------------------------------------------

Tom Lane wrote:
> A couple months ago we were discussing partitioning the buffer mapping
> table so as to reduce contention for the BufMappingLock.  The discussion
> stalled after I complained about the uncertainty of shared memory usage
> arising from having a separate hashtable for each partition (since
> different numbers of buffers might fall into each partition at different
> times).  I've been thinking about it more after seeing an example from
> Tatsuo that seems to be exactly the same problem as Gavin saw.
>
> We could fix the uncertain-usage objection if we stick with a single
> hashtable and ensure that concurrent access to different partitions of
> it is safe.  I believe this is easily doable, if we make hashtables
> used in this mode allocate the maximum allowed number of buckets
> immediately during hashtable initialization.  (Since shared hashtables
> already have a fixed maximum directory size, they already have an upper
> bound on the number of buckets, so this loses no flexibility.)  Then
> there will be no on-the-fly bucket splits, and that means that accesses
> to different hash buckets operate completely independently.  Therefore,
> we can regard the hashtable as logically partitioned on the basis of any
> classification of entries that will uniquely assign hash buckets to
> classes --- taking the low order bits of entry hash codes will do fine.
>
> The only changeable state that is shared across all buckets is the entry
> freelist and the "nentries" counter.  We could protect these with a
> spinlock (one spinlock is sufficient since changes to nentries go along
> with addition or removal of freelist entries).
>
> Usage of a partitioned hash table would then be like
>
> 	compute hashtable lookup key;
> 	entryhashcode = calc_hash(lookup key);
> 	partitionnumber = entryhashcode % NumPartitions;
> 	LWLockAcquire(PartitionLock[partitionnumber], ...);
> 	manipulate hashtable;
> 	LWLockRelease(PartitionLock[partitionnumber]);
>
> We could do this without changing the API of hash_search, but then we'd
> be computing the entry hashcode twice, so I'm inclined to provide an
> additional entry point that takes a precalculated hashcode.
>
> Potential downsides of applying this idea to the buffer mapping table:
>
> 1. Reassigning a buffer to a new page will (usually) require two cycles
> of LWLockAcquire/Release for the two different partitions involved.
> Since this path also requires at least a read() kernel call (maybe a
> write() too), I don't think there'll be any meaningful slowdown.
>
> 2. The current logic for reassigning a buffer attempts to make a
> hashtable entry for its new page number (to detect collisions) before
> releasing the old hashtable entry.  This would only be safe if we held
> both partition LWLocks concurrently; which seems bad for concurrency,
> plus avoiding deadlock would complicate the code significantly.  I'm
> inclined to release the old entry and then try to insert the new one,
> holding only one lock at a time.  If the insertion fails (because
> someone was concurrently loading the same page), we'd have to throw the
> buffer onto the freelist rather than allowing it to retain its previous
> valid data.  Which is slightly annoying, but in practice the case
> probably doesn't happen often enough to be worth worrying about.
>
> 3. Taking the freelist spinlock is new computation that wasn't there
> before.  But, again, it's only needed in code paths that will also be
> doing a kernel call.
>
> If we do this we should probably also handle the lmgr lock tables the
> same way (partially reverting my partition-the-LockMgrLock patch of a
> couple months ago).
> However, downside #3 might be a stronger objection
> for lmgr, since it can create or destroy lock objects without necessarily
> doing any I/O.
>
> Thoughts, objections?
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +