Re: Thinking about breaking up the BufMgrLock - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Thinking about breaking up the BufMgrLock
Date
Msg-id KGEFLMPJFBNNLNOOOPLGEEHLCIAA.simon@2ndquadrant.com
Whole thread Raw
In response to Thinking about breaking up the BufMgrLock  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Thinking about breaking up the BufMgrLock  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>Tom Lane
> But without a smart idea about data structures I don't see how to do
> better.
>
> Thoughts?
>

Hmm, seems like you summed up the lack of algorithmic choices very well.

After much thought, I believe the best way is to implement bufferpools
(BPs). That is, we don't just have one bufferhash and one LRU, we have
many pairs. We then work out some mapping by which a block can be known
to be in a particular BP, then acquire the lock for that BP.

Using BPs means that each block is only ever associated with one BP.
Hence, there is still only a single BPBufMgrLock to Acquire/Release, but
there could be considerably less contention for a specific lock.

BPs can be implemented manually or automatically. Commercial systems
provide facilities for manual placement of specific tables into specific
bufferpools. That is more work, so I would favour automatically assigned
BPs, though leaving the door open to further extension. It seems simple
to use one or more of the numbers associated with a block as input to a
hash algorithm to identify the BP in which to locate the block. After
some thinking, using relationId seems to be the best candidate. That
would mean an index or sequential scan would be more likely to reAcquire
a lock.

There is some possibility that the multiple BP LRUs could become
unbalanced, so very short LRUs would not benefit as much from splitting
up in this way. So, I suggest linking the number of BPs to the number of
shared buffers, say 1 BP per 10,000 shared_buffers, up to a maximum of
8.

I believe a self-balancing algorithm between the multiple lists would be
fairly straightforward addition also.

ISTM that there is no chance for deadlock in using BPs. The current code
assumes that all blocks are accessed via a single BufMgrLock. There is
never an attempt to acquire BufMgrLock when it is already held, so no
chance therefore of deadlock.

Overall, this idea would reduce BufMgrLock contention by as many
multiples as you choose, as well as allowing more practical use of
larger shared_buffers settings which would further drive up performance.
The costs need not be incurred at all for smaller shared_buffers and the
current situation (num_bufferpools = 1) can remain the default.

Also, the current code already allows for multiple lists, so extending
the number from 4 (or 3) to multiples seems more straightforward.

One other consideration is the eviction of dirty buffers, but more
detail on that in another post. What should be observed there is that
dirty buffer evictions perform Acquire/Release of the BufMgrLock at
least twice. Improving the bgwriter will also have an effect on
BufMgrLock contention.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: "Simon Riggs"
Date:
Subject: Re: Thinking about breaking up the BufMgrLock
Next
From: "Simon Riggs"
Date:
Subject: Re: Inline MemoryContextSwitchTo?