On 12 September 2015 at 21:19, Andres Freund <andres@anarazel.de> wrote:
On 2015-09-12 13:12:26 +0100, Simon Riggs wrote: > Why do we have to do buffer lookups using the full buffer tag?
We don't necessarily. > Why not just use (relNode, blockNum) and resolve hash collisions, if any?
I tried that and unfortunately it came out as a negative - the number of collision gets higher and collisions in a hashtable will often imply a cache miss. You can even see the comparison function rising in the profile.
I've actually changed course a bit and I'm trying something different: A two level structure. One hashtable that maps (RelFileNode, ForkNumber) to a 'open relation' data structure, and from there a radix tree over just the block number. To avoid having to look up in the hashtable frequently there's a pointer in RelationData to a fork's radix tree.
This seems to have several advantages: * It doesn't require changes to the physical representation
Seems necessary
* A radix tree allows ordered traversal, allowing for efficient prefetching et al.
Could be very important... some benchmarks of whether that was worthwhile would really help
* The ability to efficiently get all pages that belong to a relation makes dropping/truncating tables radically more efficient than now.
Not very important, since there are other approaches
* A single lookup in the radix tree is on average significantly faster than in the hash table. I've measured ~350 vs 60 cycles in a nonconcurrent workload and ~820 vs 72~ cycles in a concurrent workload, using a recorded buffer access traces.
Good
* Having a 'open relation' representation in shared memory also makes it much easier to implement more efficient relation extension and truncation - we can have pointers in there that signal how far the relation has been extended and how far it has been initialized.
Probably important, for extension
The biggest disadvantage is that it's a good bit more complex than what we have right now. That we need dynamically many radix trees makes memory management nontrivial.
Which might mean we lose benefit by requiring many additional memory accesses.
Sounds halfway sane?
I think it is a good research direction, as long as we get the focus right.
Another similar approach is direct allocation of chunks of memory for in-memory access. That would be much less flexible, but access would be in <5 cycles. I mention this for comparison only.
So if the focus is on more efficient and reasonably flexible access to data in memory, then it sounds good.
--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services