Home > mailing lists

Re: Reducing the size of BufferTag & remodeling forks - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Reducing the size of BufferTag & remodeling forks
Date	September 13, 2015 13:16:38
Msg-id	CANP8+jLT+zQdj=6T2fsBNFj_wvqDdPvJ7fkTg++_RcOx5QoCEw@mail.gmail.com Whole thread Raw
In response to	Re: Reducing the size of BufferTag & remodeling forks (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

Tree view

On 12 September 2015 at 21:19, Andres Freund <andres@anarazel.de> wrote:

On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> Why do we have to do buffer lookups using the full buffer tag?

We don't necessarily.

> Why not just use (relNode, blockNum) and resolve hash collisions, if any?

I tried that and unfortunately it came out as a negative - the number of
collision gets higher and collisions in a hashtable will often imply a
cache miss. You can even see the comparison function rising in the
profile.

I've actually changed course a bit and I'm trying something different: A
two level structure. One hashtable that maps (RelFileNode, ForkNumber)
to a 'open relation' data structure, and from there a radix tree over
just the block number. To avoid having to look up in the hashtable
frequently there's a pointer in RelationData to a fork's radix tree.

This seems to have several advantages:
* It doesn't require changes to the physical representation

Seems necessary

* A radix tree allows ordered traversal, allowing for efficient
prefetching et al.

Could be very important... some benchmarks of whether that was worthwhile would really help

* The ability to efficiently get all pages that belong to a relation
makes dropping/truncating tables radically more efficient than now.

Not very important, since there are other approaches

* A single lookup in the radix tree is on average significantly faster
than in the hash table. I've measured ~350 vs 60 cycles in a
nonconcurrent workload and ~820 vs 72~ cycles in a concurrent
workload, using a recorded buffer access traces.

Good

* Having a 'open relation' representation in shared memory also makes it
much easier to implement more efficient relation extension and
truncation - we can have pointers in there that signal how far the
relation has been extended and how far it has been initialized.

Probably important, for extension

The biggest disadvantage is that it's a good bit more complex than what
we have right now. That we need dynamically many radix trees makes
memory management nontrivial.

Which might mean we lose benefit by requiring many additional memory accesses.

Sounds halfway sane?

I think it is a good research direction, as long as we get the focus right.

Another similar approach is direct allocation of chunks of memory for in-memory access. That would be much less flexible, but access would be in <5 cycles. I mention this for comparison only.

So if the focus is on more efficient and reasonably flexible access to data in memory, then it sounds good.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Stephen Frost
Date: 13 September 2015, 12:49:39
Subject: Re: [DOCS] Missing COMMENT ON POLICY

From: Michael Paquier
Date: 13 September 2015, 13:17:33
Subject: Re: New gist vacuum.

Re: Reducing the size of BufferTag & remodeling forks - Mailing list pgsql-hackers

Previous

Next