Re: drop/truncate table sucks for large values of shared buffers - Mailing list pgsql-hackers

From Andres Freund
Subject Re: drop/truncate table sucks for large values of shared buffers
Msg-id 20150628140113.GK4797@alap3.anarazel.de
In response to Re: drop/truncate table sucks for large values of shared buffers  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2015-06-28 09:11:29 -0400, Robert Haas wrote:
> On Sat, Jun 27, 2015 at 11:38 AM, Andres Freund <andres@anarazel.de> wrote:
> > I've started to play around with doing that a year or three back. My
> > approach was to use a linux style radix tree for the buffer mapping
> > table.  Besides lack of time what made it hard to be efficient was the
> > size of our buffer tags requiring rather deep trees.
> >
> > I think I was considering playing around with a two-level tree (so we
> > could cache a pointer in Relation or such), but the memory management
> > requirements for that made my head hurt too much. The other alternative
> > is to work on having a much simpler buffer tag
> 
> Wouldn't even a two-level tree have the same problem you complained
> about vis-a-vis chash?

I was hoping to avoid the upper tree (mapping relfilenodes to the block
tree) in the majority of the cases by caching that mapping in struct
Relation or so.
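
To make the caching idea concrete, here is a minimal sketch in C. Everything in it is hypothetical: `UpperMap`, `BlockTree`, `RelStub` and `lookup_block` are illustrative names, not PostgreSQL structures, and the upper level is a plain array rather than a real tree. It deliberately ignores cache invalidation and memory management, which is exactly the part described above as painful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RELS        8    /* hypothetical capacity of the upper level */
#define BLOCKS_PER_TREE 16   /* hypothetical size of one per-relation block tree */

/* Lower level: block number -> buffer id, -1 if the block is not cached. */
typedef struct BlockTree {
    int buf_ids[BLOCKS_PER_TREE];
} BlockTree;

/* Upper level: relfilenode -> per-relation block tree. */
typedef struct UpperMap {
    uint32_t   relfilenodes[MAX_RELS];
    BlockTree *trees[MAX_RELS];
    size_t     nrels;
    size_t     upper_lookups;   /* counts how often the upper level was searched */
} UpperMap;

/* Stand-in for struct Relation: remembers the block tree once it is found. */
typedef struct RelStub {
    uint32_t   relfilenode;
    BlockTree *cached_tree;     /* NULL until the first successful upper lookup */
} RelStub;

static BlockTree *upper_lookup(UpperMap *m, uint32_t relfilenode)
{
    m->upper_lookups++;
    for (size_t i = 0; i < m->nrels; i++)
        if (m->relfilenodes[i] == relfilenode)
            return m->trees[i];
    return NULL;
}

/* Returns the buffer id for (rel, blkno), or -1 if it is not mapped. */
static int lookup_block(UpperMap *m, RelStub *rel, uint32_t blkno)
{
    assert(m != NULL && rel != NULL);
    if (rel->cached_tree == NULL)               /* slow path: search the upper level */
        rel->cached_tree = upper_lookup(m, rel->relfilenode);
    if (rel->cached_tree == NULL || blkno >= BLOCKS_PER_TREE)
        return -1;
    return rel->cached_tree->buf_ids[blkno];    /* fast path: one indirection */
}
```

After the first lookup through a given `RelStub`, later lookups go straight to the per-relation tree and never touch the upper level.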

But generally, yes, a tree will have more indirections. They're often
of a different quality than with a hash table, though. There's a high
amount of spatial locality when looking up blocks: You're much more
likely to look up a block close to one recently looked up than just a
randomly different one. Hash tables don't have a way to benefit from
that - tree structures sometimes do.

I don't think using a radix tree in itself will have significant
performance benefits over the hash table. But it allows for a bunch of
cool further optimizations like the aforementioned 'intelligent'
readahead, combining writes when flushing buffers, and - which made me
look into it originally - tagging inner nodes in the tree with
information about dirtiness to avoid scanning large amounts of non-dirty
buffers.
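
As an illustration of the dirtiness tagging, here is a minimal sketch in C of a two-level tree over block numbers in which the root carries one "has dirty blocks" bit per child. The names (`Root`, `Leaf`, `tree_mark_dirty`, `tree_collect_dirty`) and the fanout of 64 are assumptions made for the example, not PostgreSQL code or the design actually prototyped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FANOUT  64                    /* slots per node (hypothetical) */
#define NBLOCKS (FANOUT * FANOUT)     /* blocks covered by the two levels */

typedef struct Leaf {
    uint64_t dirty;                   /* bit i set: block i of this leaf is dirty */
} Leaf;

typedef struct Root {
    uint64_t dirty_children;          /* bit i set: leaves[i] has a dirty block */
    Leaf     leaves[FANOUT];
} Root;

static void tree_mark_dirty(Root *r, uint32_t blkno)
{
    assert(blkno < NBLOCKS);
    uint32_t hi = blkno / FANOUT, lo = blkno % FANOUT;
    r->leaves[hi].dirty |= UINT64_C(1) << lo;
    r->dirty_children   |= UINT64_C(1) << hi;   /* propagate the tag upwards */
}

static void tree_mark_clean(Root *r, uint32_t blkno)
{
    assert(blkno < NBLOCKS);
    uint32_t hi = blkno / FANOUT, lo = blkno % FANOUT;
    r->leaves[hi].dirty &= ~(UINT64_C(1) << lo);
    if (r->leaves[hi].dirty == 0)               /* last dirty block in this leaf */
        r->dirty_children &= ~(UINT64_C(1) << hi);
}

/*
 * Writes up to max dirty block numbers into out[], in ascending order, and
 * returns how many were written. *leaves_visited reports how many leaves
 * had to be scanned; leaves without the dirty tag are skipped entirely.
 */
static size_t tree_collect_dirty(const Root *r, uint32_t *out, size_t max,
                                 size_t *leaves_visited)
{
    size_t n = 0;
    *leaves_visited = 0;
    for (uint32_t hi = 0; hi < FANOUT; hi++) {
        if (!(r->dirty_children & (UINT64_C(1) << hi)))
            continue;                           /* whole subtree is clean */
        (*leaves_visited)++;
        for (uint32_t lo = 0; lo < FANOUT; lo++)
            if ((r->leaves[hi].dirty & (UINT64_C(1) << lo)) && n < max)
                out[n++] = hi * FANOUT + lo;
    }
    return n;
}
```

Because the result comes back in block order, adjacent dirty blocks end up next to each other in `out[]`, which is also what write combining would need.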

> In that case, you were of the opinion that even ONE extra level of
> indirection was enough to pinch.  (I'm still hoping there is a way to
> fix that, but even so.)

Me too.


Andres


