Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CA+U5nMLw2-moSZu0bO5JQrRzjNi1axNvC4sF2eQpqSNoVJ5NEw@mail.gmail.com
In response to Re: Fast insertion indexes: why no developments  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Fast insertion indexes: why no developments  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On 4 November 2013 16:09, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 29 October 2013 16:10, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>>>> I don't see much interest in insert-efficient indexes.
>>>
>>> Presumably someone will get around to implementing a btree index
>>> insertion buffer one day. I think that would be a particularly
>>> compelling optimization for us, because we could avoid ever inserting
>>> index tuples that are already dead when the deferred insertion
>>> actually occurs.
>>
>> That's pretty much what the LSM-tree is.
>
> What is pretty cool about this sort of thing is that there's no
> intrinsic reason the insertion buffer needs to be block-structured or
> disk-backed.  In theory, you can structure the in-memory portion of
> the tree any way you like, using pointers and arbitrary-size memory
> allocations and all that fun stuff.  You need to log that there's a
> deferred insert (or commit to flushing the insertion buffer before
> every commit, which would seem to miss the point) so that recovery can
> reconstruct the in-memory data structure and flush it, but that's it:
> the WAL format need not know any other details of the in-memory
> portion of the tree.  I think that, plus the ability to use pointers
> and so forth, might lead to significant performance gains.

Sounds like it could be cool, yes.
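
For concreteness, the deferred-insert WAL record wouldn't need to carry
much more than what is required to rebuild the buffered entry at
recovery time - something roughly like this (a sketch only; no such
record type exists, and the types are the usual ones from the
PostgreSQL headers):

    /* Hypothetical record type, just to illustrate the idea */
    typedef struct xl_index_deferred_insert
    {
        RelFileNode     node;       /* index the entry is destined for */
        ItemPointerData heaptid;    /* heap TID the entry will point at */
        uint16          keylen;     /* length of the key datum that follows */
        /* key datum follows; redo rebuilds the in-memory buffer from it */
    } xl_index_deferred_insert;

Redo would just re-add the entry to the in-memory structure; the normal
btree page-level WAL only appears when the buffer is eventually flushed.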

> In practice, the topology of our shared memory segment makes this a
> bit tricky.  The problem isn't so much that it's fixed size as that it
> lacks a real allocator, and that all the space used for shared_buffers
> is nailed down and can't be borrowed for other purposes.  I'm very
> interested in solving the problem of getting a real allocator for
> shared memory because I think I need it for parallel sort

Agreed, you need a shared memory allocator for parallel query. It's
just too tempting to build a hash table in shared memory on one
thread, then use that hash table from multiple sessions as we do a
parallel scan. Roughly the same thing applies to parallel sort -
passing pointers to complete data objects around is much easier than
actually moving the data between processes, which would slow things
down. We do also need a generalised inter-node pipe, but that's not
the best solution to every problem.

It's also a great way of controlling over-allocation of resources.
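
Something along these lines is what I have in mind (nothing like this
exists today, the names are invented): one backend builds the table
inside a shared region, and the parallel workers attach to the same
region and just probe it.

    typedef struct shm_hash shm_hash;   /* lives entirely in the region */

    /* builder side: carve a hash table out of a shared region */
    extern shm_hash *shm_hash_create(void *region, size_t region_size,
                                     size_t keysize, size_t entrysize);

    /* worker side: map the same region and start probing */
    extern shm_hash *shm_hash_attach(void *region);
    extern void *shm_hash_insert(shm_hash *ht, const void *key, bool *found);
    extern void *shm_hash_lookup(shm_hash *ht, const void *key);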

> , and even if
> there's a way to avoid needing it there I have a strong feeling that
> we'll want it for other applications of dynamic shared memory.  But it
> should be possible to write the allocator in such a way that you just
> give it a chunk of memory to manage, and it does, remaining agnostic
> about whether the memory is from the main shared memory segment or a
> dynamic one.

Agreed
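
For illustration only, an interface as small as this would do (invented
names; the point is just that the allocator takes a raw chunk and
doesn't care where it came from):

    typedef struct shm_allocator shm_allocator;

    /* 'base' can point into the main shared memory segment or into a
     * dynamic segment; the allocator only sees a chunk of that size. */
    extern shm_allocator *shm_allocator_create(void *base, size_t size);
    extern void *shm_alloc(shm_allocator *a, size_t size);
    extern void  shm_free(shm_allocator *a, void *ptr);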

> Of course, it's possible that even if we do get a shared memory
> allocator, a hypothetical person working on this project might prefer
> to make the data block-structured anyway and steal storage from
> shared_buffers.  So my aspirations in this area may not even be
> relevant.  But I wanted to mention them, just in case anyone else is
> thinking about similar things, so that we can potentially coordinate.

If anyone was going to work on an LSM tree, I would advise building a
tree in shared/temp buffers first, then merging it with the main tree.
The merge process could use the killed-tuple approach to mark entries
as they are merged.
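
In outline, the merge might look something like this (a sketch only;
the SideTree helpers are invented, _bt_doinsert is the existing nbtree
entry point):

    /* Outline only; the side_tree_* helpers are invented for illustration */
    static void
    merge_side_tree(Relation mainidx, Relation heaprel, SideTree *side)
    {
        SideTreeEntry *entry;

        /* walk the side tree in key order */
        for (entry = side_tree_first(side); entry != NULL;
             entry = side_tree_next(side, entry))
        {
            /* skip entries whose heap tuples are already dead */
            if (!side_tree_entry_killed(entry))
                _bt_doinsert(mainidx, entry->itup, UNIQUE_CHECK_NO, heaprel);

            /* mark the entry as merged (the killed-tuple trick), so an
             * interrupted merge can resume without re-inserting */
            side_tree_mark_killed(side, entry);
        }
    }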

The most difficult thing about buffering the inserts is deciding which
poor sucker gets the task of cleaning up. That's probably better as an
off-line process, which is where the work comes in. Non-shared-buffered
approaches would add too much overhead to the main task.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


