Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CAMkU=1wWsn0Y0Tw5P1Eyjr7dwfz3s2HmDsxFbeth4H2=jnZ7Gw@mail.gmail.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Nov 4, 2013 at 8:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 October 2013 16:10, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>>> I don't see much interest in insert-efficient indexes.
>>
>> Presumably someone will get around to implementing a btree index
>> insertion buffer one day. I think that would be a particularly
>> compelling optimization for us, because we could avoid ever inserting
>> index tuples that are already dead when the deferred insertion
>> actually occurs.
>
> That's pretty much what the LSM-tree is.

What is pretty cool about this sort of thing is that there's no
intrinsic reason the insertion buffer needs to be block-structured or
disk-backed.

How do we commit to not spilling to disk, in the face of an unbounded number of indexes existing and wanting to use this mechanism simultaneously?  If it routinely needs to spill to disk, that would probably defeat the purpose of having it in the first place, but committing to never doing so seems to be extremely restrictive. As you say it is also freeing, in terms of using pointers and such, but I think the restrictions would outweigh the freedom.


 
 In theory, you can structure the in-memory portion of
the tree any way you like, using pointers and arbitrary-size memory
allocations and all that fun stuff.  You need to log that there's a
deferred insert (or commit to flushing the insertion buffer before
every commit, which would seem to miss the point) so that recovery can
reconstruct the in-memory data structure and flush it, but that's it:
the WAL format need not know any other details of the in-memory
portion of the tree.  I think that, plus the ability to use pointers
and so forth, might lead to significant performance gains.

In practice, the topology of our shared memory segment makes this a
bit tricky.  The problem isn't so much that it's fixed size as that it
lacks a real allocator, and that all the space used for shared_buffers
is nailed down and can't be borrowed for other purposes.

I think the fixed size is also a real problem, especially given the ubiquitous advice not to exceed 2 to 8 GB.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Fast insertion indexes: why no developments
Next
From: Kevin Grittner
Date:
Subject: Re: [BUGS] BUG #8542: Materialized View with another column_name does not work?