On 05/11/13 05:35, Robert Haas wrote:
> On Mon, Nov 4, 2013 at 11:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I think doing this outside of s_b will make stuff rather hard for
>> physical replication and crash recovery since we either will need to
>> flush the whole buffer at checkpoints - which is hard since the
>> checkpointer doesn't work inside individual databases - or we need to
>> persist the in-memory buffer across restart which also sucks.
> You might be right, but I think part of the value of LSM-trees is that
> the in-memory portion of the data structure is supposed to be able to
> be optimized for in-memory storage rather than on disk storage. It
> may be that block-structuring that data bleeds away much of the
> performance benefit. Of course, I'm talking out of my rear end here:
> I don't really have a clue how these algorithms are supposed to work.
>
How about having a 'TRANSIENT INDEX' that exists only in memory, so
there is no requirement to write it to disk or to replicate it directly?
This type of index would be very fast, and easier to implement than a
WAL-logged one. Recovery would involve rebuilding the index, and sharing
would involve recreating it on a slave. Probably not appropriate for a
primary index, but it may be okay for secondary indexes used to speed up
specific queries.

This could be useful in some situations now, and would allow time to
gain experience in how best to implement the basic concept. Then a more
robust solution using WAL etc. can be developed later.

I suspect that such a TRANSIENT INDEX would still be useful even when a
more robust in-memory index method was available, as I expect it would
be faster to set up than a robust in-memory index. That might be good
when you need one or more indexes for a short period of time, or when
the index is so small that recreating it requires very little time
(total elapsed time might even be less than for a disk-backed one?).
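
To make the idea a bit more concrete, here is a toy sketch in Python. It
is not PostgreSQL code: the class name TransientIndex, its methods, and
the list of dicts standing in for the table are all made up for
illustration. The only point is that the index lives purely in memory,
is never logged, and is rebuilt from the table when it is found missing.

```python
from collections import defaultdict


class TransientIndex:
    """Hypothetical in-memory secondary index; never written to disk or WAL."""

    def __init__(self, heap, column):
        self.heap = heap        # list of dict rows standing in for the durable table
        self.column = column    # name of the indexed column
        self.entries = None     # None means "index lost, rebuild before use"

    def rebuild(self):
        # Recovery path: scan the whole table and recreate the index from scratch.
        entries = defaultdict(list)
        for row_id, row in enumerate(self.heap):
            entries[row[self.column]].append(row_id)
        self.entries = entries

    def insert(self, row):
        # The table itself would be durable; the index update is memory-only.
        self.heap.append(row)
        row_id = len(self.heap) - 1
        if self.entries is not None:
            self.entries[row[self.column]].append(row_id)
        return row_id

    def lookup(self, key):
        # Rebuild lazily on first use after a restart.
        if self.entries is None:
            self.rebuild()
        return [self.heap[i] for i in self.entries.get(key, [])]

    def crash(self):
        # Simulate a restart: the table survives, the index does not.
        self.entries = None
```

A slave would do the same thing: it receives the table through normal
replication and calls the equivalent of rebuild() locally, so nothing
about the index itself has to be shipped.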
Cheers,
Gavin