Re: LSM tree for Postgres - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: LSM tree for Postgres
Date
Msg-id 6d224b6c-194e-11a0-3924-446492b10f71@postgrespro.ru
Whole thread Raw
In response to Re: LSM tree for Postgres  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: LSM tree for Postgres  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers

On 05.08.2020 02:59, Alexander Korotkov wrote:
>
> The things you're writing makes me uneasy.  I initially understood
> lsm3 as a quick and dirty prototype, while you're probably keeping
> some design in your mind (for instance, original design of LSM).
> However, your message makes me think you're trying to defend the
> approach currently implemented in lsm3 extension.  Therefore, I've to
> criticise this approach.
>
> 1) The base index can degrade.  At first, since merge can cause page
> splits.  Therefore logical ordering of pages will become less
> correlated with their physical ordering with each merge.
> 2) If your workload will include updates and/or deletes, page
> utilization may also degrade.
> 3) While base index degrades, merge performance also degrades.
> Traverse of base index in logical order will require more and more
> random reads (at some point almost every page read will be random).
> While the base index becomes large and/or bloat, you push less top
> index tuples to a single base index page (at some point you will push
> one tuple per page).
>
> Original LSM design implies strict guarantees over average resources
> spent per index operation.  Your design doesn't.  Moreover, I bet lsm3
> will degrade significantly even on insert-only workload.  It should
> degrade to the performance level of B-tree once you insert enough
> data.  Try something like number_of_merges =
> numer_of_tuples_per_index_page * 2 and you should see this
> degradation.  Real LSM doesn't degrade that way.

I mostly agree with your critices.
My Lsm3 is not true LSM, but from my point of view it preserves basic 
principles of LSM: fast and small top index and bulk updates of main index.
My experiments with RocksDB shows that degradation also takes place in 
this case. More experiments are needed to compare two approaches.

Concerning degrade of basic index - B-Tree itself is balanced tree. Yes, 
insertion of random keys can cause split of B-Tree page.
In the worst case half of B-Tree page will be empty. So B-Tree size will 
be two times larger than ideal tree.
It may cause degrade up to two times. But that is all. There should not 
be infinite degrade of speed tending to zero.

>
> Right now vacuum process Lsm3 indexes in usual way.
> Removing records from top indexes may be not needed at all (just because
> this indexes will be truncated in any case).
> But as far as size of top index is expected to be small enough
> vacuumming it should not take a long time,
> so I didn't to avoid it (although it should not be difficult - just
> disable ambulkdelete for correspondent  nbtree wrappers).
> It doesn't seem important, but I don't get your point here.  Postgres
> expects ambulkdelete to delete TIDs from index.  If you don't delete
> it from the top index, this TID will be merged to the base index.  And
> that could lead wrong query answers unless you eliminate those TIDs in
> a different way (during the merge stage or something).

Yes, your are right. It is not possible to avoid delete TIDs from top 
indexes.
>> Concerning deletes from main index - I do not understand how it can be
>> optimized.
> This is a trick you can learn from almost every LSM implementation.
> For instance, check docs for leveldb [1] about "delete marker".  For
> sure, that requires some redesign of the vacuum and can't be done in
> extension (at least in the reasonable way).  But, frankly speaking, I
> think core modifications are inevitable to utilize the power of LSM in
> PostgreSQL.

The main idea of Lsm3 was to investigate whether it is possible to 
achieve the same result as with "classical" LSM
using standard Postgres nbtree indexes. Right now it seems t me that 
answer is positive, but I have not performed
exhaustive measurements. For example I have not measured vacuum overhead 
(it was enabled, so vacuumming takes place
in my benchmark, but I have not tries to separate its overhead and 
influence on performance), index search speed,...




> Links
> 1. https://github.com/google/leveldb/blob/master/doc/impl.md
>
> ------
> Regards,
> Alexander Korotkov




pgsql-hackers by date:

Previous
From: "Joel Mariadasan (jomariad)"
Date:
Subject: Reg. Postgres 13
Next
From: Asim Praveen
Date:
Subject: Re: [PATCH] - Provide robust alternatives for replace_string