Home > mailing lists

Re: LSM tree for Postgres - Mailing list pgsql-hackers

From	Konstantin Knizhnik
Subject	Re: LSM tree for Postgres
Date	August 4, 2020 20:18:01
Msg-id	3a181d40-659b-2160-eaba-ca94067900ee@postgrespro.ru Whole thread Raw
In response to	Re: LSM tree for Postgres (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: LSM tree for Postgres
List	pgsql-hackers

Tree view

On 04.08.2020 18:11, Tomas Vondra wrote:
> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>> Hi hackers,
>>
>> I want to share results of my last research of implementing LSM index 
>> in Postgres.
>> Most of modern databases (RocksDB, MongoDB, Tarantool,...) are using 
>> LSM tree instead of classical B-Tree.
>>
>
> I was under the impression that LSM is more an alternative primary
> storage, not for indexes. Or am I wrong / confused?

Yes, originally I considered LSM for IOT (index organized table).
And RocksDB FDW is actually such implementation of IOT.
But implement IOT using existed nbtree indexes is more challenging task:
I have thought about it but have not tried it yet.

> Interesting, although those are just writes, right? Do you have any
> numbers for read? Also, what are the numbers when you start with "larger
> than RAM" data (i.e. ignoring the initial period when the index fits
> into memory)?

When benchmark starts, insertion speed is almost the same for Lsm3 and 
standard nbtree
(we have to insert record twice, but second insertion s done in background).
At the end of benchmark - when we close to 250 million records, Lsm3 
shows TPS about 20 times faster than nbtree.
Finally it gives about 6 times difference in elapsed time.

> Yeah, I think in general FDW to a database with different consistency
> model is not going to get us very far ... Good for PoC experiments, but
> not really designed for stuff like this. Also, in my experience FDW has
> siginficant per-row overhead.

 From my experience the main drawback of FDW is lack of support of 
parallel operations.
But it is important mostly for OLAP, not for OLTP.

> BTW how would your approach work with unique indexes, speculative
> inserts etc.?

Unique indexes are not supported now.
And I do not see some acceptable solution here.
If we will have to check presence of duplicate at the time of insert 
then it will eliminate all advantages of LSM approach.
And if we postpone to the moment of merge, then... I afraid that it will 
be too late.

>
> Isn't it a bit suspicious that with more clients the throughput actually
> drops significantly? Is this merely due to PoC stage, or is there some
> inherent concurrency bottleneck?
>
My explaination is the following (I am not 100% sure that it is true): 
multiple clients insert records faster than merge bgworker is able to 
merge them to main index. It cause blown of top index and as a result it 
doesn't fir in memory any more.
So we loose advantages of fast inserts. If we have N top indexes instead 
of just 2, we can keep size of each top index small enough.
But in this case search operations will have to merge N indexes and so 
search is almost N times slow (the fact that each top index fits in memory
doesn't mean that all of the fits in memory at the same time, so we 
still have to read pages from disk during lookups in top indexes).

pgsql-hackers by date:

From: Konstantin Knizhnik
Date: 04 August 2020, 19:56:55
Subject: Re: LSM tree for Postgres

From: Konstantin Knizhnik
Date: 04 August 2020, 20:21:18
Subject: Re: LSM tree for Postgres

Re: LSM tree for Postgres - Mailing list pgsql-hackers

Previous

Next