Re: LSM tree for Postgres - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: LSM tree for Postgres |
Date | |
Msg-id | 3a181d40-659b-2160-eaba-ca94067900ee@postgrespro.ru |
In response to | Re: LSM tree for Postgres (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: LSM tree for Postgres |
List | pgsql-hackers |
On 04.08.2020 18:11, Tomas Vondra wrote:
> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>> Hi hackers,
>>
>> I want to share the results of my latest research on implementing an LSM index
>> in Postgres.
>> Most modern databases (RocksDB, MongoDB, Tarantool, ...) use an
>> LSM tree instead of a classical B-tree.
>>
>
> I was under the impression that LSM is more an alternative primary
> storage, not for indexes. Or am I wrong / confused?

Yes, originally I considered LSM for IOT (index organized tables), and the
RocksDB FDW is in fact such an implementation of IOT. But implementing IOT
on top of the existing nbtree indexes is a more challenging task: I have
thought about it but have not tried it yet.

> Interesting, although those are just writes, right? Do you have any
> numbers for read? Also, what are the numbers when you start with "larger
> than RAM" data (i.e. ignoring the initial period when the index fits
> into memory)?

When the benchmark starts, insertion speed is almost the same for Lsm3 and
the standard nbtree (each record has to be inserted twice, but the second
insertion is done in the background; see the toy sketch of this two-level
scheme at the end of this mail). By the end of the benchmark, when we are
close to 250 million records, Lsm3 shows about 20 times higher TPS than
nbtree. In total this gives about a 6x difference in elapsed time.

> Yeah, I think in general FDW to a database with different consistency
> model is not going to get us very far ... Good for PoC experiments, but
> not really designed for stuff like this. Also, in my experience FDW has
> significant per-row overhead.

In my experience the main drawback of FDW is the lack of support for
parallel operations. But that matters mostly for OLAP, not for OLTP.

> BTW how would your approach work with unique indexes, speculative
> inserts etc.?

Unique indexes are not supported now, and I do not see an acceptable
solution here. If we have to check for duplicates at insert time, it
eliminates all the advantages of the LSM approach; and if we postpone the
check to the moment of merge, then... I am afraid it will be too late.

>
> Isn't it a bit suspicious that with more clients the throughput actually
> drops significantly? Is this merely due to PoC stage, or is there some
> inherent concurrency bottleneck?
>

My explanation is the following (I am not 100% sure that it is true):
multiple clients insert records faster than the merge bgworker is able to
merge them into the main index. This causes the top index to grow, and as a
result it no longer fits in memory, so we lose the advantage of fast
inserts. If we had N top indexes instead of just two, we could keep the
size of each top index small enough. But in this case search operations
would have to merge N indexes, so search becomes almost N times slower (the
fact that each top index fits in memory does not mean that all of them fit
in memory at the same time, so we still have to read pages from disk during
lookups in the top indexes).
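For readers who want the idea in code: below is a minimal, self-contained C
sketch of the two-level scheme described above, not the actual Lsm3 extension
code. All type and function names (Lsm, lsm_insert, lsm_merge, lsm_lookup) and
the array-based structures are purely illustrative; in Lsm3 both levels are
real nbtree indexes and the merge runs in a background worker.

```c
/*
 * Toy sketch of the Lsm3-like two-level scheme (illustrative only):
 *   - inserts go into a small in-memory "top" buffer (cheap),
 *   - a merge step folds the top buffer into the large sorted "main" index
 *     (done by a background worker in the real extension),
 *   - lookups must probe the top buffer first and then the main index.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define TOP_CAPACITY 4          /* tiny threshold to force frequent merges */

typedef struct
{
    int   *top;                 /* unsorted append-only buffer (fast inserts) */
    size_t top_len;
    int   *main_idx;            /* sorted "main index" (stands in for the big B-tree) */
    size_t main_len;
} Lsm;

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Merge the top buffer into the main index. */
static void
lsm_merge(Lsm *lsm)
{
    lsm->main_idx = realloc(lsm->main_idx,
                            (lsm->main_len + lsm->top_len) * sizeof(int));
    memcpy(lsm->main_idx + lsm->main_len, lsm->top, lsm->top_len * sizeof(int));
    lsm->main_len += lsm->top_len;
    lsm->top_len = 0;
    /* A real merge walks two sorted trees; sorting keeps the toy short. */
    qsort(lsm->main_idx, lsm->main_len, sizeof(int), cmp_int);
}

/* Insert is cheap: append to the top buffer, merge when it overflows. */
static void
lsm_insert(Lsm *lsm, int key)
{
    lsm->top[lsm->top_len++] = key;
    if (lsm->top_len >= TOP_CAPACITY)
        lsm_merge(lsm);         /* in Lsm3 this is deferred to a bgworker */
}

/* Lookup must consult both levels: top buffer first, then the main index. */
static bool
lsm_lookup(const Lsm *lsm, int key)
{
    for (size_t i = 0; i < lsm->top_len; i++)
        if (lsm->top[i] == key)
            return true;
    if (lsm->main_len == 0)
        return false;
    return bsearch(&key, lsm->main_idx, lsm->main_len,
                   sizeof(int), cmp_int) != NULL;
}

int
main(void)
{
    Lsm lsm = {malloc(TOP_CAPACITY * sizeof(int)), 0, NULL, 0};

    for (int i = 0; i < 10; i++)
        lsm_insert(&lsm, (i * 37) % 11);    /* some shuffled keys */

    printf("lookup 5: %s\n", lsm_lookup(&lsm, 5) ? "found" : "missing");
    printf("lookup 7: %s\n", lsm_lookup(&lsm, 7) ? "found" : "missing");

    free(lsm.top);
    free(lsm.main_idx);
    return 0;
}
```

The sketch also makes the trade-offs discussed above visible: if the merge
cannot keep up, the "top" level grows beyond its intended size, and every
extra top level added to absorb inserts is one more structure every lookup
has to probe.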