Re: LSM tree for Postgres - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: LSM tree for Postgres
Date
Msg-id 7c0f264f-8e13-321b-8035-9c930682966e@postgrespro.ru
Whole thread Raw
In response to Re: LSM tree for Postgres  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: LSM tree for Postgres
List pgsql-hackers

On 07.08.2020 15:31, Alexander Korotkov wrote:
> ср, 5 авг. 2020 г., 09:13 Konstantin Knizhnik <k.knizhnik@postgrespro.ru>:
>> Concerning degrade of basic index - B-Tree itself is balanced tree. Yes,
>> insertion of random keys can cause split of B-Tree page.
>> In the worst case half of B-Tree page will be empty. So B-Tree size will
>> be two times larger than ideal tree.
>> It may cause degrade up to two times. But that is all. There should not
>> be infinite degrade of speed tending to zero.
> My concerns are not just about space utilization.  My main concern is
> about the order of the pages.  After the first merge the base index
> will be filled in key order.  So physical page ordering perfectly
> matches their logical ordering.  After the second merge some pages of
> base index splits, and new pages are added to the end of the index.
> Splits also happen in key order.  So, now physical and logical
> orderings match within two extents corresponding to first and second
> merges, but not within the whole tree.  While there are only few such
> extents, disk page reads may in fact be mostly sequential, thanks to
> OS cache and readahead.  But finally, after many merges, we can end up
> with mostly random page reads.  For instance, leveldb doesn't have a
> problem of ordering degradation, because it stores levels in sorted
> files.
>
I agree with your that loosing sequential order of B-Tree pages may have 
negative impact on performance.
But it first of all critical for order-by and range queries, when we 
should traverse several subsequent leave pages.
It is less critical for exact-search or delete/insert operations. 
Efficiency of merge operations mostly depends on how much keys
will be stored at the same B-Tree page. And it is first of all 
determined by size of top index and key distribution.





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Tom Lane
Date:
Subject: Re: get rid of distprep?