Re: LSM tree for Postgres - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: LSM tree for Postgres |
Date | |
Msg-id | CAPpHfdu9jW-HnemRy-Yo-fAwoEVPfUpXi1e63pnmWqnk8SheFA@mail.gmail.com Whole thread Raw |
In response to | Re: LSM tree for Postgres (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: LSM tree for Postgres
|
List | pgsql-hackers |
On Tue, Aug 4, 2020 at 7:56 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > I do not understand why do we need multiple indexes. > We need one "hot" index which fits in memory to perform fast inserts. > But it should not be too small to be able to accumulate substantial > amount of records to provide efficient bulk insert. > I expect that top index can be efficiently merger with based index > because of better access locality. > I.e. we will insert multiple entries into one B-Tree page and so > minimize slowdown of random reads. > > Third index is needed to perform parallel merge (while merge is in > progress top index will be blocked and we can not perform inserts in it). > I do not understand benefits of performing more than one merge in > parallel: it will only increase fraction of random reads. > > Degradation certainly takes place. But it is not so critical as in case > of standard nbtree. > It is possible to tune threshold for top index size to make merge most > efficient. > But we can not truncate and swap index before we complete the merge. > So if merge operation takes long time, then it will cause exhaustion of > top index and it will not fit in memory any more. > It will lead to further slowdown (so we have negative feedback here). > > Certainly it is possible to create more top indexes, keeping their size > small enough to fit in memory. > But in this case search will require merge not of 3, but of N indexes. > I think that it may cause unacceptable slowdown of search operations. > And it is highly undesirable, taken in account that most of application > send more select-only queries than updates. The things you're writing makes me uneasy. I initially understood lsm3 as a quick and dirty prototype, while you're probably keeping some design in your mind (for instance, original design of LSM). However, your message makes me think you're trying to defend the approach currently implemented in lsm3 extension. Therefore, I've to criticise this approach. 1) The base index can degrade. At first, since merge can cause page splits. Therefore logical ordering of pages will become less correlated with their physical ordering with each merge. 2) If your workload will include updates and/or deletes, page utilization may also degrade. 3) While base index degrades, merge performance also degrades. Traverse of base index in logical order will require more and more random reads (at some point almost every page read will be random). While the base index becomes large and/or bloat, you push less top index tuples to a single base index page (at some point you will push one tuple per page). Original LSM design implies strict guarantees over average resources spent per index operation. Your design doesn't. Moreover, I bet lsm3 will degrade significantly even on insert-only workload. It should degrade to the performance level of B-tree once you insert enough data. Try something like number_of_merges = numer_of_tuples_per_index_page * 2 and you should see this degradation. Real LSM doesn't degrade that way. > Yes, this approach increase mount of logged data twice: > we need to write in WAL inserts in top and base indexes. > And it will certainly have negative influence on performance. > Unfortunately I have no idea how to avoid it without patching Postgres core. Huh, I didn't mean "without patching Postgres core" :) I mean it's difficult in principle, assuming PostgreSQL recovery is single-process and doesn't have access to system catalog (because it might be inconsistent). > Right now vacuum process Lsm3 indexes in usual way. > Removing records from top indexes may be not needed at all (just because > this indexes will be truncated in any case). > But as far as size of top index is expected to be small enough > vacuumming it should not take a long time, > so I didn't to avoid it (although it should not be difficult - just > disable ambulkdelete for correspondent nbtree wrappers). It doesn't seem important, but I don't get your point here. Postgres expects ambulkdelete to delete TIDs from index. If you don't delete it from the top index, this TID will be merged to the base index. And that could lead wrong query answers unless you eliminate those TIDs in a different way (during the merge stage or something). > Concerning deletes from main index - I do not understand how it can be > optimized. This is a trick you can learn from almost every LSM implementation. For instance, check docs for leveldb [1] about "delete marker". For sure, that requires some redesign of the vacuum and can't be done in extension (at least in the reasonable way). But, frankly speaking, I think core modifications are inevitable to utilize the power of LSM in PostgreSQL. Links 1. https://github.com/google/leveldb/blob/master/doc/impl.md ------ Regards, Alexander Korotkov
pgsql-hackers by date: