Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Deleting older versions in unique indexes to avoid page splits |
Date | |
Msg-id | CAH2-WznWVRmroStGXd+Kmzxv2g01dxmwA=5e9w6YJKtw-c9EKw@mail.gmail.com Whole thread Raw |
In response to | Re: Deleting older versions in unique indexes to avoid page splits (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Sun, Jan 17, 2021 at 10:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > With the above example, it seems like it would also help when this is not true. I'll respond to your remarks here separately, in a later email. > I am not sure what I proposed fits here but the bottom-up sounds like > we are starting from the leaf level and move upwards to root level > which I think is not true here. I guess that that's understandable, because it is true that B-Trees are maintained in a bottom-up fashion. However, it's also true that you can have top-down and bottom-up approaches in query optimizers, and many other things (it's even something that is used to describe governance models). The whole point of the term "bottom-up" is to suggest that bottom-up deletion complements top-down cleanup by VACUUM. I think that this design embodies certain principles that can be generalized to other areas, such as heap pruning. I recall that Robert Haas hated the name deduplication. I'm not about to argue that my choice of "deduplication" was objectively better than whatever he would have preferred. Same thing applies here - I more or less chose a novel name because the concept is itself novel (unlike deduplication). Reasonable people can disagree about what exact name might have been better, but it's not like we're going to arrive at something that everybody can be happy with. And whatever we arrive at probably won't matter that much - the vast majority of users will never need to know what either thing is. They may be important things, but that doesn't mean that many people will ever think about them (in fact I specifically hope that they don't ever need to think about them). > How is that working? Does heapam.c can someway indicate additional > tuples (extra to what has been marked/sent by IndexAM as promising) as > deletable? I see in heap_index_delete_tuples that we mark the status > of the passed tuples as delectable (by setting knowndeletable flag for > them). The bottom-up nbtree caller to heap_index_delete_tuples()/table_index_delete_tuple() (not to be confused with the simple/LP_DEAD heap_index_delete_tuples() caller) always provides heapam.c with a complete picture of the index page, in the sense that it exhaustively has a delstate.deltids entry for each and every TID on the page, no matter what. This is the case even though in practice there is usually no possible way to check even 20% of the deltids entries within heapam.c. In general, the goal during a bottom-up pass is *not* to maximize expected utility (i.e. the number of deleted index tuples/space freed/whatever), exactly. The goal is to lower the variance across related calls, so that we'll typically manage to free a fair number of index tuples when we need to. In general it is much better for heapam.c to make its decisions based on 2 or 3 good reasons rather than just 1 excellent reason. And so heapam.c applies a power-of-two bucketing scheme, never truly giving too much weight to what nbtree tells it about duplicates. See comments above bottomup_nblocksfavorable(), and bottomup_sort_and_shrink() comments (both are from heapam.c). -- Peter Geoghegan
pgsql-hackers by date: