Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WznWVRmroStGXd+Kmzxv2g01dxmwA=5e9w6YJKtw-c9EKw@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sun, Jan 17, 2021 at 10:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> With the above example, it seems like it would also help when this is not true.

I'll respond to your remarks here separately, in a later email.

> I am not sure what I proposed fits here but the bottom-up sounds like
> we are starting from the leaf level and move upwards to root level
> which I think is not true here.

I guess that that's understandable, because it is true that B-Trees
are maintained in a bottom-up fashion. However, it's also true that
you can have top-down and bottom-up approaches in query optimizers,
and many other things (it's even something that is used to describe
governance models). The whole point of the term "bottom-up" is to
suggest that bottom-up deletion complements top-down cleanup by
VACUUM. I think that this design embodies certain principles that can
be generalized to other areas, such as heap pruning.

I recall that Robert Haas hated the name deduplication. I'm not about
to argue that my choice of "deduplication" was objectively better than
whatever he would have preferred. Same thing applies here - I more or
less chose a novel name because the concept is itself novel (unlike
deduplication). Reasonable people can disagree about what exact name
might have been better, but it's not like we're going to arrive at
something that everybody can be happy with. And whatever we arrive at
probably won't matter that much - the vast majority of users will
never need to know what either thing is. They may be important things,
but that doesn't mean that many people will ever think about them (in
fact I specifically hope that they don't ever need to think about
them).

> How is that working? Does heapam.c can someway indicate additional
> tuples (extra to what has been marked/sent by IndexAM as promising) as
> deletable? I see in heap_index_delete_tuples that we mark the status
> of the passed tuples as delectable (by setting knowndeletable flag for
> them).

The bottom-up nbtree caller to
heap_index_delete_tuples()/table_index_delete_tuple() (not to be
confused with the simple/LP_DEAD heap_index_delete_tuples() caller)
always provides heapam.c with a complete picture of the index page, in
the sense that it exhaustively has a delstate.deltids entry for each
and every TID on the page, no matter what. This is the case even
though in practice there is usually no possible way to check even 20%
of the deltids entries within heapam.c.

In general, the goal during a bottom-up pass is *not* to maximize
expected utility (i.e. the number of deleted index tuples/space
freed/whatever), exactly. The goal is to lower the variance across
related calls, so that we'll typically manage to free a fair number of
index tuples when we need to. In general it is much better for
heapam.c to make its decisions based on 2 or 3 good reasons rather
than just 1 excellent reason. And so heapam.c applies a power-of-two
bucketing scheme, never truly giving too much weight to what nbtree
tells it about duplicates. See comments above
bottomup_nblocksfavorable(), and bottomup_sort_and_shrink() comments
(both are from heapam.c).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: CheckpointLock needed in CreateCheckPoint()?
Next
From: Tom Lane
Date:
Subject: Re: [PATCH] Add support for leading/trailing bytea trim()ing