Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Victor Yegorov
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAGnEboiX3dwP4oFrCT6FDwwOb-87Dr+ghJ5vMqqfm9wxfQeDoA@mail.gmail.com
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 07:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
The summary of the above is that with Case-1 (clean-up based on hint
received with the last item on the page) it takes fewer operations to
cause a page split as compared to Case-2 (clean-up based on hint
received with at least of the items on the page) for a mixed workload.
How can we say that it doesn't matter?

I cannot understand this, perhaps there's a word missing in the brackets?..


Thinking about your proposal to also undertake bottom-up deletion on the last-to-fit tuple insertion,
I'd like to start with my understanding of the design:

* we try to avoid index bloat, but we do it in the “lazy” way (for a reason, see below)

* it means that if there is enough space on the leaf page to fit the new index tuple,
  we just fit it there

* if there's not enough space, we first check for the presence of LP_DEAD tuples,
  and if they do exist, we scan the whole (index) page to re-check all tuples in order
  to find other not-yet-marked-but-actually-dead tuples and clean all those up

* if there's still not enough space, only now do we try bottom-up deletion. This is a heavy
  operation and can cause extra IO (tests show this), therefore it is undertaken only at the
  point where we can justify the extra IO against a leaf page split

* if there is still no space after bottom-up deletion, we perform deduplication (if possible)

* finally, we split the page.
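The sequence above can be sketched roughly as follows. This is a minimal illustration of the fallback order only, not actual PostgreSQL code: the structure, function names (`lp_dead_cleanup`, `bottom_up_deletion`, `deduplicate`), and return conventions are all hypothetical stand-ins for the real nbtree operations.

```c
#include <stdbool.h>

/* Hypothetical leaf-page state, for illustration only. */
typedef struct
{
    int  free_space;   /* bytes free on the leaf page */
    bool has_lp_dead;  /* any items already marked LP_DEAD? */
} LeafPage;

/* Stubs standing in for the real operations; each returns bytes freed
 * (zero here, since these are placeholders, not real implementations). */
static int lp_dead_cleanup(LeafPage *p)    { p->has_lp_dead = false; return 0; }
static int bottom_up_deletion(LeafPage *p) { (void) p; return 0; }
static int deduplicate(LeafPage *p)        { (void) p; return 0; }

/* Returns true if the new tuple fits without a page split. */
static bool
leaf_insert(LeafPage *page, int tuple_size)
{
    /* Lazy path: if the tuple fits, just put it there. */
    if (page->free_space >= tuple_size)
        return true;

    /* 1. Cheap cleanup of items already known (or found) to be dead. */
    if (page->has_lp_dead)
        page->free_space += lp_dead_cleanup(page);

    /* 2. Heavy bottom-up deletion pass; may incur extra IO. */
    if (page->free_space < tuple_size)
        page->free_space += bottom_up_deletion(page);

    /* 3. Deduplication as a last resort before splitting. */
    if (page->free_space < tuple_size)
        page->free_space += deduplicate(page);

    /* false => the caller must split the page. */
    return page->free_space >= tuple_size;
}
```

The point of the ordering is that each step is tried only after the cheaper ones have failed, so the expensive bottom-up pass runs only when a split is otherwise imminent.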

Should we try a bottom-up deletion pass in a situation where we're inserting the last possible tuple
into the page (enough space for the *current* insert, but likely no space for the next),
then (among others) the following possibilities exist:

- no new tuple ever comes to this page again (a happy accident), which means that
  we've wasted IO cycles

- we undertake a bottom-up deletion pass without success and are later asked to insert a new tuple
  into this fully packed index page. This can unroll into:
  - due to the delay, we've managed to find some space, either via LP_DEAD cleanup or bottom-up deletion,
    which means we wasted IO cycles on the previous iteration (a too-early attempt)
  - we still fail to find any space and are forced to split the page;
    in this case we've wasted IO cycles twice

In my view, the cases where we waste IO cycles (producing no benefit) should be avoided,
and this is the main reason for the current approach.

Again, this is my understanding and I hope I got it right.


--
Victor Yegorov
