Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Victor Yegorov
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAGnEboiX3dwP4oFrCT6FDwwOb-87Dr+ghJ5vMqqfm9wxfQeDoA@mail.gmail.com
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 07:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
The summary of the above is that with Case-1 (clean-up based on hint
received with the last item on the page) it takes fewer operations to
cause a page split as compared to Case-2 (clean-up based on hint
received with at least of the items on the page) for a mixed workload.
How can we say that it doesn't matter?

I cannot understand this, perhaps there's a word missing in the brackets?..


Thinking about your proposal to also undertake bottom-up deletion on the last-to-fit tuple insertion,
I'd like to start with my understanding of the design:

* we try to avoid index bloat, but we do it in the “lazy” way (for a reason, see below)

* it means that if there is enough space on the leaf page to fit the new index tuple,
  we just fit it there

* if there's not enough space, we first check for the presence of LP_DEAD tuples,
  and if they do exist, we scan the whole (index) page to re-check all tuples in order
  to find other not-yet-marked-but-actually-dead tuples and clean all those up

* if there's still not enough space, only now do we try bottom-up deletion. This is a heavy
  operation and can cause extra IO (tests show this), therefore it is undertaken only at the
  point where we can justify the extra IO against a leaf page split

* if there is still no space after bottom-up deletion, we perform deduplication (if possible)

* finally, we split the page.
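The sequence above can be sketched roughly as follows. This is a minimal illustration of the fallback order only, not actual PostgreSQL code: the structure, function names (`lp_dead_cleanup`, `bottom_up_deletion`, `deduplicate`), and return conventions are all hypothetical stand-ins for the real nbtree operations.

```c
#include <stdbool.h>

/* Hypothetical leaf-page state, for illustration only. */
typedef struct
{
    int  free_space;   /* bytes free on the leaf page */
    bool has_lp_dead;  /* any items already marked LP_DEAD? */
} LeafPage;

/* Stubs standing in for the real operations; each returns bytes freed
 * (zero here, since these are placeholders, not real implementations). */
static int lp_dead_cleanup(LeafPage *p)    { p->has_lp_dead = false; return 0; }
static int bottom_up_deletion(LeafPage *p) { (void) p; return 0; }
static int deduplicate(LeafPage *p)        { (void) p; return 0; }

/* Returns true if the new tuple fits without a page split. */
static bool
leaf_insert(LeafPage *page, int tuple_size)
{
    /* Lazy path: if the tuple fits, just put it there. */
    if (page->free_space >= tuple_size)
        return true;

    /* 1. Cheap cleanup of items already known (or found) to be dead. */
    if (page->has_lp_dead)
        page->free_space += lp_dead_cleanup(page);

    /* 2. Heavy bottom-up deletion pass; may incur extra IO. */
    if (page->free_space < tuple_size)
        page->free_space += bottom_up_deletion(page);

    /* 3. Deduplication as a last resort before splitting. */
    if (page->free_space < tuple_size)
        page->free_space += deduplicate(page);

    /* false => the caller must split the page. */
    return page->free_space >= tuple_size;
}
```

The point of the ordering is that each step is tried only after the cheaper ones have failed, so the expensive bottom-up pass runs only when a split is otherwise imminent.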

Should we try a bottom-up deletion pass in a situation where we're inserting the last possible tuple
into the page (enough space for the *current* insert, but likely no space for the next),
then (among others) the following possibilities exist:

- no new tuple ever comes to this page again (a happy accident), which means that
  we've wasted IO cycles

- we undertake a bottom-up deletion pass without success and are later asked to insert a new tuple
  into this fully packed index page. This can unroll into:
  - due to the delay, we've managed to find some space, either via LP_DEAD cleanup or bottom-up deletion,
    which means we wasted IO cycles on the previous iteration (a too-early attempt)
  - we still fail to find any space and are forced to split the page;
    in this case we've wasted IO cycles twice

In my view, the cases where we waste IO cycles (producing no benefit) should be avoided,
and this is the main reason for the current approach.

Again, this is my understanding and I hope I got it right.


--
Victor Yegorov
