Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAA4eK1++MKKq=vRDO1P3_0fgqnQm9sTNCCV-61Q13h3m7eVfDQ@mail.gmail.com
Whole thread Raw
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Deleting older versions in unique indexes to avoid page splits  (Victor Yegorov <vyegorov@gmail.com>)
Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 12:43 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sat, Jan 16, 2021 at 9:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Do we do this optimization (bottom-up deletion) only when the last
> > item which can lead to page split has indexUnchanged set to true? If
> > so, what if the last tuple doesn't have indexUnchanged but the
> > previous ones do have?
>
> Using the indexUnchanged hint as a triggering condition makes sure
> that non-HOT updaters are the ones that pay the cost of a bottom-up
> deletion pass. We create backpressure for non-HOT updaters (in indexes
> that are logically unchanged) specifically. (Of course it's also true
> that the presence of the indexUnchanged hint is highly suggestive of
> there being more version churn duplicates on the leaf page already,
> which is not actually certain.)
>
> It's possible that there will be some mixture of inserts and non-hot
> updates on the same leaf page, and as you say the implementation might
> fail to do a bottom-up pass based on the fact that an incoming item at
> the point of a would-be page split was a plain insert (and so didn't
> receive the hint).
>

or it would do the scan when that is the only time for this leaf page
to receive such a hint.

> The possibility of these kinds of "missed
> opportunities" are okay because a page split was inevitable either
> way. You can imagine a case where that isn't true (i.e. a missed
> opportunity to avoid a page split), but that's kind of like imagining
> a fair coin flip taking place 100 times and coming up heads each time.
> It just isn't realistic for such an "mixed tendencies" leaf page to
> stay in flux (i.e. not ever split) forever, with the smartest
> triggering condition in the world -- it's too unstable.
>

But even if we don't want to avoid it forever delaying it will also
have advantages depending upon the workload. Let's try to see by some
example, say the item and page size are such that it would require 12
items to fill the page completely. Case-1, where we decide based on
the hint received in the last item, and Case-2 where we decide based
on whether we ever received such a hint for the leaf page.

Case-1:
========
12 new items (6 inserts 6 non-HOT updates)
Page-1: 12 items - no split would be required because we received the
hint (indexUnchanged) with the last item, so page-1 will have 6 items
after clean up.

6 new items (3 inserts, 3 non-HOT updates)
Page-1: 12 items (9 inserts, 3 non-HOT updates) lead to split because
we received the last item without a hint.

The SPLIT-1 happens after 18 operations (6 after the initial 12).

After this split (SPLIT-1), we will have 2 pages with the below state:
Page-1: 6 items (4 inserts, 2 non-HOT updates)
Page-2: 6 items (5 inserts, 1 non-HOT updates)

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items 1 insert and 2 non-HOT updates; Page-1: 9 items
(5 inserts, 4 non-HOT updates)
Page-2 got 3 new items 2 inserts and 1 non-HOT update; Page-2: 9 items
(7 inserts, 2 non-HOT updates)

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items 1 insert and 2 non-HOT updates; Page-1: 12
items (6 inserts, 6 non-HOT updates) doesn't lead to split
Page-2 got 3 new items 2 inserts and 1 non-HOT update; Page-2: 9 items
(9 inserts, 3 non-HOT updates) lead to split (split-2)

The SPLIT-2 happens after 30 operations (12 new operations after the
previous split).

Case-2:
========
12 new items (6 inserts 6 non-HOT updates)
Page-1: 12 items - no split would be required because we received the
hint (indexUnchanged) with at least one of the item, so page-1 will
have 6 items after clean up.

6 new items (3 inserts, 3 non-HOT updates)
Page-1: 12 items (9 inserts, 3 non-HOT updates), cleanup happens and
Page-1 will have 9 items.

6 new items (3 inserts, 3 non-HOT updates), at this stage in whichever
order the new items are received one split can't be avoided.

The SPLIT-1 happens after 24 new operations (12 new ops after initial 12).
Page-1: 6 items (6 inserts)
Page-2: 6 items (6 inserts)

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items 1 insert and 2 non-HOT updates; Page-1: 9 items
(7 inserts, 2 non-HOT updates)
Page-2 got 3 new items 2 inserts and 1 non-HOT update; Page-2: 9 items
(8 inserts, 1 non-HOT updates)

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items 1 insert and 2 non-HOT updates; Page-1: 12
items (8 inserts, 4 non-HOT updates) clean up happens and page-1 will
have 8 items.
Page-2 got 3 new items 2 inserts and 1 non-HOT update; Page-2: 9 items
(10 inserts, 2 non-HOT updates) clean up happens and page-2 will have
10 items.

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items, 1 insert and 2 non-HOT updates; Page-1: 11
items (9 inserts, 2 non-HOT updates)
Page-2 got 3 new items, 2 inserts and 1 non-HOT update; Page-2: 12
items (12 inserts, 0 non-HOT updates) cleanup happens for one of the
non-HOT updates

6 new items (3 inserts, 3 non-HOT updates)
Page-1 got 3 new items, 1 insert, and 2 non-HOT updates; Page-1: 12
items (12 inserts, 0 non-HOT updates) cleanup happens for one of the
non-HOT updates
Page-2 got 3 new items, 2 inserts, and 1 non-HOT update; Page-2: split happens

After split
Page-1: 12 items (12 inserts)
Page-2: 6 items (6 inserts)
Page-3: 6 items (6 inserts)

The SPLIT-2 happens after 48 operations (24 new operations)

The summary of the above is that with Case-1 (clean-up based on hint
received with the last item on the page) it takes fewer operations to
cause a page split as compared to Case-2 (clean-up based on hint
received with at least of the items on the page) for a mixed workload.
How can we say that it doesn't matter?

> An important part of understanding the design is to imagine things at
> the leaf page level, while thinking about how what that actually looks
> like differs from an ideal physical representation of the same leaf
> page that is much closer to the logical contents of the database.
> We're only interested in leaf pages where the number of logical rows
> is basically fixed over time (when there is version churn).
>

With the above example, it seems like it would also help when this is not true.

> Usually
> this just means that there are lots of non-HOT updates, but it can
> also work with lots of queue-like inserts and deletes in a unique
> index.
>
> Ultimately, the thing that determines whether or not the bottom-up
> deletion optimization is effective for any given leaf page is the fact
> that it actually prevents the page from splitting despite lots of
> version churn -- this happens again and again. Another key concept
> here is that bottom-up deletion passes for the same leaf page are
> related in important ways -- it is often a mistake to think of
> individual bottom-up passes as independent events.
>
> > Why are we using terminology bottom-up deletion and why not simply
> > duplicate version deletion or something on those lines?
>
> Why is that simpler?
>

I am not sure what I proposed fits here but the bottom-up sounds like
we are starting from the leaf level and move upwards to root level
which I think is not true here.

> Also, it doesn't exactly delete duplicates. See
> my later remarks.
>
> > How do we distinguish between version duplicate tuples (added due to
> > the reason that some other index column is updated) or duplicate
> > tuples (added because a user has inserted a row with duplicate value)
> > for the purpose of bottom-up deletion?  I think we need to distinguish
> > between them because in the earlier case we can remove the old version
> > of the tuple in the index but not in later. Or is it the case that
> > indexAm doesn't differentiate between them but relies on heapAM to
> > give the correct answer?
>
> Bottom-up deletion uses the presence of duplicates as a starting point
> for determining which heap pages to visit, based on the assumption
> that at least some are obsolete versions. But heapam.c has additional
> heap-level heuristics that help too.
>
> It is quite possible and even likely that we will delete some
> non-duplicate tuples in passing, just because they're checked in
> passing -- they might turn out to be deletable, for whatever reason.
>

How is that working? Does heapam.c can someway indicate additional
tuples (extra to what has been marked/sent by IndexAM as promising) as
deletable? I see in heap_index_delete_tuples that we mark the status
of the passed tuples as delectable (by setting knowndeletable flag for
them).

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching