Re: 64.4.2. Bottom-up Index Deletion - Mailing list pgsql-docs

From Peter Geoghegan
Subject Re: 64.4.2. Bottom-up Index Deletion
Date
Msg-id CAH2-WznKttnTi2kcRM3TRu=iuX-RUoKhSw4xuScV7-C7-eD2bA@mail.gmail.com
Whole thread Raw
In response to Re: 64.4.2. Bottom-up Index Deletion  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-docs
On Wed, Nov 9, 2022 at 4:06 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Ok, so the "most recent live tuple" - this in an update, there has to be one.  Can that live tuple that is being
updatedever be removed by this process? 

No.

> I'm not sure what the long-running transaction has to do with this though - we are in an update so we've locked an
activetuple and that is the only tuple this coexistence is talking about. 

We cannot remove "recently dead" tuples, which are definitely dead
(they were rendered garbage by a transaction that has already
committed), but are still required by some snapshots. We know that
they're bound to become garbage that is safe to remove sooner or later
-- but it's not safe just yet.

> You lost me here.  Which tuple is "it" - the one we are updating got deleted under us?

No. We delete any and all tuples that are found to be deletable. We
don't care about the true reasons at that point. We're perfectly
content to be the beneficiary of a certain kind of dumb luck if things
go that way (though we'll typically see only a few garbage tuples that
originate from DELETEs or aborted inserting xacts). In the end, we're
really just ranking heap blocks, and then visiting them in the hopes
of finding that we can avoid the page split. We're choosing among heap
blocks, not index tuples (once we visit a heap block we'll delete any
index tuples that point to the heap block and are seen to be eligible
to be deleted).

> So, the page that the update has to be placed on gets picked for cleanup but the actual fetching of rows and such
doesn'tactually care what that tuple is.  Though its ancestors are likely to be picked. 

Yeah, heap pages pointed to by relatively many duplicate index tuples
tend to be visited first.

So-called "dumb luck" is more likely than you'd think. There often
won't be very many heap pages in total -- in which case we're bound to
stumble upon any garbage tuples that weren't actually rendered garbage
by an updater in passing.

> Knowing for certain whether the cleanup scope is "just prior versions of my update" or "any old versions on the page
myupdate just happened to be placed on" seems user-facing worthy. 

We don't particularly care about what the incoming updater that
happens to have triggered the bottom-up deletion pass was doing. It's
not special. We don't particularly favor the row being updated over
other similar looking rows on the same leaf page. We'll usually have
to visit 2 or 3 heap pages total (per bottom-up pass) in any case.

> I'm almost thinking the last sentence "Each new tuple within each index....for a short period of time..." is doing
moreharm than good.  Removing it makes things clearer. 

I'm unsure. Does anybody else have an opinion on this?

> Bottom-up index deletion is an incremental version churn deletion process, triggered by an anticipated "version churn
pagesplit".  This only happens.....(the rest can probably remain the same, though I suggest moving the note to the end
ofthe section)." 

Probably the most important point is that bottom-up deletion
compensates for cases where the HOT optimization wasn't used, though
only for those indexes that weren't logically modified by the UPDATE.
It more or less takes over from HOT. But it doesn't necessarily help
at all with DELETEs (not reliably).

> Adding "no-longer-visible" at the start should hopefully be a sufficient reminder for someone familiar with MVCC; or
aclue that more reading of pre-requisite features is needed before understanding this internal part of the system. 

Yeah, it's not clear what the prerequisites for reading this material
should be. It is part of the B-Tree internals chapter, though, so it
is very advanced material.

--
Peter Geoghegan



pgsql-docs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: 64.4.2. Bottom-up Index Deletion
Next
From: jian he
Date:
Subject: pg_relation_is_updatable, pg_column_is_updatable not documented