Re: 64.4.2. Bottom-up Index Deletion - Mailing list pgsql-docs

From Peter Geoghegan
Subject Re: 64.4.2. Bottom-up Index Deletion
Date
Msg-id CAH2-Wz=ydag6Nn5taDfXB7UK1g53+wY5SJMS5Ka-42LOxQa1XA@mail.gmail.com
Whole thread Raw
In response to Re: 64.4.2. Bottom-up Index Deletion  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: 64.4.2. Bottom-up Index Deletion  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-docs
On Wed, Nov 9, 2022 at 2:20 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Maybe a bit more explicitness is in order.

Yeah, maybe.

> On the point of "will generally need to coexist" - I don't see why we are being wishy-washy here, though.

Because sometimes it will take a relatively long time (say in the
presence of a long running xact/snapshot). With an aborted transaction
there will be no wait for the tuple to become eligible to
remove/delete at all (even in the presence of a long running
xact/shapshot).

> When updating a row where bottom-up deletion is chosen the most recent tuple cannot be removed to make room for the
newtuple; in particular, because the current update may not commit. 

Sort of. Could also be multiple still-needed tuples.

In fact it's possible that it'll be a tuple that became garbage as a
result of a DELETE statement run during a committed transaction -- we
need only visit the same heap page during the processing that takes
place on the heapam side.

> I'm also not inherently understanding how the bottom-up pass can know a tuple is safe to remove based upon visibility
informationwhen that information is not present in the index AND it doesn't rely upon LP_DEAD. 

It goes to the heap to get it. B-Tree does this based on speculative
criteria (mostly we visit heap pages pointed to by the most duplicate
index tuples first). The process starts out without knowing for sure
whether or not it'll work out. This is okay because we give up as soon
as any single heap page is visited without it yielding at least one
deletable tuple. And because it happens at the point of a would-be
page split caused by a non-HOT updater -- a crucial juncture for the
leaf page.

> "B-Tree indexes incrementally delete" - is it really the index self-modifying or is it an active user session taking
sometime to perform each pass?  Describing it as, say: 
>
> "The updating session will locate all the logically equivalent tuples (on the same page) via the index and check them
forglobal visibility, removing those that it finds that are both older than the most recent tuple and no longer visible
toall other sessions." 

But it actually just deletes whatever tuples are determined to be safe
to delete. The B-Tree side of this has some very vague idea about how
updates and MVCC version churn works, but it is quite unsure of
itself.

You can think of the whole process as a process of ranking heap blocks
pointed to by tuples on the affected B-Tree leaf page. We are not
exactly maximizing the amount of space freed on the leaf page. It's
more a case of minimizing the probability of having to split the page,
while avoiding ever spending more than a small amount of effort on
truly hopeless cases. We can avoid a page split for a very long time
just by deleting only a small number of index tuples in many cases (it
all depends on the workload).

I don't think that these subtleties need to be documented. But it's
difficult to know where to draw the line.

--
Peter Geoghegan



pgsql-docs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: 64.4.2. Bottom-up Index Deletion
Next
From: "David G. Johnston"
Date:
Subject: Re: 64.4.2. Bottom-up Index Deletion