Re: 64.4.2. Bottom-up Index Deletion - Mailing list pgsql-docs

From Peter Geoghegan
Subject Re: 64.4.2. Bottom-up Index Deletion
Date
Msg-id CAH2-WzkNTEUyC=Q6dFQanirKcKJtxb52Nh2nLwDUX10DGd-sWg@mail.gmail.com
Whole thread Raw
In response to 64.4.2. Bottom-up Index Deletion  (PG Doc comments form <noreply@postgresql.org>)
Responses Re: 64.4.2. Bottom-up Index Deletion  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-docs
Hi Hussein,

Apologies for the very delayed response. I'm aware that you've taken
an interest in this subject as part of your YouTube channel. Thanks
for publicizing the work!

On Tue, Jul 12, 2022 at 7:14 PM PG Doc comments form
<noreply@postgresql.org> wrote:
> Would be nice to add a note: old tuple versions in the index referencing the
> same logical row cannot be deleted by bottom up index deletion process when
> older transactions that might require the old state the row are still
> running

It's really hard to write documentation for something like this,
because it's difficult to decide what your audience really needs to
know. I agree that it's important to get this specific point across,
though. In fact I thought that I already conveyed the same idea at
this point:

"All indexes will need a successor physical index tuple that points to
the latest version in the table. Each new tuple within each index will
generally need to coexist with the original “updated” tuple for a
short period of time (typically until shortly after the UPDATE
transaction commits)."

The implication is that we need the old version to coexist until after
the updater transaction commits and is seen by every possible MVCC
snapshot as having committed -- nobody sees the old version anymore.
Maybe we could augment the existing sentences I have highlighted?
Could it be more explicit?

> That is what the LP_DEAD bit is for I believe?

Sort of. You could say that the LP_DEAD bit being set means "this
index tuple is definitely useless to everybody, and so is definitely
eligible for deletion when the page fills up" -- so in that sense
you're right. However, it doesn't work the other way around -- not
every index tuple that is deleted (or that is eligible to be deleted)
will have its LP_DEAD bit set. Bottom-up index deletion only happens
when we have no LP_DEAD bits set on the leaf page (we always prefer to
delete them via simple deletion instead).

An index tuple's LP_DEAD bit is set by index scans that happen to
notice that the tuple is not needed by any possible MVCC snapshot.
This allows it to be ignored by other index scans immediately. It also
allows simple deletion to physically remove the tuple later on (at the
point that the page fills). Of course the LP_DEAD bit can only be set
when such an index scan actually happens because some SQL queries
happen to be executed by the user app.

The nice thing about bottom-up index deletion is that it reliably
places the burden of performing deletion on the specific non-HOT
updaters that are responsible for bloating the index -- they are
required to clean up their own mess, making it much less likely that
index bloat will be very destabilizing. So the main difference between
bottom-up deletion and simple deletion is the information that drives
the whole process. The physical modifications to the index page are
actually identical (the LP_DEAD bit is set by index scans, not the
deletion process itself).

--
Peter Geoghegan



pgsql-docs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Usability ideas: text width and headers that are links
Next
From: PG Doc comments form
Date:
Subject: Example code bug: destination->data