Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
Msg-id CAH2-WzkFjiayDUkgJ8kafNDzOiSngLwb=yVUJ_JRPsG0RtkUkw@mail.gmail.com
In response to Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers  (James Coleman <jtc331@gmail.com>)
Responses Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
List pgsql-hackers
On Fri, Sep 29, 2023 at 6:27 PM James Coleman <jtc331@gmail.com> wrote:
> On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I think that it's talking about what happens during opportunistic
> > pruning, in particular what happens to HOT chains. (Though pruning
> > does almost the same amount of useful work with non-heap-only tuples,
> > so it's a bit unfortunate that the name "HOT pruning" seems to have
> > stuck.)
>
> That's very likely what the intention was. I read it again, and the
> same confusion still sticks out to me: it doesn't say anything
> explicitly about opportunistic pruning (I'm not sure if that term is
> "public docs" level, so that's probably fine), and it doesn't scope
> the claim to intermediate tuples in a HOT chain -- indeed the context
> is the HOT feature generally.

It doesn't mention opportunistic pruning by name, but it does say:

"Old versions of updated rows can be completely removed during normal
operation, including SELECTs, instead of requiring periodic vacuum
operations."

There is a strong association between HOT and pruning (particularly
opportunistic pruning) in the minds of some hackers (and perhaps some
users), because both features appeared together in 8.3, and both are
closely related at the implementation level. It's nevertheless not
quite accurate to say that HOT "provides two optimizations" -- since
pruning (the second of the two bullet points) isn't fundamentally
different for pages that don't have any HOT chains. Not at the level
of the heap pages, at least (indexes are another matter).
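
A toy model may make the "indexes are another matter" point more concrete. The Python sketch below is not PostgreSQL code. The names `LinePointer` and `prune_page` are hypothetical, and MVCC visibility, locking, and WAL are ignored: whether a tuple is "dead" is simply given as a flag. The line pointer states are named after PostgreSQL's LP_NORMAL, LP_REDIRECT, LP_DEAD, and LP_UNUSED. The sketch also assumes that the dead members of each HOT chain form a prefix of that chain.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Line pointer states, named after PostgreSQL's LP_* flags (toy model only).
NORMAL, REDIRECT, DEAD, UNUSED = "normal", "redirect", "dead", "unused"


@dataclass
class LinePointer:
    state: str = NORMAL
    dead: bool = False                 # assumed: tuple is known to be invisible to everyone
    heap_only: bool = False            # created by a HOT update; no index entry points here
    next_off: Optional[int] = None     # next (newer) member of the HOT chain, if any
    redirect_to: Optional[int] = None  # set only for REDIRECT line pointers


def prune_page(page: Dict[int, LinePointer]) -> None:
    """Toy pruning pass over one heap page, keyed by line pointer offset."""
    for root, lp in list(page.items()):
        # Chains are only entered at tuples an index can point to.
        # (Already-redirected roots are skipped to keep the toy short.)
        if lp.heap_only or lp.state != NORMAL:
            continue
        chain = [root]
        while page[chain[-1]].next_off is not None:
            chain.append(page[chain[-1]].next_off)
        live = [off for off in chain if not page[off].dead]
        first_live = live[0] if live else None
        dead_prefix = chain if first_live is None else chain[: chain.index(first_live)]
        if not dead_prefix:
            continue  # root still live: nothing to prune in this chain
        # Dead heap-only members have no index entries, so their line
        # pointers can be marked unused right away.
        for off in dead_prefix[1:]:
            page[off] = LinePointer(state=UNUSED)
        if first_live is not None:
            # Root is dead but the chain continues: keep the root's line
            # pointer as a redirect so index TIDs remain valid.
            page[root] = LinePointer(state=REDIRECT, redirect_to=first_live)
        else:
            # Whole chain dead: an index may still reference the root, so it
            # becomes an LP_DEAD stub until VACUUM removes the index entries.
            page[root] = LinePointer(state=DEAD)


page = {
    1: LinePointer(dead=True, next_off=2),                  # original version (indexed)
    2: LinePointer(dead=True, heap_only=True, next_off=3),  # intermediate HOT version
    3: LinePointer(heap_only=True),                         # current version
    4: LinePointer(dead=True),                              # deleted row, no HOT chain
}
prune_page(page)
for off, lp in sorted(page.items()):
    print(off, lp.state, lp.redirect_to)
# 1 redirect 3
# 2 unused None
# 3 normal None
# 4 dead None
```

The heap-page work is the same in both cases: the dead tuples' storage goes away. The difference is in what happens to the line pointers. The dead heap-only tuple's line pointer is freed immediately. The dead non-HOT tuple's line pointer survives as a stub, because an index entry might still point at it.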

Explaining these sorts of distinctions through prose is very
difficult. You really need diagrams for something like this IMV.
Without that, the only way to make all of this less confusing is to
avoid all discussion of pruning...but then you can't really make the
point about breaking the dependency on VACUUM, which is a relatively
important point -- one with real practical relevance.

> This is why I discovered it: it says "indexes do not reference their
> page item identifiers", which is manifestly not true when talking
> about the root item, and in fact would defeat the whole purpose of HOT
> (at least in an old-to-new chain like the one Postgres uses).

Yeah, but...that's not what was intended. Obviously, the index hasn't
changed, and we expect index scans to continue to give correct
answers. So it is pretty strongly implied that it continues to point
to something valid.
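
To illustrate how the index "continues to point to something valid", here is a second toy sketch. It is not PostgreSQL code. The names `Item` and `fetch_via_index` are hypothetical, and a boolean `visible` flag stands in for real MVCC visibility checks. The index entry keeps naming the chain's root offset on the page. After pruning, that root is either the original tuple or a redirect to the first surviving version in the chain. Either way, the same offset still resolves to the right row version.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Item:
    state: str                         # "normal", "redirect", "dead" or "unused"
    value: Optional[str] = None        # tuple contents (normal items only)
    next_off: Optional[int] = None     # newer HOT chain member (normal items only)
    redirect_to: Optional[int] = None  # target offset (redirect items only)
    visible: bool = True               # stand-in for a real MVCC visibility check


def fetch_via_index(page: Dict[int, Item], root_off: int) -> Optional[str]:
    """Resolve the offset stored in an index entry to the visible row version."""
    off: Optional[int] = root_off
    item = page[root_off]
    if item.state == "redirect":
        off = item.redirect_to  # root was pruned: follow the redirect
    elif item.state != "normal":
        return None             # dead or unused: no visible row here
    while off is not None:      # walk the HOT chain, oldest to newest
        item = page[off]
        if item.visible:
            return item.value
        off = item.next_off
    return None


# The index entry stores offset 1 (the chain root) and is never updated.
before_prune = {
    1: Item("normal", "v1", next_off=2, visible=False),
    2: Item("normal", "v2", next_off=3, visible=False),
    3: Item("normal", "v3"),
}
after_prune = {
    1: Item("redirect", redirect_to=3),
    2: Item("unused"),
    3: Item("normal", "v3"),
}
print(fetch_via_index(before_prune, 1))  # v3
print(fetch_via_index(after_prune, 1))   # v3
```

The index entry is identical in both cases. Only what the root line pointer resolves to changes. This is the sense in which indexes never reference the intermediate (heap-only) versions, even though they very much do reference the root item.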

> Assuming people can be convinced this is confusing (I realize you may
> not be yet), I see two basic options:
>
> 1. Update this to discuss both intermediate tuples and root items
> separately. This could entail either one larger paragraph or splitting
> such that instead of "two optimizations" we say "three" optimizations.
>
> 2. Change "old versions" to something like "intermediate versions in a
> series of updates".
>
> I prefer some form of (1) since it more fully describes the behavior,
> but we could tweak further for concision.

Bruce authored these docs. I was mostly just glad to have anything at
all about HOT in the user-facing docs, quite honestly.

--
Peter Geoghegan


