Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers |
Date | |
Msg-id | CAH2-WzkFjiayDUkgJ8kafNDzOiSngLwb=yVUJ_JRPsG0RtkUkw@mail.gmail.com |
In response to | Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers (James Coleman <jtc331@gmail.com>) |
Responses | Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers |
List | pgsql-hackers |
On Fri, Sep 29, 2023 at 6:27 PM James Coleman <jtc331@gmail.com> wrote:
> On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I think that it's talking about what happens during opportunistic
> > pruning, in particular what happens to HOT chains. (Though pruning
> > does almost the same amount of useful work with non-heap-only tuples,
> > so it's a bit unfortunate that the name "HOT pruning" seems to have
> > stuck.)
>
> That's very likely what the intention was. I read it again, and the
> same confusion still sticks out to me: it doesn't say anything
> explicitly about opportunistic pruning (I'm not sure if that term is
> "public docs" level, so that's probably fine), and it doesn't scope
> the claim to intermediate tuples in a HOT chain -- indeed the context
> is the HOT feature generally.

It doesn't mention opportunistic pruning by name, but it does say:

"Old versions of updated rows can be completely removed during normal
operation, including SELECTs, instead of requiring periodic vacuum
operations."

There is a strong association between HOT and pruning (particularly
opportunistic pruning) in the minds of some hackers (and perhaps some
users), because both features appeared together in 8.3, and both are
closely related at the implementation level. It's nevertheless not
quite accurate to say that HOT "provides two optimizations" -- since
pruning (the second of the two bullet points) isn't fundamentally
different for pages that don't have any HOT chains. Not at the level
of the heap pages, at least (indexes are another matter).

Explaining these sorts of distinctions through prose is very
difficult. You really need diagrams for something like this IMV.
Without that, the only way to make all of this less confusing is to
avoid all discussion of pruning...but then you can't really make the
point about breaking the dependency on VACUUM, which is a relatively
important point -- one with real practical relevance.
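A toy model may make the distinction under discussion easier to see. The Python sketch below uses hypothetical classes (`ToyHeapPage`, `LpState`, `index_entry` and the rest are illustrative names, not PostgreSQL's heapam code; the real rules are described in src/backend/access/heap/README.HOT). It models one heap page holding a three-version HOT chain and assumes every superseded version is immediately invisible to all snapshots. Pruning recycles the intermediate heap-only version's line pointer outright, while the root line pointer, which the index entry still references, survives as a redirect to the live version.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class LpState(Enum):
    # Toy versions of PostgreSQL's line pointer states
    # (LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD).
    UNUSED = auto()
    NORMAL = auto()
    REDIRECT = auto()
    DEAD = auto()


@dataclass
class HeapTuple:
    value: str
    dead: bool = False                 # no snapshot can see this version any more
    next_offset: Optional[int] = None  # ctid-style link to the newer version


@dataclass
class LinePointer:
    state: LpState
    tup: Optional[HeapTuple] = None
    redirect_to: Optional[int] = None


class ToyHeapPage:
    """Hypothetical single-page heap; offsets are 1-based like OffsetNumber."""

    def __init__(self) -> None:
        self.lps: Dict[int, LinePointer] = {}

    def insert(self, value: str) -> int:
        off = len(self.lps) + 1
        self.lps[off] = LinePointer(LpState.NORMAL, HeapTuple(value))
        return off

    def hot_update(self, old_off: int, value: str) -> int:
        # The new version goes on the same page and gets no index entry.
        new_off = self.insert(value)
        old = self.lps[old_off].tup
        old.next_offset = new_off
        old.dead = True  # assumption: no snapshot still needs the old version
        return new_off

    def _chain(self, start: int) -> List[int]:
        offs = []
        off: Optional[int] = start
        while off is not None:
            offs.append(off)
            off = self.lps[off].tup.next_offset
        return offs

    def prune_chain(self, root_off: int) -> None:
        root = self.lps[root_off]
        if root.state is LpState.NORMAL and not root.tup.dead:
            return  # root version still visible: nothing to prune
        start = root.redirect_to if root.state is LpState.REDIRECT else root_off
        chain = self._chain(start)
        live = next((o for o in chain if not self.lps[o].tup.dead), None)
        for off in chain:
            if off != root_off and self.lps[off].tup.dead:
                # Dead heap-only version: no index entry points here, so
                # the line pointer itself can be recycled.
                self.lps[off] = LinePointer(LpState.UNUSED)
        # The root's tuple storage is reclaimed, but the index entry still
        # holds root_off, so its line pointer must stay: as a redirect to the
        # live version, or as LP_DEAD until VACUUM removes the index entry.
        if live is None:
            self.lps[root_off] = LinePointer(LpState.DEAD)
        else:
            self.lps[root_off] = LinePointer(LpState.REDIRECT, redirect_to=live)

    def index_fetch(self, off: int) -> Optional[str]:
        # Follow an index TID (always the root offset) to the visible version.
        lp = self.lps[off]
        if lp.state is LpState.REDIRECT:
            off = lp.redirect_to
        elif lp.state is not LpState.NORMAL:
            return None
        cur: Optional[int] = off
        while cur is not None:
            tup = self.lps[cur].tup
            if not tup.dead:
                return tup.value
            cur = tup.next_offset
        return None


page = ToyHeapPage()
index_entry = page.insert("v1")           # the index stores this root offset
v2 = page.hot_update(index_entry, "v2")
page.hot_update(v2, "v3")
page.prune_chain(index_entry)
print(page.lps[index_entry].state, page.lps[v2].state)  # LpState.REDIRECT LpState.UNUSED
print(page.index_fetch(index_entry))                    # v3
```

If the last version also dies, pruning again leaves the root as LP_DEAD rather than unused in this model, since only VACUUM can remove the index entry that still points at it. The same code path handles a dead tuple with no HOT chain at all, which matches the point that pruning is not fundamentally different for pages without HOT chains.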
> This is why I discovered it: it says "indexes do not reference their
> page item identifiers", which is manifestly not true when talking
> about the root item, and in fact would defeat the whole purpose of HOT
> (at least in an old-to-new chain like Postgres uses).

Yeah, but...that's not what was intended. Obviously, the index hasn't
changed, and we expect index scans to continue to give correct
answers. So it is pretty strongly implied that it continues to point
to something valid.

> Assuming people can be convinced this is confusing (I realize you may
> not be yet), I see two basic options:
>
> 1. Update this to discuss both intermediate tuples and root items
> separately. This could entail either one larger paragraph or splitting
> such that instead of "two optimizations" we say "three" optimizations.
>
> 2. Change "old versions" to something like "intermediate versions in a
> series of updates".
>
> I prefer some form of (1) since it more fully describes the behavior,
> but we could tweak further for concision.

Bruce authored these docs. I was mostly just glad to have anything at
all about HOT in the user-facing docs, quite honestly.

--
Peter Geoghegan