Thread: [DOCS] HOT - correct claim about indexes not referencing old line pointers

[DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

29 September 2023, 17:39:07

Hello,

While working on my talk for PGConf.NYC next week I came across this
bullet in the docs on heap only tuples:

> Old versions of updated rows can be completely removed during normal
> operation, including SELECTs, instead of requiring periodic vacuum
> operations. (This is possible because indexes do not reference their page
> item identifiers.)

But when a HOT update happens the entry in an (logically unchanged)
index still points to the original heap tid, and that line item is
updated with a pointer to the new line pointer in the same page.

Assuming I'm understanding this correctly, attached is a patch
correcting the description.

Thanks,
James Coleman

Attachment

v1-0001-Correct-HOT-docs-to-account-for-LP_REDIRECT.patch

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

"David G. Johnston"

Date:

29 September 2023, 18:03:30

On Fri, Sep 29, 2023 at 10:45 AM James Coleman <jtc331@gmail.com> wrote:

Hello,

While working on my talk for PGConf.NYC next week I came across this
bullet in the docs on heap only tuples:

> Old versions of updated rows can be completely removed during normal
> operation, including SELECTs, instead of requiring periodic vacuum
> operations. (This is possible because indexes do not reference their page
> item identifiers.)

But when a HOT update happens the entry in an (logically unchanged)
index still points to the original heap tid, and that line item is
updated with a pointer to the new line pointer in the same page.

Assuming I'm understanding this correctly, attached is a patch
correcting the description.

I think we want to somehow distinguish between the old tuple that is the root of the chain and old tuples that are not. This comment refers to pruning the chain and removing intermediate links in the chain that are no longer relevant because the root has been updated to point to the live tuple. In README.HOT, tuple 2 in the example after 1 points to 3.

https://github.com/postgres/postgres/blob/ec99d6e9c87a8ff0f4805cc0c6c12cbb89c48e06/src/backend/access/heap/README.HOT#L109

David J.

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Peter Geoghegan

Date:

29 September 2023, 18:04:27

On Fri, Sep 29, 2023 at 10:39 AM James Coleman <jtc331@gmail.com> wrote:
> > Old versions of updated rows can be completely removed during normal
> > operation, including SELECTs, instead of requiring periodic vacuum
> > operations. (This is possible because indexes do not reference their page
> > item identifiers.)
>
> But when a HOT update happens the entry in an (logically unchanged)
> index still points to the original heap tid, and that line item is
> updated with a pointer to the new line pointer in the same page.

It's true that the original root heap tuple (which is never a
heap-only tuple) must have its line pointer changed from LP_NORMAL to
LP_REDIRECT the first time pruning takes place that affects its HOT
chain. But I don't think that referring to the root item as something
along the lines of "an obsolescent/old tuple's line pointer" is
particularly helpful. Changing from LP_NORMAL to LP_REDIRECT during
the initial prune isn't terribly different from changing an existing
LP_REDIRECT's redirect-TID link during every subsequent prune. In both
cases you're just updating where the first heap-only tuple begins.

The really important point is that the TID (which maps to the root
item of the HOT chain) has a decent chance of being stable over time,
no matter how many versions the HOT chain churns through. And that
that can break (or at least weaken) our dependence on VACUUM with some
workloads.

--
Peter Geoghegan

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Peter Geoghegan

Date:

29 September 2023, 18:39:14

On Fri, Sep 29, 2023 at 11:04 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > But when a HOT update happens the entry in an (logically unchanged)
> > index still points to the original heap tid, and that line item is
> > updated with a pointer to the new line pointer in the same page.
>
> It's true that the original root heap tuple (which is never a
> heap-only tuple) must have its line pointer changed from LP_NORMAL to
> LP_REDIRECT the first time pruning takes place that affects its HOT
> chain. But I don't think that referring to the root item as something
> along the lines of "an obsolescent/old tuple's line pointer" is
> particularly helpful.

To be clear, the existing wording seems correct to me. Even heap-only
tuples require line pointers. These line pointers are strictly
guaranteed to never be *directly* referenced from indexes (if they are
then we'll actually detect it and report data corruption on recent
versions). The name "heap-only tuple" quite literally means that the
tuple and its line pointer are only represented in the heap, and never
in indexes.

There is a related rule about what is allowed to happen to any
heap-only tuple's line pointer: it can only change from LP_NORMAL to
LP_UNUSED (never LP_DEAD or LP_REDIRECT). You can think of a heap-only
tuple as "skipping the LP_DEAD step" that regular heap tuples must go
through. We don't need LP_DEAD tombstones precisely because there
cannot possibly be any references to heap-only tuples in indexes -- so
we can't break index scans by going straight from LP_NORMAL to
LP_UNUSED for heap-only tuple line pointers.

--
Peter Geoghegan

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

29 September 2023, 18:45:35

On Fri, Sep 29, 2023 at 2:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Sep 29, 2023 at 11:04 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > But when a HOT update happens the entry in an (logically unchanged)
> > > index still points to the original heap tid, and that line item is
> > > updated with a pointer to the new line pointer in the same page.
> >
> > It's true that the original root heap tuple (which is never a
> > heap-only tuple) must have its line pointer changed from LP_NORMAL to
> > LP_REDIRECT the first time pruning takes place that affects its HOT
> > chain. But I don't think that referring to the root item as something
> > along the lines of "an obsolescent/old tuple's line pointer" is
> > particularly helpful.
>
> To be clear, the existing wording seems correct to me. Even heap-only
> tuples require line pointers. These line pointers are strictly
> guaranteed to never be *directly* referenced from indexes (if they are
> then we'll actually detect it and report data corruption on recent
> versions). The name "heap-only tuple" quite literally means that the
> tuple and its line pointer are only represented in the heap, and never
> in indexes.

Hmm, to my reading the issue is that "old versions" doesn't say
anything about "old HOT versions; it seems to be describing what
happens generally when a heap-only tuple is written -- which would
include the first time a heap-only tuple is written. And when it's the
first heap-only tuple the "old version" would be the original version,
which would not be a heap-only tuple.

I can work up a tweaked version of the patch that shows there are two
paths here (original tuple is being updated versus an intermediate
heap-only tuple is being updated); would you be willing to consider
that?

Thanks,
James Coleman

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Peter Geoghegan

Date:

29 September 2023, 20:05:39

On Fri, Sep 29, 2023 at 11:45 AM James Coleman <jtc331@gmail.com>
wrote:my reading the issue is that "old versions" doesn't say
> anything about "old HOT versions; it seems to be describing what
> happens generally when a heap-only tuple is written -- which would
> include the first time a heap-only tuple is written.

I think that it's talking about what happens during opportunistic
pruning, in particular what happens to HOT chains. (Though pruning
does almost the same amount of useful work with non-heap-only tuples,
so it's a bit unfortunate that the name "HOT pruning" seems to have
stuck.)

> And when it's the
> first heap-only tuple the "old version" would be the original version,
> which would not be a heap-only tuple.

The docs say "Old versions of updated rows can be completely removed
during normal operation". Opportunistic pruning removes dead heap-only
tuples completely, and makes their line pointers LP_UNUSED right away.
But it can also entail removing storage for the original root item
heap tuple, and making its line pointer LP_REDIRECT right away (not
LP_DEAD or LP_UNUSED) at most once in the life of each HOT chain. So
yeah, we're not quite limited to removing storage for heap-only tuples
when pruning a HOT chain. Does that distinction really matter, though?

There isn't even any special case handling for it in pruneheap.c (we
only have assertions that make sure that we're performing "valid
transitions" for each tuple/line pointer). That is, we don't really
care about the difference between calling ItemIdSetRedirect() for an
LP_NORMAL item versus an existing LP_REDIRECT item at the code level
(we just do it and let PageRepairFragmentation() clean things up).

--
Peter Geoghegan

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

30 September 2023, 01:27:11

On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Sep 29, 2023 at 11:45 AM James Coleman <jtc331@gmail.com>
> wrote:my reading the issue is that "old versions" doesn't say
> > anything about "old HOT versions; it seems to be describing what
> > happens generally when a heap-only tuple is written -- which would
> > include the first time a heap-only tuple is written.
>
> I think that it's talking about what happens during opportunistic
> pruning, in particular what happens to HOT chains. (Though pruning
> does almost the same amount of useful work with non-heap-only tuples,
> so it's a bit unfortunate that the name "HOT pruning" seems to have
> stuck.)

That's very likely what the intention was. I read it again, and the
same confusion still sticks out to me: it doesn't say anything
explicitly about opportunistic pruning (I'm not sure if that term is
"public docs" level, so that's probably fine), and it doesn't scope
the claim to intermediate tuples in a HOT chain -- indeed the context
is the HOT feature generally.

This is why I discovered it: it says "indexes do not reference their
page item identifiers", which is manifestly not true when talking
about the root item, and in fact would defeat the whole purpose of HOT
(at least in a old-to-new chain like Postgres uses).

Assuming people can be convinced this is confusing (I realize you may
not be yet), I see two basic options:

1. Update this to discuss both intermediate tuples and root items
separately. This could entail either one larger paragraph or splitting
such that instead of "two optimizations" we say "three" optimizations.

2. Change "old versions" to something like "intermediate versions in a
series of updates".

I prefer some form of (1) since it more fully describes the behavior,
but we could tweak further for concision.

> > And when it's the
> > first heap-only tuple the "old version" would be the original version,
> > which would not be a heap-only tuple.
>
> The docs say "Old versions of updated rows can be completely removed
> during normal operation". Opportunistic pruning removes dead heap-only
> tuples completely, and makes their line pointers LP_UNUSED right away.
> But it can also entail removing storage for the original root item
> heap tuple, and making its line pointer LP_REDIRECT right away (not
> LP_DEAD or LP_UNUSED) at most once in the life of each HOT chain. So
> yeah, we're not quite limited to removing storage for heap-only tuples
> when pruning a HOT chain. Does that distinction really matter, though?

Given pageinspect can show you the original tuple still exists and
that the index still references it...I think it does.

I suppose very few people go checking that out, of course, but I'd
like to be precise.

Regards,
James Coleman

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Peter Geoghegan

Date:

30 September 2023, 02:02:24

On Fri, Sep 29, 2023 at 6:27 PM James Coleman <jtc331@gmail.com> wrote:
> On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I think that it's talking about what happens during opportunistic
> > pruning, in particular what happens to HOT chains. (Though pruning
> > does almost the same amount of useful work with non-heap-only tuples,
> > so it's a bit unfortunate that the name "HOT pruning" seems to have
> > stuck.)
>
> That's very likely what the intention was. I read it again, and the
> same confusion still sticks out to me: it doesn't say anything
> explicitly about opportunistic pruning (I'm not sure if that term is
> "public docs" level, so that's probably fine), and it doesn't scope
> the claim to intermediate tuples in a HOT chain -- indeed the context
> is the HOT feature generally.

It doesn't mention opportunistic pruning by name, but it does say:

"Old versions of updated rows can be completely removed during normal
operation, including SELECTs, instead of requiring periodic vacuum
operations."

There is a strong association between HOT and pruning (particularly
opportunistic pruning) in the minds of some hackers (and perhaps some
users), because both features appeared together in 8.3, and both are
closely related at the implementation level. It's nevertheless not
quite accurate to say that HOT "provides two optimizations" -- since
pruning (the second of the two bullet points) isn't fundamentally
different for pages that don't have any HOT chains. Not at the level
of the heap pages, at least (indexes are another matter).

Explaining these sorts of distinctions through prose is very
difficult. You really need diagrams for something like this IMV.
Without that, the only way to make all of this less confusing is to
avoid all discussion of pruning...but then you can't really make the
point about breaking the dependency on VACUUM, which is a relatively
important point -- one with real practical relevance.

> This is why I discovered it: it says "indexes do not reference their
> page item identifiers", which is manifestly not true when talking
> about the root item, and in fact would defeat the whole purpose of HOT
> (at least in a old-to-new chain like Postgres uses).

Yeah, but...that's not what was intended. Obviously, the index hasn't
changed, and we expect index scans to continue to give correct
answers. So it is pretty strongly implied that it continues to point
to something valid.

> Assuming people can be convinced this is confusing (I realize you may
> not be yet), I see two basic options:
>
> 1. Update this to discuss both intermediate tuples and root items
> separately. This could entail either one larger paragraph or splitting
> such that instead of "two optimizations" we say "three" optimizations.
>
> 2. Change "old versions" to something like "intermediate versions in a
> series of updates".
>
> I prefer some form of (1) since it more fully describes the behavior,
> but we could tweak further for concision.

Bruce authored these docs. I was mostly just glad to have anything at
all about HOT in the user-facing docs, quite honestly.

--
Peter Geoghegan

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

02 October 2023, 18:55:15

On Sat, Sep 30, 2023 at 1:05 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > This is why I discovered it: it says "indexes do not reference their
> > page item identifiers", which is manifestly not true when talking
> > about the root item, and in fact would defeat the whole purpose of HOT
> > (at least in a old-to-new chain like Postgres uses).
>
> Yeah, but...that's not what was intended. Obviously, the index hasn't
> changed, and we expect index scans to continue to give correct
> answers. So it is pretty strongly implied that it continues to point
> to something valid.

I took a look at this. I agree with James that the current wording is
just plain wrong.

     periodic vacuum operations.  (This is possible because indexes
     do not reference their <link linkend="storage-page-layout">page
     item identifiers</link>.)

Here, the antecedent of "their" is "old versions of updated rows". It
is slightly unclear whether we should interpret this to mean (a) the
tuples together with the line pointers which point to them or (b) only
the tuple data to which the line pointers point. If (a), then it's
wrong because we can't actually get rid of the root line pointer;
rather, we have to change it to a redirect. If (b), then it's wrong
because heap_page_prune() removes dead tuples in this sense whether
HOT is involved or not. I can see no interpretation under which this
statement is true as written.

I reviewed James's proposed alternative:

+     periodic vacuum operations. However because indexes reference the old
+     version's <link linkend="storage-page-layout">page item identifiers</link>
+     the line pointer must remain in place. Such a line pointer has its
+     <literal>LP_REDIRECT</literal> bit set and its offset updated to the
+     <link linkend="storage-page-layout">page item identifiers</link> of
+     the updated row.

I don't think that's really right either. That's true for the root
line pointer, but the phrasing seems to be referring to old versions
generally, which would seem to include not only the root, for which
this is correct, and also all subsequent now-dead row versions, for
which it is wrong.

Here is my attempt:

When a row is updated multiple times, row versions other than the
oldest and the newest can be completely removed during normal
operation, including <command>SELECT</command>s, instead of requiring
periodic vacuum operations. (Indexes always refer to the <link
linkend="storage-page-layout">page item identifiers</link> of the
original row version. The tuple data associated with that row version
is removed, and its item identifier is converted to a redirect that
points to the oldest version that may still be visible to some
concurrent transaction. Intermediate row versions that are no longer
visible to anyone are completely removed, and the associated page item
identifiers are made available for reuse.)

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

03 October 2023, 19:35:15

On Mon, Oct 2, 2023 at 2:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Sep 30, 2023 at 1:05 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > This is why I discovered it: it says "indexes do not reference their
> > > page item identifiers", which is manifestly not true when talking
> > > about the root item, and in fact would defeat the whole purpose of HOT
> > > (at least in a old-to-new chain like Postgres uses).
> >
> > Yeah, but...that's not what was intended. Obviously, the index hasn't
> > changed, and we expect index scans to continue to give correct
> > answers. So it is pretty strongly implied that it continues to point
> > to something valid.
>
> I took a look at this. I agree with James that the current wording is
> just plain wrong.
>
>      periodic vacuum operations.  (This is possible because indexes
>      do not reference their <link linkend="storage-page-layout">page
>      item identifiers</link>.)
>
> Here, the antecedent of "their" is "old versions of updated rows". It
> is slightly unclear whether we should interpret this to mean (a) the
> tuples together with the line pointers which point to them or (b) only
> the tuple data to which the line pointers point. If (a), then it's
> wrong because we can't actually get rid of the root line pointer;
> rather, we have to change it to a redirect. If (b), then it's wrong
> because heap_page_prune() removes dead tuples in this sense whether
> HOT is involved or not. I can see no interpretation under which this
> statement is true as written.
>
> I reviewed James's proposed alternative:
>
> +     periodic vacuum operations. However because indexes reference the old
> +     version's <link linkend="storage-page-layout">page item identifiers</link>
> +     the line pointer must remain in place. Such a line pointer has its
> +     <literal>LP_REDIRECT</literal> bit set and its offset updated to the
> +     <link linkend="storage-page-layout">page item identifiers</link> of
> +     the updated row.
>
> I don't think that's really right either. That's true for the root
> line pointer, but the phrasing seems to be referring to old versions
> generally, which would seem to include not only the root, for which
> this is correct, and also all subsequent now-dead row versions, for
> which it is wrong.
>
> Here is my attempt:
>
> When a row is updated multiple times, row versions other than the
> oldest and the newest can be completely removed during normal
> operation, including <command>SELECT</command>s, instead of requiring
> periodic vacuum operations. (Indexes always refer to the <link
> linkend="storage-page-layout">page item identifiers</link> of the
> original row version. The tuple data associated with that row version
> is removed, and its item identifier is converted to a redirect that
> points to the oldest version that may still be visible to some
> concurrent transaction. Intermediate row versions that are no longer
> visible to anyone are completely removed, and the associated page item
> identifiers are made available for reuse.)

Hi Robert,

Thanks for reviewing!

I like your changes. Reading through this several times, and noting
Peter's comments about pruning being more than just HOT, I'm thinking
that rather than a simple fixup for this one paragraph what we
actually want is to split out the concept of page pruning into its own
section of the storage docs. Attached is a patch that does that,
incorporating much of your language about LP_REDIRECT, along with
LP_DEAD so that readers know this affects more than just heap-only
tuple workloads.

Regards,
James

Attachment

v2-0001-Correct-HOT-docs-to-account-for-LP_REDIRECT.patch

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

04 October 2023, 13:17:48

On Tue, Oct 3, 2023 at 3:35 PM James Coleman <jtc331@gmail.com> wrote:
> I like your changes. Reading through this several times, and noting
> Peter's comments about pruning being more than just HOT, I'm thinking
> that rather than a simple fixup for this one paragraph what we
> actually want is to split out the concept of page pruning into its own
> section of the storage docs. Attached is a patch that does that,
> incorporating much of your language about LP_REDIRECT, along with
> LP_DEAD so that readers know this affects more than just heap-only
> tuple workloads.

I considered this kind of approach, but I don't think it's better. I
think the point of this documentation is to give people a general idea
of how HOT works, not all the technical details. If we start getting
into the nitty gritty, we're going to have to explain a lot more
stuff, which is going to detract from the actual purpose of this
documentation. For example, your patch talks about LP_REDIRECT and
LP_DEAD, which are not referenced in any other part of the
documentation. It also uses the term line pointer instead of page item
identifier without explaining that those are basically the same thing.
Obviously those kinds of things can be fixed, but in my opinion, your
version doesn't really add much information that is likely to be
useful to a reader of this section, while at the same time it does add
some things that might be confusing. If we wanted to have a more
technical discussion of all of this somewhere in the documentation, we
could, but that would be quite a bit more work to write and review,
and it would probably duplicate some of what we've already got in
README.HOT.

I suggest that we should be looking for a patch that tries to correct
the wrong stuff in the present wording while adding the minimum
possible amount of text.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

04 October 2023, 13:35:55

On Wed, Oct 4, 2023 at 9:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Oct 3, 2023 at 3:35 PM James Coleman <jtc331@gmail.com> wrote:
> > I like your changes. Reading through this several times, and noting
> > Peter's comments about pruning being more than just HOT, I'm thinking
> > that rather than a simple fixup for this one paragraph what we
> > actually want is to split out the concept of page pruning into its own
> > section of the storage docs. Attached is a patch that does that,
> > incorporating much of your language about LP_REDIRECT, along with
> > LP_DEAD so that readers know this affects more than just heap-only
> > tuple workloads.
>
> I considered this kind of approach, but I don't think it's better. I
> think the point of this documentation is to give people a general idea
> of how HOT works, not all the technical details. If we start getting
> into the nitty gritty, we're going to have to explain a lot more
> stuff, which is going to detract from the actual purpose of this
> documentation. For example, your patch talks about LP_REDIRECT and
> LP_DEAD, which are not referenced in any other part of the
> documentation. It also uses the term line pointer instead of page item
> identifier without explaining that those are basically the same thing.
> Obviously those kinds of things can be fixed, but in my opinion, your
> version doesn't really add much information that is likely to be
> useful to a reader of this section, while at the same time it does add
> some things that might be confusing. If we wanted to have a more
> technical discussion of all of this somewhere in the documentation, we
> could, but that would be quite a bit more work to write and review,
> and it would probably duplicate some of what we've already got in
> README.HOT.
>
> I suggest that we should be looking for a patch that tries to correct
> the wrong stuff in the present wording while adding the minimum
> possible amount of text.

There's one primary thing I think this approach adds (I don't mean the
patch I proposed is the only way to address this concern): the fact
that pages get cleaned up outside of the HOT optimization. We could
add a short paragraph about that on the HOT page, but that seems a bit
confusing.

Are you thinking we should simply elide the fact that there is pruning
that happens outside of HOT? Or add that information onto the HOT
page, even though it doesn't directly fit?

Regards,
James

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

04 October 2023, 13:42:43

On Wed, Oct 4, 2023 at 9:36 AM James Coleman <jtc331@gmail.com> wrote:
> Are you thinking we should simply elide the fact that there is pruning
> that happens outside of HOT? Or add that information onto the HOT
> page, even though it doesn't directly fit?

I think we should elide it. Maybe with a much larger rewrite there
would be a good place to include that information, but with the
current structure, the page is about why HOT is good, and talking
about pruning that can happen apart from HOT doesn't advance that
message.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

05 October 2023, 01:12:33

On Wed, Oct 4, 2023 at 9:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 4, 2023 at 9:36 AM James Coleman <jtc331@gmail.com> wrote:
> > Are you thinking we should simply elide the fact that there is pruning
> > that happens outside of HOT? Or add that information onto the HOT
> > page, even though it doesn't directly fit?
>
> I think we should elide it. Maybe with a much larger rewrite there
> would be a good place to include that information, but with the
> current structure, the page is about why HOT is good, and talking
> about pruning that can happen apart from HOT doesn't advance that
> message.

All right, attached is a v3 which attempts to fix the wrong
information with an economy of words. I may at some point submit a
separate patch that adds a broader pruning section, but this at least
brings the docs inline with reality insofar as they address it.

James

Attachment

v3-0001-Correct-HOT-docs-to-account-for-LP_REDIRECT.patch

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Shubham Khanna

Date:

05 December 2023, 05:22:43

On Tue, Dec 5, 2023 at 10:51 AM James Coleman <jtc331@gmail.com> wrote:
>
> Hello,
>
> While working on my talk for PGConf.NYC next week I came across this
> bullet in the docs on heap only tuples:
>
> > Old versions of updated rows can be completely removed during normal
> > operation, including SELECTs, instead of requiring periodic vacuum
> > operations. (This is possible because indexes do not reference their page
> > item identifiers.)
>
> But when a HOT update happens the entry in an (logically unchanged)
> index still points to the original heap tid, and that line item is
> updated with a pointer to the new line pointer in the same page.
>
> Assuming I'm understanding this correctly, attached is a patch
> correcting the description.
>
> I have Reviewed the patch. Patch applies neatly without any issues. Documentation build was successful and there was
noSpell-check issue also. I did not find any issues. The patch looks >good to me. 
>
>Thanks and Regards,
>Shubham Khanna.

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

14 March 2024, 14:27:54

On Wed, Oct 4, 2023 at 9:12 PM James Coleman <jtc331@gmail.com> wrote:
> All right, attached is a v3 which attempts to fix the wrong
> information with an economy of words. I may at some point submit a
> separate patch that adds a broader pruning section, but this at least
> brings the docs inline with reality insofar as they address it.

I don't think this is as good as what I proposed back on October 2nd.
IMHO, that version does a good job making the text accurate and clear,
and is directly responsive to your original complaint, namely, that
the root of the HOT chain can't be removed. But this version seems to
contain a number of incidental changes that are unrelated to that
point, e.g. "old versions" -> "old, no longer visible versions", "can
be completely removed" -> "may be pruned", and the removal of the
sentence "In summary, heap-only tuple updates can only be created - if
columns used by indexes are not updated" which AFAICT is both
completely correct as-is and unrelated to the original complaint.

Maybe I shouldn't be, but I'm slightly frustrated here. I thought I
had proposed an alternative which you found acceptable, but then you
proposed several more versions that did other things instead, and I
never really understood why we couldn't just adopt the version that
you seemed to think was OK. If there's a problem with that, say what
it is. If there's not, let's do that and move on.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

15 March 2024, 01:06:54

On Thu, Mar 14, 2024 at 10:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 4, 2023 at 9:12 PM James Coleman <jtc331@gmail.com> wrote:
> > All right, attached is a v3 which attempts to fix the wrong
> > information with an economy of words. I may at some point submit a
> > separate patch that adds a broader pruning section, but this at least
> > brings the docs inline with reality insofar as they address it.
>
> I don't think this is as good as what I proposed back on October 2nd.
> IMHO, that version does a good job making the text accurate and clear,
> and is directly responsive to your original complaint, namely, that
> the root of the HOT chain can't be removed. But this version seems to
> contain a number of incidental changes that are unrelated to that
> point, e.g. "old versions" -> "old, no longer visible versions", "can
> be completely removed" -> "may be pruned", and the removal of the
> sentence "In summary, heap-only tuple updates can only be created - if
> columns used by indexes are not updated" which AFAICT is both
> completely correct as-is and unrelated to the original complaint.
>
> Maybe I shouldn't be, but I'm slightly frustrated here. I thought I
> had proposed an alternative which you found acceptable, but then you
> proposed several more versions that did other things instead, and I
> never really understood why we couldn't just adopt the version that
> you seemed to think was OK. If there's a problem with that, say what
> it is. If there's not, let's do that and move on.

I think there's simply a misunderstanding here. I read your proposal
as "here's an idea to consider as you work on the patch" (as happens
on many other threads), and so I attempted to incorporate your primary
points of feedback into my next version of the patch.

Obviously I have reasons for the other changes I made: for example,
"no longer visible" improves the correctness, since being an old
version isn't sufficient. I removed the "In summary" sentence because
it simply doesn't follow from the prose before it. That sentence
simply restates information already appearing earlier in almost as
simple a form, so it's redundant. But more importantly it's just not
actually a summary of the text before it, so removing it improves the
documentation.

I can explain my reasoning further if desired, but I fear it would
simply frustrate you further, so I'll stop here.

If the goal here is the most minimal patch possible, then please
commit what you proposed. I am interested in improving the document
further, but I don't know how to do that easily if the requirement is
effectively "must only change one specific detail at a time". So, that
leaves me feeling a bit frustrated also.

Regards,
James Coleman

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

20 March 2024, 18:15:12

On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote:
> Obviously I have reasons for the other changes I made: for example,
> "no longer visible" improves the correctness, since being an old
> version isn't sufficient. I removed the "In summary" sentence because
> it simply doesn't follow from the prose before it. That sentence
> simply restates information already appearing earlier in almost as
> simple a form, so it's redundant. But more importantly it's just not
> actually a summary of the text before it, so removing it improves the
> documentation.
>
> I can explain my reasoning further if desired, but I fear it would
> simply frustrate you further, so I'll stop here.
>
> If the goal here is the most minimal patch possible, then please
> commit what you proposed. I am interested in improving the document
> further, but I don't know how to do that easily if the requirement is
> effectively "must only change one specific detail at a time". So, that
> leaves me feeling a bit frustrated also.

I don't think the goal is to have the most minimal patch possible, but
at the same time, I'm responsible for what I commit. If I commit a
patch that changes something and somebody, especially another
committer, writes an email and says "hey, why the heck did you change
this, you dummy?" then I need to be able to answer that question, and
saying "uh, well, James Coleman had it in his patch and he didn't
explain why it was there and I didn't know either but I just kind of
committed it anyway" is not where I want to be. Almost without
exception, it's not the patch author who gets asked why they included
a certain thing in the patch; it's the committer who gets asked why it
got committed that way -- and at least as I see it, if there's not a
good answer to that question, it's the committer who gets judged, not
the patch author. In some cases, such questions and judgements arrive
without warning YEARS after the commit.

So, to protect myself against questions from other hackers that end,
at least implicitly, in "you dummy," I try to make sure there's an
adequate paper trail for everything I commit. Between the commit
message, the comments, and the email discussion, it needs to be
crystal clear why something was thought to be a good idea at the time.
Hopefully, it will be clear enough that I never even get a question,
because the reader will be able to understand ON THEIR OWN why
something was done and either go "oh, that make sense" or "well, I get
why they did that, but I disagree because $REASON" ... but if that
doesn't quite work out, then I hope that at the very least the paper
trail will be good enough that I can reconstruct the reasoning if
needed. For a recent example of a case where I clearly failed do a
good enough job to keep someone from asking a "you dummy" question,
see the http://postgr.es/m/6030bdec-0de1-4da2-b0b3-335007eae97f@iki.fi
and in particular the paragraph that starts with "However".

Therefore, if I realize when reading the patch that it contains
odd-looking changes which I can't relate to the stated goal, it's
getting bounced, pretty much 100% of the time. If it's a small patch
and I really care about it and it's a new submitter who doesn't know
any better, I might choose to remove those changes myself and commit
the rest; and if I like those changes for other reasons, I might
choose to commit them separately. In any other situation, I'm just
going to tell the submitter that they need to take out the unrelated
changes and move on to the next email thread.

In this particular situation, I see that this whole discussion is
slightly moot by virtue of 7a9328e8e40534fb4de41b4ac152e3c6989d84e7
(Jeff Davis, 2024-03-05) which removed the "In summary, heap-only
tuple updates can only be created if columns used by indexes are not
updated" sentence that you also removed. But there is an important
difference between that patch and yours, at least IMHO. You argue that
the sentence in question didn't flow from what precedes it, but I
don't agree with that as a rationale; I think that sentence flows
perfectly well from what precedes it and is a generally adequate
summary of the point of the section. That we disagree on the merits of
that sentence is fine; there's no hope that we're all going to agree
on everything. What makes me frustrated is that you didn't provide
that rationale, which greatly complicates the whole discussion,
because now I have to spend time getting you to either take it out or
provide your justification before we can even reach the point of
arguing about the substance.

At least in my judgment, the patch Jeff committed doesn't have that
problem, because the rationale that he gives in the commit message
(partly by reference to another commit) is that heap-only tuples now
can, in circumstances, be created even if columns used by indexes ARE
updated. Based on that justification, it seems really clear to me that
it was right to do SOMETHING to the sentence in question. Whether
removing the sentence was better than some other alternative is
debatable, and I might well have done something different, but between
reading the commit message and reading the diff, it's 100%
understandable to me what his reasoning was for every single byte that
is in that diff. So if I love what he committed, cool! And if I hate
what he committed, I'm well-placed to argue, because I know what I'm
arguing against.

Also, if I do feel like arguing, the fact that I know his reasoning
will also make it a heck of a lot easier to come up with a proposal
that meets my goals without undermining his. I can see, for example,
that I can't simply propose to put that sentence back the way that it
was; based on his reason for removing it, he obviously isn't going to
like that. But if, say, my goal were to have some kind of a summary
sentence there because I just thought that was really helpful, I could
write a patch to insert a new summary sentence there and then say
"hey, Jeff, you took this summary sentence out because it's not
accurate, but here's a new one which I think is accurate and I'd like
to add it back." Maybe he'd like that, and maybe he wouldn't, but it
would have a shot, because it would take his reasoning into account,
which I can do, because I know what it was. I want people who read the
patches that I commit to have the same opportunities.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

Robert Haas

Date:

21 March 2024, 17:09:24

On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote:
> If the goal here is the most minimal patch possible, then please
> commit what you proposed. I am interested in improving the document
> further, but I don't know how to do that easily if the requirement is
> effectively "must only change one specific detail at a time".

So, yesterday I wrote a long email on how I saw the goals here.
Despite our disagreements, I believe we agree that the text I proposed
is better than what's there, so I've committed that change now. I've
also marked the CF entry as committed. Please propose the other
changes you want separately.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

23 March 2024, 21:42:39

On Wed, Mar 20, 2024 at 2:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote:
> > Obviously I have reasons for the other changes I made: for example,
> > "no longer visible" improves the correctness, since being an old
> > version isn't sufficient. I removed the "In summary" sentence because
> > it simply doesn't follow from the prose before it. That sentence
> > simply restates information already appearing earlier in almost as
> > simple a form, so it's redundant. But more importantly it's just not
> > actually a summary of the text before it, so removing it improves the
> > documentation.
> >
> > I can explain my reasoning further if desired, but I fear it would
> > simply frustrate you further, so I'll stop here.
> >
> > If the goal here is the most minimal patch possible, then please
> > commit what you proposed. I am interested in improving the document
> > further, but I don't know how to do that easily if the requirement is
> > effectively "must only change one specific detail at a time". So, that
> > leaves me feeling a bit frustrated also.
>
> I don't think the goal is to have the most minimal patch possible, but
> at the same time, I'm responsible for what I commit. If I commit a
> patch that changes something and somebody, especially another
> committer, writes an email and says "hey, why the heck did you change
> this, you dummy?" then I need to be able to answer that question, and
> saying "uh, well, James Coleman had it in his patch and he didn't
> explain why it was there and I didn't know either but I just kind of
> committed it anyway" is not where I want to be. Almost without
> exception, it's not the patch author who gets asked why they included
> a certain thing in the patch; it's the committer who gets asked why it
> got committed that way -- and at least as I see it, if there's not a
> good answer to that question, it's the committer who gets judged, not
> the patch author. In some cases, such questions and judgements arrive
> without warning YEARS after the commit.
>
> So, to protect myself against questions from other hackers that end,
> at least implicitly, in "you dummy," I try to make sure there's an
> adequate paper trail for everything I commit. Between the commit
> message, the comments, and the email discussion, it needs to be
> crystal clear why something was thought to be a good idea at the time.
> Hopefully, it will be clear enough that I never even get a question,
> because the reader will be able to understand ON THEIR OWN why
> something was done and either go "oh, that make sense" or "well, I get
> why they did that, but I disagree because $REASON" ... but if that
> doesn't quite work out, then I hope that at the very least the paper
> trail will be good enough that I can reconstruct the reasoning if
> needed. For a recent example of a case where I clearly failed do a
> good enough job to keep someone from asking a "you dummy" question,
> see the http://postgr.es/m/6030bdec-0de1-4da2-b0b3-335007eae97f@iki.fi
> and in particular the paragraph that starts with "However".
>
> Therefore, if I realize when reading the patch that it contains
> odd-looking changes which I can't relate to the stated goal, it's
> getting bounced, pretty much 100% of the time. If it's a small patch
> and I really care about it and it's a new submitter who doesn't know
> any better, I might choose to remove those changes myself and commit
> the rest; and if I like those changes for other reasons, I might
> choose to commit them separately. In any other situation, I'm just
> going to tell the submitter that they need to take out the unrelated
> changes and move on to the next email thread.
>
> In this particular situation, I see that this whole discussion is
> slightly moot by virtue of 7a9328e8e40534fb4de41b4ac152e3c6989d84e7
> (Jeff Davis, 2024-03-05) which removed the "In summary, heap-only
> tuple updates can only be created if columns used by indexes are not
> updated" sentence that you also removed. But there is an important
> difference between that patch and yours, at least IMHO. You argue that
> the sentence in question didn't flow from what precedes it, but I
> don't agree with that as a rationale; I think that sentence flows
> perfectly well from what precedes it and is a generally adequate
> summary of the point of the section. That we disagree on the merits of
> that sentence is fine; there's no hope that we're all going to agree
> on everything. What makes me frustrated is that you didn't provide
> that rationale, which greatly complicates the whole discussion,
> because now I have to spend time getting you to either take it out or
> provide your justification before we can even reach the point of
> arguing about the substance.
>
> At least in my judgment, the patch Jeff committed doesn't have that
> problem, because the rationale that he gives in the commit message
> (partly by reference to another commit) is that heap-only tuples now
> can, in circumstances, be created even if columns used by indexes ARE
> updated. Based on that justification, it seems really clear to me that
> it was right to do SOMETHING to the sentence in question. Whether
> removing the sentence was better than some other alternative is
> debatable, and I might well have done something different, but between
> reading the commit message and reading the diff, it's 100%
> understandable to me what his reasoning was for every single byte that
> is in that diff. So if I love what he committed, cool! And if I hate
> what he committed, I'm well-placed to argue, because I know what I'm
> arguing against.
>
> Also, if I do feel like arguing, the fact that I know his reasoning
> will also make it a heck of a lot easier to come up with a proposal
> that meets my goals without undermining his. I can see, for example,
> that I can't simply propose to put that sentence back the way that it
> was; based on his reason for removing it, he obviously isn't going to
> like that. But if, say, my goal were to have some kind of a summary
> sentence there because I just thought that was really helpful, I could
> write a patch to insert a new summary sentence there and then say
> "hey, Jeff, you took this summary sentence out because it's not
> accurate, but here's a new one which I think is accurate and I'd like
> to add it back." Maybe he'd like that, and maybe he wouldn't, but it
> would have a shot, because it would take his reasoning into account,
> which I can do, because I know what it was. I want people who read the
> patches that I commit to have the same opportunities.

I'm happy to provide the rationale; my apologies for not providing it
originally.

I didn't realize that was the root of the frustration: I'd read your
feedback ("[my proposed change[ is directly responsive to your
original complaint...[your] version seems to contain a number of
incidental changes that are unrelated to that") as being aimed at
something like "a patch can only address the original specific
complaint".

I'll try to be more specific in my reasoning when e.g. I think a piece
of documentation reads better with a few additional changes. If I
forget to do that I'd also appreciate your asking up front for the
rationale so that I have a chance to clarify that without
misunderstanding.

Regards,
James Coleman

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

From

James Coleman

Date:

23 March 2024, 21:43:07

On Thu, Mar 21, 2024 at 1:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote:
> > If the goal here is the most minimal patch possible, then please
> > commit what you proposed. I am interested in improving the document
> > further, but I don't know how to do that easily if the requirement is
> > effectively "must only change one specific detail at a time".
>
> So, yesterday I wrote a long email on how I saw the goals here.
> Despite our disagreements, I believe we agree that the text I proposed
> is better than what's there, so I've committed that change now. I've
> also marked the CF entry as committed. Please propose the other
> changes you want separately.

Thanks for committing the fix.

Regards,
James Coleman