Thread: [DOCS] HOT - correct claim about indexes not referencing old line pointers
Hello, While working on my talk for PGConf.NYC next week I came across this bullet in the docs on heap only tuples: > Old versions of updated rows can be completely removed during normal > operation, including SELECTs, instead of requiring periodic vacuum > operations. (This is possible because indexes do not reference their page > item identifiers.) But when a HOT update happens the entry in an (logically unchanged) index still points to the original heap tid, and that line item is updated with a pointer to the new line pointer in the same page. Assuming I'm understanding this correctly, attached is a patch correcting the description. Thanks, James Coleman
Attachment
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
"David G. Johnston"
Date:
On Fri, Sep 29, 2023 at 10:45 AM James Coleman <jtc331@gmail.com> wrote:
Hello,
While working on my talk for PGConf.NYC next week I came across this
bullet in the docs on heap only tuples:
> Old versions of updated rows can be completely removed during normal
> operation, including SELECTs, instead of requiring periodic vacuum
> operations. (This is possible because indexes do not reference their page
> item identifiers.)
But when a HOT update happens the entry in an (logically unchanged)
index still points to the original heap tid, and that line item is
updated with a pointer to the new line pointer in the same page.
Assuming I'm understanding this correctly, attached is a patch
correcting the description.
I think we want to somehow distinguish between the old tuple that is the root of the chain and old tuples that are not. This comment refers to pruning the chain and removing intermediate links in the chain that are no longer relevant because the root has been updated to point to the live tuple. In README.HOT, tuple 2 in the example after 1 points to 3.
David J.
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Peter Geoghegan
Date:
On Fri, Sep 29, 2023 at 10:39 AM James Coleman <jtc331@gmail.com> wrote: > > Old versions of updated rows can be completely removed during normal > > operation, including SELECTs, instead of requiring periodic vacuum > > operations. (This is possible because indexes do not reference their page > > item identifiers.) > > But when a HOT update happens the entry in an (logically unchanged) > index still points to the original heap tid, and that line item is > updated with a pointer to the new line pointer in the same page. It's true that the original root heap tuple (which is never a heap-only tuple) must have its line pointer changed from LP_NORMAL to LP_REDIRECT the first time pruning takes place that affects its HOT chain. But I don't think that referring to the root item as something along the lines of "an obsolescent/old tuple's line pointer" is particularly helpful. Changing from LP_NORMAL to LP_REDIRECT during the initial prune isn't terribly different from changing an existing LP_REDIRECT's redirect-TID link during every subsequent prune. In both cases you're just updating where the first heap-only tuple begins. The really important point is that the TID (which maps to the root item of the HOT chain) has a decent chance of being stable over time, no matter how many versions the HOT chain churns through. And that that can break (or at least weaken) our dependence on VACUUM with some workloads. -- Peter Geoghegan
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Peter Geoghegan
Date:
On Fri, Sep 29, 2023 at 11:04 AM Peter Geoghegan <pg@bowt.ie> wrote: > > But when a HOT update happens the entry in an (logically unchanged) > > index still points to the original heap tid, and that line item is > > updated with a pointer to the new line pointer in the same page. > > It's true that the original root heap tuple (which is never a > heap-only tuple) must have its line pointer changed from LP_NORMAL to > LP_REDIRECT the first time pruning takes place that affects its HOT > chain. But I don't think that referring to the root item as something > along the lines of "an obsolescent/old tuple's line pointer" is > particularly helpful. To be clear, the existing wording seems correct to me. Even heap-only tuples require line pointers. These line pointers are strictly guaranteed to never be *directly* referenced from indexes (if they are then we'll actually detect it and report data corruption on recent versions). The name "heap-only tuple" quite literally means that the tuple and its line pointer are only represented in the heap, and never in indexes. There is a related rule about what is allowed to happen to any heap-only tuple's line pointer: it can only change from LP_NORMAL to LP_UNUSED (never LP_DEAD or LP_REDIRECT). You can think of a heap-only tuple as "skipping the LP_DEAD step" that regular heap tuples must go through. We don't need LP_DEAD tombstones precisely because there cannot possibly be any references to heap-only tuples in indexes -- so we can't break index scans by going straight from LP_NORMAL to LP_UNUSED for heap-only tuple line pointers. -- Peter Geoghegan
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Fri, Sep 29, 2023 at 2:39 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Sep 29, 2023 at 11:04 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > But when a HOT update happens the entry in an (logically unchanged) > > > index still points to the original heap tid, and that line item is > > > updated with a pointer to the new line pointer in the same page. > > > > It's true that the original root heap tuple (which is never a > > heap-only tuple) must have its line pointer changed from LP_NORMAL to > > LP_REDIRECT the first time pruning takes place that affects its HOT > > chain. But I don't think that referring to the root item as something > > along the lines of "an obsolescent/old tuple's line pointer" is > > particularly helpful. > > To be clear, the existing wording seems correct to me. Even heap-only > tuples require line pointers. These line pointers are strictly > guaranteed to never be *directly* referenced from indexes (if they are > then we'll actually detect it and report data corruption on recent > versions). The name "heap-only tuple" quite literally means that the > tuple and its line pointer are only represented in the heap, and never > in indexes. Hmm, to my reading the issue is that "old versions" doesn't say anything about "old HOT versions; it seems to be describing what happens generally when a heap-only tuple is written -- which would include the first time a heap-only tuple is written. And when it's the first heap-only tuple the "old version" would be the original version, which would not be a heap-only tuple. I can work up a tweaked version of the patch that shows there are two paths here (original tuple is being updated versus an intermediate heap-only tuple is being updated); would you be willing to consider that? Thanks, James Coleman
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Peter Geoghegan
Date:
On Fri, Sep 29, 2023 at 11:45 AM James Coleman <jtc331@gmail.com> wrote:my reading the issue is that "old versions" doesn't say > anything about "old HOT versions; it seems to be describing what > happens generally when a heap-only tuple is written -- which would > include the first time a heap-only tuple is written. I think that it's talking about what happens during opportunistic pruning, in particular what happens to HOT chains. (Though pruning does almost the same amount of useful work with non-heap-only tuples, so it's a bit unfortunate that the name "HOT pruning" seems to have stuck.) > And when it's the > first heap-only tuple the "old version" would be the original version, > which would not be a heap-only tuple. The docs say "Old versions of updated rows can be completely removed during normal operation". Opportunistic pruning removes dead heap-only tuples completely, and makes their line pointers LP_UNUSED right away. But it can also entail removing storage for the original root item heap tuple, and making its line pointer LP_REDIRECT right away (not LP_DEAD or LP_UNUSED) at most once in the life of each HOT chain. So yeah, we're not quite limited to removing storage for heap-only tuples when pruning a HOT chain. Does that distinction really matter, though? There isn't even any special case handling for it in pruneheap.c (we only have assertions that make sure that we're performing "valid transitions" for each tuple/line pointer). That is, we don't really care about the difference between calling ItemIdSetRedirect() for an LP_NORMAL item versus an existing LP_REDIRECT item at the code level (we just do it and let PageRepairFragmentation() clean things up). -- Peter Geoghegan
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Sep 29, 2023 at 11:45 AM James Coleman <jtc331@gmail.com> > wrote:my reading the issue is that "old versions" doesn't say > > anything about "old HOT versions; it seems to be describing what > > happens generally when a heap-only tuple is written -- which would > > include the first time a heap-only tuple is written. > > I think that it's talking about what happens during opportunistic > pruning, in particular what happens to HOT chains. (Though pruning > does almost the same amount of useful work with non-heap-only tuples, > so it's a bit unfortunate that the name "HOT pruning" seems to have > stuck.) That's very likely what the intention was. I read it again, and the same confusion still sticks out to me: it doesn't say anything explicitly about opportunistic pruning (I'm not sure if that term is "public docs" level, so that's probably fine), and it doesn't scope the claim to intermediate tuples in a HOT chain -- indeed the context is the HOT feature generally. This is why I discovered it: it says "indexes do not reference their page item identifiers", which is manifestly not true when talking about the root item, and in fact would defeat the whole purpose of HOT (at least in a old-to-new chain like Postgres uses). Assuming people can be convinced this is confusing (I realize you may not be yet), I see two basic options: 1. Update this to discuss both intermediate tuples and root items separately. This could entail either one larger paragraph or splitting such that instead of "two optimizations" we say "three" optimizations. 2. Change "old versions" to something like "intermediate versions in a series of updates". I prefer some form of (1) since it more fully describes the behavior, but we could tweak further for concision. > > And when it's the > > first heap-only tuple the "old version" would be the original version, > > which would not be a heap-only tuple. > > The docs say "Old versions of updated rows can be completely removed > during normal operation". Opportunistic pruning removes dead heap-only > tuples completely, and makes their line pointers LP_UNUSED right away. > But it can also entail removing storage for the original root item > heap tuple, and making its line pointer LP_REDIRECT right away (not > LP_DEAD or LP_UNUSED) at most once in the life of each HOT chain. So > yeah, we're not quite limited to removing storage for heap-only tuples > when pruning a HOT chain. Does that distinction really matter, though? Given pageinspect can show you the original tuple still exists and that the index still references it...I think it does. I suppose very few people go checking that out, of course, but I'd like to be precise. Regards, James Coleman
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Peter Geoghegan
Date:
On Fri, Sep 29, 2023 at 6:27 PM James Coleman <jtc331@gmail.com> wrote: > On Fri, Sep 29, 2023 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote: > > I think that it's talking about what happens during opportunistic > > pruning, in particular what happens to HOT chains. (Though pruning > > does almost the same amount of useful work with non-heap-only tuples, > > so it's a bit unfortunate that the name "HOT pruning" seems to have > > stuck.) > > That's very likely what the intention was. I read it again, and the > same confusion still sticks out to me: it doesn't say anything > explicitly about opportunistic pruning (I'm not sure if that term is > "public docs" level, so that's probably fine), and it doesn't scope > the claim to intermediate tuples in a HOT chain -- indeed the context > is the HOT feature generally. It doesn't mention opportunistic pruning by name, but it does say: "Old versions of updated rows can be completely removed during normal operation, including SELECTs, instead of requiring periodic vacuum operations." There is a strong association between HOT and pruning (particularly opportunistic pruning) in the minds of some hackers (and perhaps some users), because both features appeared together in 8.3, and both are closely related at the implementation level. It's nevertheless not quite accurate to say that HOT "provides two optimizations" -- since pruning (the second of the two bullet points) isn't fundamentally different for pages that don't have any HOT chains. Not at the level of the heap pages, at least (indexes are another matter). Explaining these sorts of distinctions through prose is very difficult. You really need diagrams for something like this IMV. Without that, the only way to make all of this less confusing is to avoid all discussion of pruning...but then you can't really make the point about breaking the dependency on VACUUM, which is a relatively important point -- one with real practical relevance. > This is why I discovered it: it says "indexes do not reference their > page item identifiers", which is manifestly not true when talking > about the root item, and in fact would defeat the whole purpose of HOT > (at least in a old-to-new chain like Postgres uses). Yeah, but...that's not what was intended. Obviously, the index hasn't changed, and we expect index scans to continue to give correct answers. So it is pretty strongly implied that it continues to point to something valid. > Assuming people can be convinced this is confusing (I realize you may > not be yet), I see two basic options: > > 1. Update this to discuss both intermediate tuples and root items > separately. This could entail either one larger paragraph or splitting > such that instead of "two optimizations" we say "three" optimizations. > > 2. Change "old versions" to something like "intermediate versions in a > series of updates". > > I prefer some form of (1) since it more fully describes the behavior, > but we could tweak further for concision. Bruce authored these docs. I was mostly just glad to have anything at all about HOT in the user-facing docs, quite honestly. -- Peter Geoghegan
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Sat, Sep 30, 2023 at 1:05 AM Peter Geoghegan <pg@bowt.ie> wrote: > > This is why I discovered it: it says "indexes do not reference their > > page item identifiers", which is manifestly not true when talking > > about the root item, and in fact would defeat the whole purpose of HOT > > (at least in a old-to-new chain like Postgres uses). > > Yeah, but...that's not what was intended. Obviously, the index hasn't > changed, and we expect index scans to continue to give correct > answers. So it is pretty strongly implied that it continues to point > to something valid. I took a look at this. I agree with James that the current wording is just plain wrong. periodic vacuum operations. (This is possible because indexes do not reference their <link linkend="storage-page-layout">page item identifiers</link>.) Here, the antecedent of "their" is "old versions of updated rows". It is slightly unclear whether we should interpret this to mean (a) the tuples together with the line pointers which point to them or (b) only the tuple data to which the line pointers point. If (a), then it's wrong because we can't actually get rid of the root line pointer; rather, we have to change it to a redirect. If (b), then it's wrong because heap_page_prune() removes dead tuples in this sense whether HOT is involved or not. I can see no interpretation under which this statement is true as written. I reviewed James's proposed alternative: + periodic vacuum operations. However because indexes reference the old + version's <link linkend="storage-page-layout">page item identifiers</link> + the line pointer must remain in place. Such a line pointer has its + <literal>LP_REDIRECT</literal> bit set and its offset updated to the + <link linkend="storage-page-layout">page item identifiers</link> of + the updated row. I don't think that's really right either. That's true for the root line pointer, but the phrasing seems to be referring to old versions generally, which would seem to include not only the root, for which this is correct, and also all subsequent now-dead row versions, for which it is wrong. Here is my attempt: When a row is updated multiple times, row versions other than the oldest and the newest can be completely removed during normal operation, including <command>SELECT</command>s, instead of requiring periodic vacuum operations. (Indexes always refer to the <link linkend="storage-page-layout">page item identifiers</link> of the original row version. The tuple data associated with that row version is removed, and its item identifier is converted to a redirect that points to the oldest version that may still be visible to some concurrent transaction. Intermediate row versions that are no longer visible to anyone are completely removed, and the associated page item identifiers are made available for reuse.) -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Mon, Oct 2, 2023 at 2:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Sep 30, 2023 at 1:05 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > This is why I discovered it: it says "indexes do not reference their > > > page item identifiers", which is manifestly not true when talking > > > about the root item, and in fact would defeat the whole purpose of HOT > > > (at least in a old-to-new chain like Postgres uses). > > > > Yeah, but...that's not what was intended. Obviously, the index hasn't > > changed, and we expect index scans to continue to give correct > > answers. So it is pretty strongly implied that it continues to point > > to something valid. > > I took a look at this. I agree with James that the current wording is > just plain wrong. > > periodic vacuum operations. (This is possible because indexes > do not reference their <link linkend="storage-page-layout">page > item identifiers</link>.) > > Here, the antecedent of "their" is "old versions of updated rows". It > is slightly unclear whether we should interpret this to mean (a) the > tuples together with the line pointers which point to them or (b) only > the tuple data to which the line pointers point. If (a), then it's > wrong because we can't actually get rid of the root line pointer; > rather, we have to change it to a redirect. If (b), then it's wrong > because heap_page_prune() removes dead tuples in this sense whether > HOT is involved or not. I can see no interpretation under which this > statement is true as written. > > I reviewed James's proposed alternative: > > + periodic vacuum operations. However because indexes reference the old > + version's <link linkend="storage-page-layout">page item identifiers</link> > + the line pointer must remain in place. Such a line pointer has its > + <literal>LP_REDIRECT</literal> bit set and its offset updated to the > + <link linkend="storage-page-layout">page item identifiers</link> of > + the updated row. > > I don't think that's really right either. That's true for the root > line pointer, but the phrasing seems to be referring to old versions > generally, which would seem to include not only the root, for which > this is correct, and also all subsequent now-dead row versions, for > which it is wrong. > > Here is my attempt: > > When a row is updated multiple times, row versions other than the > oldest and the newest can be completely removed during normal > operation, including <command>SELECT</command>s, instead of requiring > periodic vacuum operations. (Indexes always refer to the <link > linkend="storage-page-layout">page item identifiers</link> of the > original row version. The tuple data associated with that row version > is removed, and its item identifier is converted to a redirect that > points to the oldest version that may still be visible to some > concurrent transaction. Intermediate row versions that are no longer > visible to anyone are completely removed, and the associated page item > identifiers are made available for reuse.) Hi Robert, Thanks for reviewing! I like your changes. Reading through this several times, and noting Peter's comments about pruning being more than just HOT, I'm thinking that rather than a simple fixup for this one paragraph what we actually want is to split out the concept of page pruning into its own section of the storage docs. Attached is a patch that does that, incorporating much of your language about LP_REDIRECT, along with LP_DEAD so that readers know this affects more than just heap-only tuple workloads. Regards, James
Attachment
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Tue, Oct 3, 2023 at 3:35 PM James Coleman <jtc331@gmail.com> wrote: > I like your changes. Reading through this several times, and noting > Peter's comments about pruning being more than just HOT, I'm thinking > that rather than a simple fixup for this one paragraph what we > actually want is to split out the concept of page pruning into its own > section of the storage docs. Attached is a patch that does that, > incorporating much of your language about LP_REDIRECT, along with > LP_DEAD so that readers know this affects more than just heap-only > tuple workloads. I considered this kind of approach, but I don't think it's better. I think the point of this documentation is to give people a general idea of how HOT works, not all the technical details. If we start getting into the nitty gritty, we're going to have to explain a lot more stuff, which is going to detract from the actual purpose of this documentation. For example, your patch talks about LP_REDIRECT and LP_DEAD, which are not referenced in any other part of the documentation. It also uses the term line pointer instead of page item identifier without explaining that those are basically the same thing. Obviously those kinds of things can be fixed, but in my opinion, your version doesn't really add much information that is likely to be useful to a reader of this section, while at the same time it does add some things that might be confusing. If we wanted to have a more technical discussion of all of this somewhere in the documentation, we could, but that would be quite a bit more work to write and review, and it would probably duplicate some of what we've already got in README.HOT. I suggest that we should be looking for a patch that tries to correct the wrong stuff in the present wording while adding the minimum possible amount of text. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Wed, Oct 4, 2023 at 9:18 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Oct 3, 2023 at 3:35 PM James Coleman <jtc331@gmail.com> wrote: > > I like your changes. Reading through this several times, and noting > > Peter's comments about pruning being more than just HOT, I'm thinking > > that rather than a simple fixup for this one paragraph what we > > actually want is to split out the concept of page pruning into its own > > section of the storage docs. Attached is a patch that does that, > > incorporating much of your language about LP_REDIRECT, along with > > LP_DEAD so that readers know this affects more than just heap-only > > tuple workloads. > > I considered this kind of approach, but I don't think it's better. I > think the point of this documentation is to give people a general idea > of how HOT works, not all the technical details. If we start getting > into the nitty gritty, we're going to have to explain a lot more > stuff, which is going to detract from the actual purpose of this > documentation. For example, your patch talks about LP_REDIRECT and > LP_DEAD, which are not referenced in any other part of the > documentation. It also uses the term line pointer instead of page item > identifier without explaining that those are basically the same thing. > Obviously those kinds of things can be fixed, but in my opinion, your > version doesn't really add much information that is likely to be > useful to a reader of this section, while at the same time it does add > some things that might be confusing. If we wanted to have a more > technical discussion of all of this somewhere in the documentation, we > could, but that would be quite a bit more work to write and review, > and it would probably duplicate some of what we've already got in > README.HOT. > > I suggest that we should be looking for a patch that tries to correct > the wrong stuff in the present wording while adding the minimum > possible amount of text. There's one primary thing I think this approach adds (I don't mean the patch I proposed is the only way to address this concern): the fact that pages get cleaned up outside of the HOT optimization. We could add a short paragraph about that on the HOT page, but that seems a bit confusing. Are you thinking we should simply elide the fact that there is pruning that happens outside of HOT? Or add that information onto the HOT page, even though it doesn't directly fit? Regards, James
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Wed, Oct 4, 2023 at 9:36 AM James Coleman <jtc331@gmail.com> wrote: > Are you thinking we should simply elide the fact that there is pruning > that happens outside of HOT? Or add that information onto the HOT > page, even though it doesn't directly fit? I think we should elide it. Maybe with a much larger rewrite there would be a good place to include that information, but with the current structure, the page is about why HOT is good, and talking about pruning that can happen apart from HOT doesn't advance that message. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Wed, Oct 4, 2023 at 9:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Oct 4, 2023 at 9:36 AM James Coleman <jtc331@gmail.com> wrote: > > Are you thinking we should simply elide the fact that there is pruning > > that happens outside of HOT? Or add that information onto the HOT > > page, even though it doesn't directly fit? > > I think we should elide it. Maybe with a much larger rewrite there > would be a good place to include that information, but with the > current structure, the page is about why HOT is good, and talking > about pruning that can happen apart from HOT doesn't advance that > message. All right, attached is a v3 which attempts to fix the wrong information with an economy of words. I may at some point submit a separate patch that adds a broader pruning section, but this at least brings the docs inline with reality insofar as they address it. James
Attachment
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Shubham Khanna
Date:
On Tue, Dec 5, 2023 at 10:51 AM James Coleman <jtc331@gmail.com> wrote: > > Hello, > > While working on my talk for PGConf.NYC next week I came across this > bullet in the docs on heap only tuples: > > > Old versions of updated rows can be completely removed during normal > > operation, including SELECTs, instead of requiring periodic vacuum > > operations. (This is possible because indexes do not reference their page > > item identifiers.) > > But when a HOT update happens the entry in an (logically unchanged) > index still points to the original heap tid, and that line item is > updated with a pointer to the new line pointer in the same page. > > Assuming I'm understanding this correctly, attached is a patch > correcting the description. > > I have Reviewed the patch. Patch applies neatly without any issues. Documentation build was successful and there was noSpell-check issue also. I did not find any issues. The patch looks >good to me. > >Thanks and Regards, >Shubham Khanna.
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Wed, Oct 4, 2023 at 9:12 PM James Coleman <jtc331@gmail.com> wrote: > All right, attached is a v3 which attempts to fix the wrong > information with an economy of words. I may at some point submit a > separate patch that adds a broader pruning section, but this at least > brings the docs inline with reality insofar as they address it. I don't think this is as good as what I proposed back on October 2nd. IMHO, that version does a good job making the text accurate and clear, and is directly responsive to your original complaint, namely, that the root of the HOT chain can't be removed. But this version seems to contain a number of incidental changes that are unrelated to that point, e.g. "old versions" -> "old, no longer visible versions", "can be completely removed" -> "may be pruned", and the removal of the sentence "In summary, heap-only tuple updates can only be created - if columns used by indexes are not updated" which AFAICT is both completely correct as-is and unrelated to the original complaint. Maybe I shouldn't be, but I'm slightly frustrated here. I thought I had proposed an alternative which you found acceptable, but then you proposed several more versions that did other things instead, and I never really understood why we couldn't just adopt the version that you seemed to think was OK. If there's a problem with that, say what it is. If there's not, let's do that and move on. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Thu, Mar 14, 2024 at 10:28 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Oct 4, 2023 at 9:12 PM James Coleman <jtc331@gmail.com> wrote: > > All right, attached is a v3 which attempts to fix the wrong > > information with an economy of words. I may at some point submit a > > separate patch that adds a broader pruning section, but this at least > > brings the docs inline with reality insofar as they address it. > > I don't think this is as good as what I proposed back on October 2nd. > IMHO, that version does a good job making the text accurate and clear, > and is directly responsive to your original complaint, namely, that > the root of the HOT chain can't be removed. But this version seems to > contain a number of incidental changes that are unrelated to that > point, e.g. "old versions" -> "old, no longer visible versions", "can > be completely removed" -> "may be pruned", and the removal of the > sentence "In summary, heap-only tuple updates can only be created - if > columns used by indexes are not updated" which AFAICT is both > completely correct as-is and unrelated to the original complaint. > > Maybe I shouldn't be, but I'm slightly frustrated here. I thought I > had proposed an alternative which you found acceptable, but then you > proposed several more versions that did other things instead, and I > never really understood why we couldn't just adopt the version that > you seemed to think was OK. If there's a problem with that, say what > it is. If there's not, let's do that and move on. I think there's simply a misunderstanding here. I read your proposal as "here's an idea to consider as you work on the patch" (as happens on many other threads), and so I attempted to incorporate your primary points of feedback into my next version of the patch. Obviously I have reasons for the other changes I made: for example, "no longer visible" improves the correctness, since being an old version isn't sufficient. I removed the "In summary" sentence because it simply doesn't follow from the prose before it. That sentence simply restates information already appearing earlier in almost as simple a form, so it's redundant. But more importantly it's just not actually a summary of the text before it, so removing it improves the documentation. I can explain my reasoning further if desired, but I fear it would simply frustrate you further, so I'll stop here. If the goal here is the most minimal patch possible, then please commit what you proposed. I am interested in improving the document further, but I don't know how to do that easily if the requirement is effectively "must only change one specific detail at a time". So, that leaves me feeling a bit frustrated also. Regards, James Coleman
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote: > Obviously I have reasons for the other changes I made: for example, > "no longer visible" improves the correctness, since being an old > version isn't sufficient. I removed the "In summary" sentence because > it simply doesn't follow from the prose before it. That sentence > simply restates information already appearing earlier in almost as > simple a form, so it's redundant. But more importantly it's just not > actually a summary of the text before it, so removing it improves the > documentation. > > I can explain my reasoning further if desired, but I fear it would > simply frustrate you further, so I'll stop here. > > If the goal here is the most minimal patch possible, then please > commit what you proposed. I am interested in improving the document > further, but I don't know how to do that easily if the requirement is > effectively "must only change one specific detail at a time". So, that > leaves me feeling a bit frustrated also. I don't think the goal is to have the most minimal patch possible, but at the same time, I'm responsible for what I commit. If I commit a patch that changes something and somebody, especially another committer, writes an email and says "hey, why the heck did you change this, you dummy?" then I need to be able to answer that question, and saying "uh, well, James Coleman had it in his patch and he didn't explain why it was there and I didn't know either but I just kind of committed it anyway" is not where I want to be. Almost without exception, it's not the patch author who gets asked why they included a certain thing in the patch; it's the committer who gets asked why it got committed that way -- and at least as I see it, if there's not a good answer to that question, it's the committer who gets judged, not the patch author. In some cases, such questions and judgements arrive without warning YEARS after the commit. So, to protect myself against questions from other hackers that end, at least implicitly, in "you dummy," I try to make sure there's an adequate paper trail for everything I commit. Between the commit message, the comments, and the email discussion, it needs to be crystal clear why something was thought to be a good idea at the time. Hopefully, it will be clear enough that I never even get a question, because the reader will be able to understand ON THEIR OWN why something was done and either go "oh, that make sense" or "well, I get why they did that, but I disagree because $REASON" ... but if that doesn't quite work out, then I hope that at the very least the paper trail will be good enough that I can reconstruct the reasoning if needed. For a recent example of a case where I clearly failed do a good enough job to keep someone from asking a "you dummy" question, see the http://postgr.es/m/6030bdec-0de1-4da2-b0b3-335007eae97f@iki.fi and in particular the paragraph that starts with "However". Therefore, if I realize when reading the patch that it contains odd-looking changes which I can't relate to the stated goal, it's getting bounced, pretty much 100% of the time. If it's a small patch and I really care about it and it's a new submitter who doesn't know any better, I might choose to remove those changes myself and commit the rest; and if I like those changes for other reasons, I might choose to commit them separately. In any other situation, I'm just going to tell the submitter that they need to take out the unrelated changes and move on to the next email thread. In this particular situation, I see that this whole discussion is slightly moot by virtue of 7a9328e8e40534fb4de41b4ac152e3c6989d84e7 (Jeff Davis, 2024-03-05) which removed the "In summary, heap-only tuple updates can only be created if columns used by indexes are not updated" sentence that you also removed. But there is an important difference between that patch and yours, at least IMHO. You argue that the sentence in question didn't flow from what precedes it, but I don't agree with that as a rationale; I think that sentence flows perfectly well from what precedes it and is a generally adequate summary of the point of the section. That we disagree on the merits of that sentence is fine; there's no hope that we're all going to agree on everything. What makes me frustrated is that you didn't provide that rationale, which greatly complicates the whole discussion, because now I have to spend time getting you to either take it out or provide your justification before we can even reach the point of arguing about the substance. At least in my judgment, the patch Jeff committed doesn't have that problem, because the rationale that he gives in the commit message (partly by reference to another commit) is that heap-only tuples now can, in circumstances, be created even if columns used by indexes ARE updated. Based on that justification, it seems really clear to me that it was right to do SOMETHING to the sentence in question. Whether removing the sentence was better than some other alternative is debatable, and I might well have done something different, but between reading the commit message and reading the diff, it's 100% understandable to me what his reasoning was for every single byte that is in that diff. So if I love what he committed, cool! And if I hate what he committed, I'm well-placed to argue, because I know what I'm arguing against. Also, if I do feel like arguing, the fact that I know his reasoning will also make it a heck of a lot easier to come up with a proposal that meets my goals without undermining his. I can see, for example, that I can't simply propose to put that sentence back the way that it was; based on his reason for removing it, he obviously isn't going to like that. But if, say, my goal were to have some kind of a summary sentence there because I just thought that was really helpful, I could write a patch to insert a new summary sentence there and then say "hey, Jeff, you took this summary sentence out because it's not accurate, but here's a new one which I think is accurate and I'd like to add it back." Maybe he'd like that, and maybe he wouldn't, but it would have a shot, because it would take his reasoning into account, which I can do, because I know what it was. I want people who read the patches that I commit to have the same opportunities. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
Robert Haas
Date:
On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote: > If the goal here is the most minimal patch possible, then please > commit what you proposed. I am interested in improving the document > further, but I don't know how to do that easily if the requirement is > effectively "must only change one specific detail at a time". So, yesterday I wrote a long email on how I saw the goals here. Despite our disagreements, I believe we agree that the text I proposed is better than what's there, so I've committed that change now. I've also marked the CF entry as committed. Please propose the other changes you want separately. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Wed, Mar 20, 2024 at 2:15 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote: > > Obviously I have reasons for the other changes I made: for example, > > "no longer visible" improves the correctness, since being an old > > version isn't sufficient. I removed the "In summary" sentence because > > it simply doesn't follow from the prose before it. That sentence > > simply restates information already appearing earlier in almost as > > simple a form, so it's redundant. But more importantly it's just not > > actually a summary of the text before it, so removing it improves the > > documentation. > > > > I can explain my reasoning further if desired, but I fear it would > > simply frustrate you further, so I'll stop here. > > > > If the goal here is the most minimal patch possible, then please > > commit what you proposed. I am interested in improving the document > > further, but I don't know how to do that easily if the requirement is > > effectively "must only change one specific detail at a time". So, that > > leaves me feeling a bit frustrated also. > > I don't think the goal is to have the most minimal patch possible, but > at the same time, I'm responsible for what I commit. If I commit a > patch that changes something and somebody, especially another > committer, writes an email and says "hey, why the heck did you change > this, you dummy?" then I need to be able to answer that question, and > saying "uh, well, James Coleman had it in his patch and he didn't > explain why it was there and I didn't know either but I just kind of > committed it anyway" is not where I want to be. Almost without > exception, it's not the patch author who gets asked why they included > a certain thing in the patch; it's the committer who gets asked why it > got committed that way -- and at least as I see it, if there's not a > good answer to that question, it's the committer who gets judged, not > the patch author. In some cases, such questions and judgements arrive > without warning YEARS after the commit. > > So, to protect myself against questions from other hackers that end, > at least implicitly, in "you dummy," I try to make sure there's an > adequate paper trail for everything I commit. Between the commit > message, the comments, and the email discussion, it needs to be > crystal clear why something was thought to be a good idea at the time. > Hopefully, it will be clear enough that I never even get a question, > because the reader will be able to understand ON THEIR OWN why > something was done and either go "oh, that make sense" or "well, I get > why they did that, but I disagree because $REASON" ... but if that > doesn't quite work out, then I hope that at the very least the paper > trail will be good enough that I can reconstruct the reasoning if > needed. For a recent example of a case where I clearly failed do a > good enough job to keep someone from asking a "you dummy" question, > see the http://postgr.es/m/6030bdec-0de1-4da2-b0b3-335007eae97f@iki.fi > and in particular the paragraph that starts with "However". > > Therefore, if I realize when reading the patch that it contains > odd-looking changes which I can't relate to the stated goal, it's > getting bounced, pretty much 100% of the time. If it's a small patch > and I really care about it and it's a new submitter who doesn't know > any better, I might choose to remove those changes myself and commit > the rest; and if I like those changes for other reasons, I might > choose to commit them separately. In any other situation, I'm just > going to tell the submitter that they need to take out the unrelated > changes and move on to the next email thread. > > In this particular situation, I see that this whole discussion is > slightly moot by virtue of 7a9328e8e40534fb4de41b4ac152e3c6989d84e7 > (Jeff Davis, 2024-03-05) which removed the "In summary, heap-only > tuple updates can only be created if columns used by indexes are not > updated" sentence that you also removed. But there is an important > difference between that patch and yours, at least IMHO. You argue that > the sentence in question didn't flow from what precedes it, but I > don't agree with that as a rationale; I think that sentence flows > perfectly well from what precedes it and is a generally adequate > summary of the point of the section. That we disagree on the merits of > that sentence is fine; there's no hope that we're all going to agree > on everything. What makes me frustrated is that you didn't provide > that rationale, which greatly complicates the whole discussion, > because now I have to spend time getting you to either take it out or > provide your justification before we can even reach the point of > arguing about the substance. > > At least in my judgment, the patch Jeff committed doesn't have that > problem, because the rationale that he gives in the commit message > (partly by reference to another commit) is that heap-only tuples now > can, in circumstances, be created even if columns used by indexes ARE > updated. Based on that justification, it seems really clear to me that > it was right to do SOMETHING to the sentence in question. Whether > removing the sentence was better than some other alternative is > debatable, and I might well have done something different, but between > reading the commit message and reading the diff, it's 100% > understandable to me what his reasoning was for every single byte that > is in that diff. So if I love what he committed, cool! And if I hate > what he committed, I'm well-placed to argue, because I know what I'm > arguing against. > > Also, if I do feel like arguing, the fact that I know his reasoning > will also make it a heck of a lot easier to come up with a proposal > that meets my goals without undermining his. I can see, for example, > that I can't simply propose to put that sentence back the way that it > was; based on his reason for removing it, he obviously isn't going to > like that. But if, say, my goal were to have some kind of a summary > sentence there because I just thought that was really helpful, I could > write a patch to insert a new summary sentence there and then say > "hey, Jeff, you took this summary sentence out because it's not > accurate, but here's a new one which I think is accurate and I'd like > to add it back." Maybe he'd like that, and maybe he wouldn't, but it > would have a shot, because it would take his reasoning into account, > which I can do, because I know what it was. I want people who read the > patches that I commit to have the same opportunities. I'm happy to provide the rationale; my apologies for not providing it originally. I didn't realize that was the root of the frustration: I'd read your feedback ("[my proposed change[ is directly responsive to your original complaint...[your] version seems to contain a number of incidental changes that are unrelated to that") as being aimed at something like "a patch can only address the original specific complaint". I'll try to be more specific in my reasoning when e.g. I think a piece of documentation reads better with a few additional changes. If I forget to do that I'd also appreciate your asking up front for the rationale so that I have a chance to clarify that without misunderstanding. Regards, James Coleman
Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers
From
James Coleman
Date:
On Thu, Mar 21, 2024 at 1:09 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 14, 2024 at 9:07 PM James Coleman <jtc331@gmail.com> wrote: > > If the goal here is the most minimal patch possible, then please > > commit what you proposed. I am interested in improving the document > > further, but I don't know how to do that easily if the requirement is > > effectively "must only change one specific detail at a time". > > So, yesterday I wrote a long email on how I saw the goals here. > Despite our disagreements, I believe we agree that the text I proposed > is better than what's there, so I've committed that change now. I've > also marked the CF entry as committed. Please propose the other > changes you want separately. Thanks for committing the fix. Regards, James Coleman