Thread: Eliminate redundant tuple visibility check in vacuum
While working on a set of patches to combine the freeze and visibility map WAL records into the prune record, I wrote the attached patches reusing the tuple visibility information collected in heap_page_prune() back in lazy_scan_prune(). heap_page_prune() collects the HTSV_Result for every tuple on a page and saves it in an array used by heap_prune_chain(). If we make that array available to lazy_scan_prune(), it can use it when collecting stats for vacuum and determining whether or not to freeze tuples. This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in the page. It also gets rid of the retry loop in lazy_scan_prune(). - Melanie
Attachment
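A rough sketch of the idea, with illustrative names only (PruneVisibilityCache, cache_visibility and cached_visibility below are not the attached patches' actual declarations): pruning computes one HTSV_Result per line pointer and caches it, and vacuum later reads that cache instead of calling HeapTupleSatisfiesVacuum() a second time for the same tuple.

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"

/* Illustrative only: a per-page cache of visibility verdicts. */
typedef struct PruneVisibilityCache
{
	/*
	 * One cached HTSV_Result per line pointer, indexed by offset number.
	 * -1 means "never checked" (unused or redirected line pointers).
	 */
	int8		htsv[MaxHeapTuplesPerPage + 1];
} PruneVisibilityCache;

/* During pruning, each tuple's verdict is computed exactly once... */
static inline void
cache_visibility(PruneVisibilityCache *cache, OffsetNumber offnum,
				 HTSV_Result res)
{
	cache->htsv[offnum] = (int8) res;
}

/*
 * ...and lazy_scan_prune() later consults the cache instead of calling
 * HeapTupleSatisfiesVacuum() again for the same tuple.
 */
static inline HTSV_Result
cached_visibility(const PruneVisibilityCache *cache, OffsetNumber offnum)
{
	Assert(cache->htsv[offnum] != -1);
	return (HTSV_Result) cache->htsv[offnum];
}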
Hi Melanie, On 8/29/23 01:49, Melanie Plageman wrote: > While working on a set of patches to combine the freeze and visibility > map WAL records into the prune record, I wrote the attached patches > reusing the tuple visibility information collected in heap_page_prune() > back in lazy_scan_prune(). > > heap_page_prune() collects the HTSV_Result for every tuple on a page > and saves it in an array used by heap_prune_chain(). If we make that > array available to lazy_scan_prune(), it can use it when collecting > stats for vacuum and determining whether or not to freeze tuples. > This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in > the page. > > It also gets rid of the retry loop in lazy_scan_prune(). How did you test this change? Could you measure any performance difference? If so could you provide your test case? -- David Geier (ServiceNow)
Hi David, Thanks for taking a look! On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote: > > Hi Melanie, > > On 8/29/23 01:49, Melanie Plageman wrote: > > While working on a set of patches to combine the freeze and visibility > > map WAL records into the prune record, I wrote the attached patches > > reusing the tuple visibility information collected in heap_page_prune() > > back in lazy_scan_prune(). > > > > heap_page_prune() collects the HTSV_Result for every tuple on a page > > and saves it in an array used by heap_prune_chain(). If we make that > > array available to lazy_scan_prune(), it can use it when collecting > > stats for vacuum and determining whether or not to freeze tuples. > > This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in > > the page. > > > > It also gets rid of the retry loop in lazy_scan_prune(). > > How did you test this change? I didn't add a new test, but you can confirm some existing test coverage if you, for example, set every HTSV_Result in the array to HEAPTUPLE_LIVE and run the regression tests. Tests relying on vacuum removing the right tuples may fail. For example, select count(*) from gin_test_tbl where i @> array[1, 999]; in src/test/regress/sql/gin.sql fails for me since it sees a tuple it shouldn't. > Could you measure any performance difference? > > If so could you provide your test case? I created a large table and then updated a tuple on every page in the relation and vacuumed it. I saw a consistent slight improvement in vacuum execution time. I profiled a bit with perf stat as well. The difference is relatively small for this kind of example, but I can work on a more compelling, realistic example. I think eliminating the retry loop is also useful, as I have heard that users have had to cancel vacuums which were in this retry loop for too long. - Melanie
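To spell out that suggested sanity check, a throwaway hack along these lines is enough (the variable names are hypothetical and depend on which version of the patch the array lives in):

/*
 * Fault injection, never for commit: after pruning has filled in the
 * per-offset visibility array, clobber every entry to HEAPTUPLE_LIVE.
 * Regression tests that rely on vacuum reaping dead tuples (such as the
 * gin test mentioned above) should then start failing, confirming that
 * lazy_scan_prune() really consumes the cached results rather than
 * recomputing them.
 */
for (offnum = FirstOffsetNumber; offnum <= maxoff;
	 offnum = OffsetNumberNext(offnum))
	htsv[offnum] = (int8) HEAPTUPLE_LIVE;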
> On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote:
> > Could you measure any performance difference?
> >
> > If so could you provide your test case?
>
> I created a large table and then updated a tuple on every page in the
> relation and vacuumed it. I saw a consistent slight improvement in
> vacuum execution time. I profiled a bit with perf stat as well. The
> difference is relatively small for this kind of example, but I can
> work on a more compelling, realistic example. I think eliminating the
> retry loop is also useful, as I have heard that users have had to
> cancel vacuums which were in this retry loop for too long.

Just to provide a specific test case, if you create a small table like this

create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 10000000);

And then vacuum it. I find that with my patch applied I see a consistent
~9% speedup (averaged across multiple runs).

master: ~533ms
patch:  ~487ms

And in the profile, with my patch applied, you notice less time spent in
HeapTupleSatisfiesVacuumHorizon()

master:
  11.83%  postgres  postgres  [.] heap_page_prune
  11.59%  postgres  postgres  [.] heap_prepare_freeze_tuple
   8.77%  postgres  postgres  [.] lazy_scan_prune
   8.01%  postgres  postgres  [.] HeapTupleSatisfiesVacuumHorizon
   7.79%  postgres  postgres  [.] heap_tuple_should_freeze

patch:
  13.41%  postgres  postgres  [.] heap_prepare_freeze_tuple
   9.88%  postgres  postgres  [.] heap_page_prune
   8.61%  postgres  postgres  [.] lazy_scan_prune
   7.00%  postgres  postgres  [.] heap_tuple_should_freeze
   6.43%  postgres  postgres  [.] HeapTupleSatisfiesVacuumHorizon

- Melanie
Hi Melanie, On 8/31/23 02:59, Melanie Plageman wrote: >> I created a large table and then updated a tuple on every page in the >> relation and vacuumed it. I saw a consistent slight improvement in >> vacuum execution time. I profiled a bit with perf stat as well. The >> difference is relatively small for this kind of example, but I can >> work on a more compelling, realistic example. I think eliminating the >> retry loop is also useful, as I have heard that users have had to >> cancel vacuums which were in this retry loop for too long. > Just to provide a specific test case, if you create a small table like this > > create table foo (a int, b int, c int) with(autovacuum_enabled=false); > insert into foo select i, i, i from generate_series(1, 10000000); > > And then vacuum it. I find that with my patch applied I see a > consistent ~9% speedup (averaged across multiple runs). > > master: ~533ms > patch: ~487ms > > And in the profile, with my patch applied, you notice less time spent > in HeapTupleSatisfiesVacuumHorizon() > > master: > 11.83% postgres postgres [.] heap_page_prune > 11.59% postgres postgres [.] heap_prepare_freeze_tuple > 8.77% postgres postgres [.] lazy_scan_prune > 8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon > 7.79% postgres postgres [.] heap_tuple_should_freeze > > patch: > 13.41% postgres postgres [.] heap_prepare_freeze_tuple > 9.88% postgres postgres [.] heap_page_prune > 8.61% postgres postgres [.] lazy_scan_prune > 7.00% postgres postgres [.] heap_tuple_should_freeze > 6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon Thanks a lot for providing additional information and the test case. I tried it on a release build and I also see a 10% speed-up. I reset the visibility map between VACUUM runs, see: CREATE EXTENSION pg_visibility; CREATE TABLE foo (a INT, b INT, c INT) WITH(autovacuum_enabled=FALSE); INSERT INTO foo SELECT i, i, i from generate_series(1, 10000000) i; VACUUM foo; SELECT pg_truncate_visibility_map('foo'); VACUUM foo; SELECT pg_truncate_visibility_map('foo'); VACUUM foo; ... The first patch, which refactors the code so we can pass the result of the visibility checks to the caller, looks good to me. Regarding the 2nd patch (disclaimer: I'm not too familiar with that area of the code): I don't completely understand why the retry loop is not needed anymore and how you now detect/handle the possible race condition? It can still happen that an aborting transaction changes the state of a row after heap_page_prune() looked at that row. Would that case now not be ignored? -- David Geier (ServiceNow)
On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > While working on a set of patches to combine the freeze and visibility > map WAL records into the prune record, I wrote the attached patches > reusing the tuple visibility information collected in heap_page_prune() > back in lazy_scan_prune(). > > heap_page_prune() collects the HTSV_Result for every tuple on a page > and saves it in an array used by heap_prune_chain(). If we make that > array available to lazy_scan_prune(), it can use it when collecting > stats for vacuum and determining whether or not to freeze tuples. > This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in > the page. > > It also gets rid of the retry loop in lazy_scan_prune(). In general, I like these patches. I think it's a clever approach, and I don't really see any down side. It should just be straight-up better than what we have now, and if it's not better, it still shouldn't be any worse. I have a few suggestions: - Rather than removing the rather large comment block at the top of lazy_scan_prune() I'd like to see it rewritten to explain how we now deal with the problem. I'd suggest leaving the first paragraph ("Prior to...") just as it is and replace all the words following "The approach we take now is" with a description of the approach that this patch takes to the problem. - I'm not a huge fan of the caller of heap_page_prune() having to know how to initialize the PruneResult. Can heap_page_prune() itself do that work, so that presult is an out parameter rather than an in-out parameter? Or, if not, can it be moved to a new helper function, like heap_page_prune_init(), rather than having that logic in 2+ places? - int ndeleted, - nnewlpdead; - - ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, - limited_ts, &nnewlpdead, NULL); + int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, + limited_ts, &presult, NULL); - I don't particularly like merging the declaration with the assignment unless the call is narrow enough that we don't need a line break in there, which is not the case here. - I haven't thoroughly investigated yet, but I wonder if there are any other places where comments need updating. As a general note, I find it desirable for a function's header comment to mention whether any pointer parameters are in parameters, out parameters, or in-out parameters, and what the contract between caller and callee actually is. -- Robert Haas EDB: http://www.enterprisedb.com
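A sketch of the out-parameter variant of that second suggestion (the signature and field list are simplified here; the real function takes additional arguments):

void
heap_page_prune(Relation relation, Buffer buffer, GlobalVisState *vistest,
				PruneResult *presult, OffsetNumber *off_loc)
{
	/*
	 * presult is purely an output parameter: heap_page_prune() resets
	 * every field itself, so no caller has to know how to initialize it.
	 */
	presult->ndeleted = 0;
	presult->nnewlpdead = 0;

	/* ... prune as before, accumulating counts into presult ... */
}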
On Thu, Aug 31, 2023 at 2:03 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I have a few suggestions: > > - Rather than removing the rather large comment block at the top of > lazy_scan_prune() I'd like to see it rewritten to explain how we now > deal with the problem. I'd suggest leaving the first paragraph ("Prior > to...") just as it is and replace all the words following "The > approach we take now is" with a description of the approach that this > patch takes to the problem. Good idea. I've updated the comment. I also explain why this new approach works in the commit message and reference the commit which added the previous approach. > - I'm not a huge fan of the caller of heap_page_prune() having to know > how to initialize the PruneResult. Can heap_page_prune() itself do > that work, so that presult is an out parameter rather than an in-out > parameter? Or, if not, can it be moved to a new helper function, like > heap_page_prune_init(), rather than having that logic in 2+ places? Ah, yes. Now that it has two callers, and since it is exclusively an output parameter, it is quite ugly to initialize it in both callers. Fixed in the attached. > - int ndeleted, > - nnewlpdead; > - > - ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, > - limited_ts, &nnewlpdead, NULL); > + int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin, > + limited_ts, &presult, NULL); > > - I don't particularly like merging the declaration with the > assignment unless the call is narrow enough that we don't need a line > break in there, which is not the case here. I have changed this. > - I haven't thoroughly investigated yet, but I wonder if there are any > other places where comments need updating. As a general note, I find > it desirable for a function's header comment to mention whether any > pointer parameters are in parameters, out parameters, or in-out > parameters, and what the contract between caller and callee actually > is. I've investigated vacuumlazy.c and pruneheap.c and looked at the commit that added the retry loop (8523492d4e349) to see everywhere it added comments and don't see anywhere else that needs updating. I have updated lazy_scan_prune()'s function header comment to describe the nature of the in-out and output parameters and the contract. - Melanie
Attachment
On Thu, Aug 31, 2023 at 5:39 AM David Geier <geidav.pg@gmail.com> wrote: > Regarding the 2nd patch (disclaimer: I'm not too familiar with that area > of the code): I don't completely understand why the retry loop is not > needed anymore and how you now detect/handle the possible race > condition? It can still happen that an aborting transaction changes the > state of a row after heap_page_prune() looked at that row. Would that > case now not be ignored? Thanks for asking. I've updated the comment in the code and the commit message about this, as it seems important to be clear. Any inserting transaction which aborts after heap_page_prune()'s visibility check will now be of no concern to lazy_scan_prune(). Since we don't do the visibility check again, we won't find the tuple HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to the array of tuples for vacuum to reap. This does mean that we won't reap and remove tuples whose inserting transactions abort right after heap_page_prune()'s visibility check. But, this doesn't seem like an issue. They may not be removed until the next vacuum, but ISTM it is actually worse to pay the cost of re-pruning the whole page just to clean up that one tuple. Maybe if that renders the page all visible and we can mark it as such in the visibility map -- but that seems like a relatively narrow use case. - Melanie
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > Any inserting transaction which aborts after heap_page_prune()'s > visibility check will now be of no concern to lazy_scan_prune(). Since > we don't do the visibility check again, we won't find the tuple > HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to > the array of tuples for vacuum to reap. This does mean that we won't > reap and remove tuples whose inserting transactions abort right after > heap_page_prune()'s visibility check. But, this doesn't seem like an > issue. It's definitely not an issue. The loop is essentially a hacky way of getting a consistent picture of which tuples should be treated as HEAPTUPLE_DEAD, and which tuples need to be left behind (consistent at the level of each page and each HOT chain, at least). Making that explicit seems strictly better. > They may not be removed until the next vacuum, but ISTM it is > actually worse to pay the cost of re-pruning the whole page just to > clean up that one tuple. Maybe if that renders the page all visible > and we can mark it as such in the visibility map -- but that seems > like a relatively narrow use case. The chances of actually hitting the retry are microscopic anyway. It has nothing to do with making sure that dead tuples from aborted transactions get removed for its own sake, or anything. Rather, the retry is all about making sure that all TIDs that get removed from indexes can only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD tuples with storage would very occasionally be left behind, which made life difficult in a bunch of other places -- for no good reason. -- Peter Geoghegan
Hi, On 9/1/23 03:25, Peter Geoghegan wrote: > On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: >> Any inserting transaction which aborts after heap_page_prune()'s >> visibility check will now be of no concern to lazy_scan_prune(). Since >> we don't do the visibility check again, we won't find the tuple >> HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to >> the array of tuples for vacuum to reap. This does mean that we won't >> reap and remove tuples whose inserting transactions abort right after >> heap_page_prune()'s visibility check. But, this doesn't seem like an >> issue. > It's definitely not an issue. > > The loop is essentially a hacky way of getting a consistent picture of > which tuples should be treated as HEAPTUPLE_DEAD, and which tuples > need to be left behind (consistent at the level of each page and each > HOT chain, at least). Making that explicit seems strictly better. > >> They may not be removed until the next vacuum, but ISTM it is >> actually worse to pay the cost of re-pruning the whole page just to >> clean up that one tuple. Maybe if that renders the page all visible >> and we can mark it as such in the visibility map -- but that seems >> like a relatively narrow use case. > The chances of actually hitting the retry are microscopic anyway. It > has nothing to do with making sure that dead tuples from aborted > transactions get removed for its own sake, or anything. Rather, the retry is > all about making sure that all TIDs that get removed from indexes can > only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD > tuples with storage would very occasionally be left behind, which made > life difficult in a bunch of other places -- for no good reason. > That makes sense and seems like a much better compromise. Thanks for the explanations. Please update the comment to document the corner case and how we handle it. -- David Geier (ServiceNow)
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > I have changed this. I spent a bunch of time today looking at this, thinking maybe I could commit it. But then I got cold feet. With these patches applied, PruneResult ends up being declared in heapam.h, with a comment that says /* State returned from pruning */. But that comment isn't accurate. The two new members that get added to the structure by 0002, namely nnewlpdead and htsv, are in fact state that is returned from pruning. But the other 5 members aren't. They're just initialized to constant values by pruning and then filled in for real by the vacuum logic. That's extremely weird. It would be fine if heap_page_prune() just grew a new output argument that only returned the HTSV results, or perhaps it could make sense to bundle any existing out parameters together into a struct and then add new things to that struct instead of adding even more parameters to the function itself. But there doesn't seem to be any good reason to muddle together the new output parameters for heap_page_prune() with a bunch of state that is currently internal to vacuumlazy.c. I realize that the shape of the patches probably stems from the fact that they started out life as part of a bigger patch set. But to be committed independently, they need to be shaped in a way that makes sense independently, and I don't think this qualifies. On the plus side, it seems to me that it's probably not that much work to fix this issue and that the result would likely be a smaller patch than what you have now, which is something. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Sep 6, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > I have changed this. > > I spent a bunch of time today looking at this, thinking maybe I could > commit it. But then I got cold feet. > > With these patches applied, PruneResult ends up being declared in > heapam.h, with a comment that says /* State returned from pruning */. > But that comment isn't accurate. The two new members that get added to > the structure by 0002, namely nnewlpdead and htsv, are in fact state > that is returned from pruning. But the other 5 members aren't. They're > just initialized to constant values by pruning and then filled in for > real by the vacuum logic. That's extremely weird. It would be fine if > heap_page_prune() just grew a new output argument that only returned > the HTSV results, or perhaps it could make sense to bundle any > existing out parameters together into a struct and then add new things > to that struct instead of adding even more parameters to the function > itself. But there doesn't seem to be any good reason to muddle > together the new output parameters for heap_page_prune() with a bunch > of state that is currently internal to vacuumlazy.c. > > I realize that the shape of the patches probably stems from the fact > that they started out life as part of a bigger patch set. But to be > committed independently, they need to be shaped in a way that makes > sense independently, and I don't think this qualifies. On the plus > side, it seems to me that it's probably not that much work to fix this > issue and that the result would likely be a smaller patch than what > you have now, which is something. Yeah, I think this is a fair concern. I have addressed it in the attached patches. I thought a lot about whether or not adding a PruneResult which contains only the output parameters and result of heap_page_prune() is annoying since we have so many other *Prune* data structures. I decided it's not annoying. In some cases, four outputs don't merit a new structure. In this case, I think it declutters the code a bit -- independent of any other patches I may be writing :) - Melanie
Attachment
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > Yeah, I think this is a fair concern. I have addressed it in the > attached patches. > > I thought a lot about whether or not adding a PruneResult which > contains only the output parameters and result of heap_page_prune() is > annoying since we have so many other *Prune* data structures. I > decided it's not annoying. In some cases, four outputs don't merit a > new structure. In this case, I think it declutters the code a bit -- > independent of any other patches I may be writing :) I took a look at 0001 and I think that it's incorrect. In the existing code, *off_loc gets updated before each call to heap_prune_satisfies_vacuum(). This means that if an error occurs in heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain the relevant offset number. In your version, the relevant offset number will only be stored in some local structure to which the caller doesn't yet have access. The difference is meaningful. lazy_scan_prune passes off_loc as vacrel->offnum, which means that if an error happens, vacrel->offnum will have the right value, and so when the error context callback installed by heap_vacuum_rel() fires, namely vacuum_error_callback(), it can look at vacrel->offnum and know where the error happened. With your patch, I think that would no longer work. I haven't run the regression suite with 0001 applied. I'm guessing that you did, and that they passed, which if true means that we don't have a test for this, which is a shame, although writing such a test might be a bit tricky. If there is a test case for this and you didn't run it, woops. This is also why I think it's *extremely* important for the header comment of a function that takes pointer parameters to document the semantics of those pointers. Normally they are in parameters or out parameters or in-out parameters, but here it's something even more complicated. The existing header comment says "off_loc is the offset location required by the caller to use in error callback," which I didn't really understand until I actually looked at what the code is doing, so I consider that somebody could've done a better job writing this comment, but in theory you could've also noticed that, at least AFAICS, there's no way for the function to return with *off_loc set to anything other than InvalidOffsetNumber. That means that the statements which set *off_loc to other values must have some other purpose. -- Robert Haas EDB: http://www.enterprisedb.com
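The contract being described, paraphrased (not verbatim code):

/* In heap_page_prune(), inside the loop over the page's line pointers: */
for (offnum = FirstOffsetNumber; offnum <= maxoff;
	 offnum = OffsetNumberNext(offnum))
{
	/* Must be set BEFORE the visibility check, which can ERROR out. */
	if (off_loc)
		*off_loc = offnum;

	/* ... heap_prune_satisfies_vacuum() / heap_prune_chain() ... */
}

/* Cleared again once the whole page has been processed: */
if (off_loc)
	*off_loc = InvalidOffsetNumber;

/*
 * lazy_scan_prune() passes &vacrel->offnum as off_loc, so if an ERROR is
 * raised partway through the page, the callback installed by
 * heap_vacuum_rel() -- vacuum_error_callback() -- still sees a current
 * offset and can report something like "while scanning block %u offset %u
 * of relation ...".
 */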
On Thu, Sep 7, 2023 at 1:37 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > Yeah, I think this is a fair concern. I have addressed it in the > > attached patches. > > > > I thought a lot about whether or not adding a PruneResult which > > contains only the output parameters and result of heap_page_prune() is > > annoying since we have so many other *Prune* data structures. I > > decided it's not annoying. In some cases, four outputs don't merit a > > new structure. In this case, I think it declutters the code a bit -- > > independent of any other patches I may be writing :) > > I took a look at 0001 and I think that it's incorrect. In the existing > code, *off_loc gets updated before each call to > heap_prune_satisfies_vacuum(). This means that if an error occurs in > heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain > the relevant offset number. In your version, the relevant offset > number will only be stored in some local structure to which the caller > doesn't yet have access. The difference is meaningful. lazy_scan_prune > passes off_loc as vacrel->offnum, which means that if an error > happens, vacrel->offnum will have the right value, and so when the > error context callback installed by heap_vacuum_rel() fires, namely > vacuum_error_callback(), it can look at vacrel->offnum and know where > the error happened. With your patch, I think that would no longer > work. You are right. That is a major problem. Thank you for finding that. I was able to confirm the breakage by patching in an error to heap_page_prune() after we have set off_loc and confirming that the error context has the offset in master and is missing it with my patch applied. I can fix it by changing the type of PruneResult->off_loc to be an OffsetNumber pointer. This does mean that PruneResult will be initialized partially by heap_page_prune() callers. I wonder if you think that undermines the case for making a new struct. I still want to eliminate the NULL check of off_loc in heap_page_prune() by making it a required parameter. Even though on-access pruning does not have an error callback mechanism which uses the offset, it seems better to have a pointless local variable in heap_page_prune_opt() than to do a NULL check for every tuple pruned. > I haven't run the regression suite with 0001 applied. I'm guessing > that you did, and that they passed, which if true means that we don't > have a test for this, which is a shame, although writing such a test > might be a bit tricky. If there is a test case for this and you didn't > run it, woops. There is no test coverage for the vacuum error callback context currently (tests passed for me). I looked into how we might add such a test. First, I investigated what kind of errors might occur during heap_prune_satisfies_vacuum(). Some of the multixact functions called by HeapTupleSatisfiesVacuumHorizon() could error out -- for example GetMultiXactIdMembers(). It seems difficult to trigger the errors in GetMultiXactIdMembers(), as we would have to cause wraparound. It would be even more difficult to ensure that we hit those specific errors from a call stack containing heap_prune_satisfies_vacuum(). As such, I'm not sure I can think of a way to protect future developers from repeating my mistake--apart from improving the comment like you mentioned. - Melanie
On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > I can fix it by changing the type of PruneResult->off_loc to be an > OffsetNumber pointer. This does mean that PruneResult will be > initialized partially by heap_page_prune() callers. I wonder if you > think that undermines the case for making a new struct. I think that it undermines the case for including that particular argument in the struct. It's not really a Prune*Result* if the caller initializes it in part. It seems fairly reasonable to still have a PruneResult struct for the other stuff, though, at least to me. How do you see it? (One could also argue that this is a somewhat more byzantine way of doing error reporting than would be desirable, but fixing that problem doesn't seem too straightforward so perhaps it's prudent to leave it well enough alone.) > I still want to eliminate the NULL check of off_loc in > heap_page_prune() by making it a required parameter. Even though > on-access pruning does not have an error callback mechanism which uses > the offset, it seems better to have a pointless local variable in > heap_page_prune_opt() than to do a NULL check for every tuple pruned. It doesn't seem important to me unless it improves performance. If it's just stylistic, I don't object, but I also don't really see a reason to care. > > I haven't run the regression suite with 0001 applied. I'm guessing > > that you did, and that they passed, which if true means that we don't > > have a test for this, which is a shame, although writing such a test > > might be a bit tricky. If there is a test case for this and you didn't > > run it, woops. > > There is no test coverage for the vacuum error callback context > currently (tests passed for me). I looked into how we might add such a > test. First, I investigated what kind of errors might occur during > heap_prune_satisfies_vacuum(). Some of the multixact functions called > by HeapTupleSatisfiesVacuumHorizon() could error out -- for example > GetMultiXactIdMembers(). It seems difficult to trigger the errors in > GetMultiXactIdMembers(), as we would have to cause wraparound. It > would be even more difficult to ensure that we hit those specific > errors from a call stack containing heap_prune_satisfies_vacuum(). As > such, I'm not sure I can think of a way to protect future developers > from repeating my mistake--apart from improving the comment like you > mentioned. 004_verify_heapam.pl has some tests that intentionally corrupt pages and then use pg_amcheck to detect the corruption. Such an approach could probably also be used here. But it's a pain to get such tests right, because any change to the page format due to endianness, different block size, or whatever can make carefully-written tests go boom. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 7, 2023 at 3:30 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > I can fix it by changing the type of PruneResult->off_loc to be an > > OffsetNumber pointer. This does mean that PruneResult will be > > initialized partially by heap_page_prune() callers. I wonder if you > > think that undermines the case for making a new struct. > > I think that it undermines the case for including that particular > argument in the struct. It's not really a Prune*Result* if the caller > initializes it in part. It seems fairly reasonable to still have a > PruneResult struct for the other stuff, though, at least to me. How do > you see it? Yes, I think off_loc probably didn't belong in PruneResult to begin with. It is inherently not a result of pruning since it will only be used in the event that pruning doesn't complete (it errors out). In the attached v4 patch set, I have both PruneResult and off_loc as parameters to heap_page_prune(). I have also added a separate commit which adds comments both above heap_page_prune()'s call site in lazy_scan_prune() and in the heap_page_prune() function header clarifying the various points we discussed. > > I still want to eliminate the NULL check of off_loc in > > heap_page_prune() by making it a required parameter. Even though > > on-access pruning does not have an error callback mechanism which uses > > the offset, it seems better to have a pointless local variable in > > heap_page_prune_opt() than to do a NULL check for every tuple pruned. > > It doesn't seem important to me unless it improves performance. If > it's just stylistic, I don't object, but I also don't really see a > reason to care. I did some performance testing but, as expected, I couldn't concoct a scenario where the overhead was noticeable in a profile. So much else is happening in that code, the NULL check basically doesn't matter (even though it isn't optimized out). I mostly wanted to remove the NULL checks because I found them distracting (so, a stylistic complaint). However, upon further reflection, I actually think it is better if heap_page_prune_opt() passes NULL. heap_page_prune() has no error callback mechanism that could use it, and passing a valid value implies otherwise. Also, as you said, off_loc will always be InvalidOffsetNumber when heap_page_prune() returns normally, so having heap_page_prune_opt() pass a dummy value might actually be more confusing for future programmers. > > > I haven't run the regression suite with 0001 applied. I'm guessing > > > that you did, and that they passed, which if true means that we don't > > > have a test for this, which is a shame, although writing such a test > > > might be a bit tricky. If there is a test case for this and you didn't > > > run it, woops. > > > > There is no test coverage for the vacuum error callback context > > currently (tests passed for me). I looked into how we might add such a > > test. First, I investigated what kind of errors might occur during > > heap_prune_satisfies_vacuum(). Some of the multixact functions called > > by HeapTupleSatisfiesVacuumHorizon() could error out -- for example > > GetMultiXactIdMembers(). It seems difficult to trigger the errors in > > GetMultiXactIdMembers(), as we would have to cause wraparound. It > > would be even more difficult to ensure that we hit those specific > > errors from a call stack containing heap_prune_satisfies_vacuum(). 
> > As such, I'm not sure I can think of a way to protect future developers > > from repeating my mistake--apart from improving the comment like you > > mentioned. > > 004_verify_heapam.pl has some tests that intentionally corrupt pages > and then use pg_amcheck to detect the corruption. Such an approach > could probably also be used here. But it's a pain to get such tests right, because any change to the page format due to endianness, > different block size, or whatever can make carefully-written tests go > boom. Cool! I hadn't examined how these tests work until now. I took a stab at writing a test in the existing 004_verify_heapam.pl. The simplest thing would be if we could just vacuum the corrupted table ("test") after running pg_amcheck and compare the error context to our expectation. I found that this didn't work, though. In an assert build, vacuum trips an assert before it hits an error while vacuuming a corrupted tuple in the "test" table. There might be a way of modifying the existing test code to avoid this, but I tried the next easiest thing -- corrupting a tuple in the other existing table in the file, "junk". This is possible to do, but we have to repeat a lot of the setup code done for the "test" table to get the line pointer array and loop through and corrupt a tuple. In order to do this well, I would want to refactor some of the boilerplate into a function. There are other fiddly bits as well that I needed to consider. I think a test like this could be useful coverage of some of the possible errors that could happen in heap_prune_satisfies_vacuum(), but it definitely isn't coverage of pg_amcheck (and thus shouldn't go in that file) and a whole new test which spins up a Postgres to cover vacuum_error_callback() seemed like a bit much. - Melanie
Attachment
On Thu, Sep 7, 2023 at 6:23 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > I mostly wanted to remove the NULL checks because I found them > distracting (so, a stylistic complaint). However, upon further > reflection, I actually think it is better if heap_page_prune_opt() > passes NULL. heap_page_prune() has no error callback mechanism that > could use it, and passing a valid value implies otherwise. Also, as > you said, off_loc will always be InvalidOffsetNumber when > heap_page_prune() returns normally, so having heap_page_prune_opt() > pass a dummy value might actually be more confusing for future > programmers. I'll look at the new patches more next week, but I wanted to comment on this point. I think this is kind of six of one, half a dozen of the other. It's not that hard to spot a variable that's only used in a function call and never initialized beforehand or used afterward, and if someone really feels the need to hammer home the point, they could always name it dummy or dummy_loc or whatever. So this point doesn't really carry a lot of weight with me. I actually think that the proposed change is probably better, but it seems like such a minor improvement that I get slightly reluctant to make it, only because churning the source code for arguable points sometimes annoys other developers. But I also had the thought that maybe it wouldn't be such a terrible idea if heap_page_prune_opt() actually used off_loc for some error reporting goodness. I mean, if HOT pruning fails, and we don't get the detail as to which tuple caused the failure, we can always run VACUUM and it will give us that information, assuming of course that the same failure happens again. But is there any reason why HOT pruning shouldn't include that error detail? If it did, then off_loc would be passed by all callers, at which point we surely would want to get rid of the branches. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 8, 2023 at 11:06 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 7, 2023 at 6:23 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > I mostly wanted to remove the NULL checks because I found them > > distracting (so, a stylistic complaint). However, upon further > > reflection, I actually think it is better if heap_page_prune_opt() > > passes NULL. heap_page_prune() has no error callback mechanism that > > could use it, and passing a valid value implies otherwise. Also, as > > you said, off_loc will always be InvalidOffsetNumber when > > heap_page_prune() returns normally, so having heap_page_prune_opt() > > pass a dummy value might actually be more confusing for future > > programmers. > > I'll look at the new patches more next week, but I wanted to comment > on this point. I think this is kind of six of one, half a dozen of the > other. It's not that hard to spot a variable that's only used in a > function call and never initialized beforehand or used afterward, and > if someone really feels the need to hammer home the point, they could > always name it dummy or dummy_loc or whatever. So this point doesn't > really carry a lot of weight with me. I actually think that the > proposed change is probably better, but it seems like such a minor > improvement that I get slightly reluctant to make it, only because > churning the source code for arguable points sometimes annoys other > developers. > > But I also had the thought that maybe it wouldn't be such a terrible > idea if heap_page_prune_opt() actually used off_loc for some error > reporting goodness. I mean, if HOT pruning fails, and we don't get the > detail as to which tuple caused the failure, we can always run VACUUM > and it will give us that information, assuming of course that the same > failure happens again. But is there any reason why HOT pruning > shouldn't include that error detail? If it did, then off_loc would be > passed by all callers, at which point we surely would want to get rid > of the branches. This is a good idea. I will work on a separate patch set to add an error context callback for on-access HOT pruning. - Melanie
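For anyone unfamiliar with the mechanism being proposed, an error context callback for on-access pruning would presumably follow the usual PostgreSQL pattern, roughly like this (HotPruneErrInfo and hot_prune_error_callback are hypothetical names, not the eventual patch):

typedef struct HotPruneErrInfo
{
	BlockNumber blkno;
	OffsetNumber offnum;
} HotPruneErrInfo;

static void
hot_prune_error_callback(void *arg)
{
	HotPruneErrInfo *errinfo = (HotPruneErrInfo *) arg;

	if (OffsetNumberIsValid(errinfo->offnum))
		errcontext("while pruning block %u offset %u",
				   errinfo->blkno, errinfo->offnum);
	else
		errcontext("while pruning block %u", errinfo->blkno);
}

/* In heap_page_prune_opt(), around the call to heap_page_prune(): */
{
	HotPruneErrInfo errinfo;
	ErrorContextCallback errcallback;

	errinfo.blkno = BufferGetBlockNumber(buffer);
	errinfo.offnum = InvalidOffsetNumber;

	errcallback.callback = hot_prune_error_callback;
	errcallback.arg = &errinfo;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	heap_page_prune(relation, buffer, vistest, &presult, &errinfo.offnum);

	error_context_stack = errcallback.previous;
}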
Hi, On 2023-09-07 18:23:22 -0400, Melanie Plageman wrote: > From e986940e546171d1f1d06f62a101d695a8481e7a Mon Sep 17 00:00:00 2001 > From: Melanie Plageman <melanieplageman@gmail.com> > Date: Wed, 6 Sep 2023 14:57:20 -0400 > Subject: [PATCH v4 2/3] Move heap_page_prune output parameters into struct > > Add PruneResult, a structure containing the output parameters and result > of heap_page_prune(). Reorganizing the results of heap_page_prune() into > a struct simplifies the function signature and provides a location for > future commits to store additional output parameters. > > Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com > --- > src/backend/access/heap/pruneheap.c | 33 +++++++++++++--------------- > src/backend/access/heap/vacuumlazy.c | 17 +++++--------- > src/include/access/heapam.h | 13 +++++++++-- > src/tools/pgindent/typedefs.list | 1 + > 4 files changed, 33 insertions(+), 31 deletions(-) > > diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c > index 392b54f093..5ac286e152 100644 > --- a/src/backend/access/heap/pruneheap.c > +++ b/src/backend/access/heap/pruneheap.c > @@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer) > */ > if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree) > { > - int ndeleted, > - nnewlpdead; > + PruneResult presult; > > - ndeleted = heap_page_prune(relation, buffer, vistest, > - &nnewlpdead, NULL); > + heap_page_prune(relation, buffer, vistest, &presult, NULL); > > /* > * Report the number of tuples reclaimed to pgstats. This is > - * ndeleted minus the number of newly-LP_DEAD-set items. > + * presult.ndeleted minus the number of newly-LP_DEAD-set items. > * > * We derive the number of dead tuples like this to avoid totally > * forgetting about items that were set to LP_DEAD, since they > @@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer) > * tracks ndeleted, since it will set the same LP_DEAD items to > * LP_UNUSED separately. > */ > - if (ndeleted > nnewlpdead) > + if (presult.ndeleted > presult.nnewlpdead) > pgstat_update_heap_dead_tuples(relation, > - ndeleted - nnewlpdead); > + presult.ndeleted - presult.nnewlpdead); > } > > /* And release buffer lock */ > @@ -204,24 +202,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer) > * (see heap_prune_satisfies_vacuum and > * HeapTupleSatisfiesVacuum). > * > - * Sets *nnewlpdead for caller, indicating the number of items that were > - * newly set LP_DEAD during prune operation. > + * presult contains output parameters needed by callers such as the number of > + * tuples removed and the number of line pointers newly marked LP_DEAD. > + * heap_page_prune() is responsible for initializing it. > * > * off_loc is the current offset into the line pointer array while pruning. > * This is used by vacuum to populate the error context message. On-access > * pruning has no such callback mechanism for populating the error context, so > * it passes NULL. When provided by the caller, off_loc is set exclusively by > * heap_page_prune(). > - * > - * Returns the number of tuples deleted from the page during this call. 
> */ > -int > +void > heap_page_prune(Relation relation, Buffer buffer, > GlobalVisState *vistest, > - int *nnewlpdead, > + PruneResult *presult, > OffsetNumber *off_loc) > { > - int ndeleted = 0; > Page page = BufferGetPage(buffer); > BlockNumber blockno = BufferGetBlockNumber(buffer); > OffsetNumber offnum, > @@ -247,6 +243,9 @@ heap_page_prune(Relation relation, Buffer buffer, > prstate.nredirected = prstate.ndead = prstate.nunused = 0; > memset(prstate.marked, 0, sizeof(prstate.marked)); > > + presult->ndeleted = 0; > + presult->nnewlpdead = 0; > + > maxoff = PageGetMaxOffsetNumber(page); > tup.t_tableOid = RelationGetRelid(prstate.rel); > > @@ -321,7 +320,7 @@ heap_page_prune(Relation relation, Buffer buffer, > continue; > > /* Process this item or chain of items */ > - ndeleted += heap_prune_chain(buffer, offnum, &prstate); > + presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate); > } > > /* Clear the offset information once we have processed the given page. */ > @@ -422,9 +421,7 @@ heap_page_prune(Relation relation, Buffer buffer, > END_CRIT_SECTION(); > > /* Record number of newly-set-LP_DEAD items for caller */ > - *nnewlpdead = prstate.ndead; > - > - return ndeleted; > + presult->nnewlpdead = prstate.ndead; > } > > > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c > index 102cc97358..7ead9cfe9d 100644 > --- a/src/backend/access/heap/vacuumlazy.c > +++ b/src/backend/access/heap/vacuumlazy.c > @@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel, > ItemId itemid; > HeapTupleData tuple; > HTSV_Result res; > - int tuples_deleted, > - tuples_frozen, > + PruneResult presult; > + int tuples_frozen, > lpdead_items, > live_tuples, > recently_dead_tuples; > - int nnewlpdead; > HeapPageFreeze pagefrz; > int64 fpi_before = pgWalUsage.wal_fpi; > OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; > @@ -1572,7 +1571,6 @@ retry: > pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid; > pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid; > pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid; > - tuples_deleted = 0; > tuples_frozen = 0; > lpdead_items = 0; > live_tuples = 0; > @@ -1581,9 +1579,8 @@ retry: > /* > * Prune all HOT-update chains in this page. > * > - * We count tuples removed by the pruning step as tuples_deleted. Its > - * final value can be thought of as the number of tuples that have been > - * deleted from the table. It should not be confused with lpdead_items; > + * We count the number of tuples removed from the page by the pruning step > + * in presult.ndeleted. It should not be confused with lpdead_items; > * lpdead_items's final value can be thought of as the number of tuples > * that were deleted from indexes. > * > @@ -1591,9 +1588,7 @@ retry: > * current offset when populating the error context message, so it is > * imperative that we pass its location to heap_page_prune. 
> */ > - tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest, > - &nnewlpdead, > - &vacrel->offnum); > + heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum); > > /* > * Now scan the page to collect LP_DEAD items and check for tuples > @@ -1933,7 +1928,7 @@ retry: > } > > /* Finally, add page-local counts to whole-VACUUM counts */ > - vacrel->tuples_deleted += tuples_deleted; > + vacrel->tuples_deleted += presult.ndeleted; > vacrel->tuples_frozen += tuples_frozen; > vacrel->lpdead_items += lpdead_items; > vacrel->live_tuples += live_tuples; > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > index 6598c4d7d8..2d3f149e4f 100644 > --- a/src/include/access/heapam.h > +++ b/src/include/access/heapam.h > @@ -191,6 +191,15 @@ typedef struct HeapPageFreeze > > } HeapPageFreeze; > > +/* > + * Per-page state returned from pruning > + */ > +typedef struct PruneResult > +{ > + int ndeleted; /* Number of tuples deleted from the page */ > + int nnewlpdead; /* Number of newly LP_DEAD items */ > +} PruneResult; I think it might be worth making the names a bit less ambiguous than they were. It's a bit odd that one has "new" in the name, the other doesn't, despite both being about newly marked things. And "deleted" seems somewhat ambiguous, it could also be understood as marking things LP_DEAD. Maybe nnewunused? > static int heap_prune_chain(Buffer buffer, > OffsetNumber rootoffnum, > + int8 *htsv, > PruneState *prstate); Hm, do we really want to pass this explicitly to a bunch of functions? Seems like it might be better to either pass the PruneResult around or to have a pointer in PruneState? > /* > * The criteria for counting a tuple as live in this block need to > @@ -1682,7 +1664,7 @@ retry: > * (Cases where we bypass index vacuuming will violate this optimistic > * assumption, but the overall impact of that should be negligible.) > */ > - switch (res) > + switch ((HTSV_Result) presult.htsv[offnum]) > { > case HEAPTUPLE_LIVE: I think we should assert that we have a valid HTSV_Result here, i.e. not -1. You could wrap the cast and Assert into an inline function as well. Greetings, Andres Freund
[ sorry for the delay getting back to this ] On Wed, Sep 13, 2023 at 1:29 PM Andres Freund <andres@anarazel.de> wrote: > I think it might be worth making the names a bit less ambiguous than they > were. It's a bit odd that one has "new" in the name, the other doesn't, > despite both being about newly marked things. And "deleted" seems somewhat > ambiguous, it could also be understood as marking things LP_DEAD. Maybe > nnewunused? I like it better the way Melanie did it. The current name may not be for the best, but that can be changed some other time, in a separate patch, if someone likes. For now, changing the name seems like a can of worms we don't need to open; the existing names have precedent on their side if nothing else. > > static int heap_prune_chain(Buffer buffer, > > OffsetNumber rootoffnum, > > + int8 *htsv, > > PruneState *prstate); > > Hm, do we really want to pass this explicitly to a bunch of functions? Seems > like it might be better to either pass the PruneResult around or to have a > pointer in PruneState? As far as I can see, 0002 adds it to one function (heap_page_prune) and 0003 adds it to one more (heap_prune_chain). That's not much of a bunch. > > /* > > * The criteria for counting a tuple as live in this block need to > > @@ -1682,7 +1664,7 @@ retry: > > * (Cases where we bypass index vacuuming will violate this optimistic > > * assumption, but the overall impact of that should be negligible.) > > */ > > - switch (res) > > + switch ((HTSV_Result) presult.htsv[offnum]) > > { > > case HEAPTUPLE_LIVE: > > I think we should assert that we have a valid HTSV_Result here, i.e. not > -1. You could wrap the cast and Assert into an inline function as well. This isn't a bad idea, although I don't find it completely necessary either. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 21, 2023 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > static int heap_prune_chain(Buffer buffer, > > > OffsetNumber rootoffnum, > > > + int8 *htsv, > > > PruneState *prstate); > > > > Hm, do we really want to pass this explicitly to a bunch of functions? Seems > > like it might be better to either pass the PruneResult around or to have a > > pointer in PruneState? > > As far as I can see, 0002 adds it to one function (heap_page_prune) and > 0003 adds it to one more (heap_prune_chain). That's not much of a > bunch. I didn't read this carefully enough. Actually, heap_prune_chain() is the *only* function that gets int8 *htsv as an argument. I don't understand how that's a bunch ... unless there are later patches not shown here that you're worried about. What happens in 0002 is a function getting PruneResult * as an argument, not int8 *htsv. Honestly I think 0002 and 0003 are ready to commit, if you're not too opposed to them, or if we can find some relatively small changes that would address your objections. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 21, 2023 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 13, 2023 at 1:29 PM Andres Freund <andres@anarazel.de> wrote: > > > > /* > > > * The criteria for counting a tuple as live in this block need to > > > @@ -1682,7 +1664,7 @@ retry: > > > * (Cases where we bypass index vacuuming will violate this optimistic > > > * assumption, but the overall impact of that should be negligible.) > > > */ > > > - switch (res) > > > + switch ((HTSV_Result) presult.htsv[offnum]) > > > { > > > case HEAPTUPLE_LIVE: > > > > I think we should assert that we have a valid HTSV_Result here, i.e. not > > -1. You could wrap the cast and Assert into an inline function as well. > > This isn't a bad idea, although I don't find it completely necessary either. Attached v5 does this. Even though a value of -1 would hit the default switch case and error out, I can see the argument for this validation -- as all other places switching on an HTSV_Result are doing so on a value which was always an HTSV_Result. Once I started writing the function comment, however, I felt a bit awkward. In order to make the function available to both pruneheap.c and vacuumlazy.c, I had to put it in a header file. Writing a function, available to anyone including heapam.h, which takes an int and returns an HTSV_Result feels a bit odd. Do we want it to be common practice to use an int value outside the valid enum range to store "status not yet computed" for HTSV_Results? Anyway, on a tactical note, I added the inline function to heapam.h below the PruneResult definition since it is fairly tightly coupled to the htsv array in PruneResult. All of the function prototypes are under a comment that says "function prototypes for heap access method" -- which didn't feel like an accurate description of this function. I wonder if it makes sense to have pruneheap.c include vacuum.h and move pruning specific stuff like this helper and PruneResult over there? I can't remember why I didn't do this before, but maybe there is a reason not to? I also wasn't sure if I needed to forward declare the inline function or not. Oh, and, one more note. I've dropped the former patch 0001 which changed the function comment about off_loc above heap_page_prune(). I have plans to write a separate patch adding an error context callback for HOT pruning with the offset number and would include such a change in that patch. - Melanie
Attachment
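For reference, the wrapper under discussion amounts to something like the following (the exact name and where it ends up living may differ in the attached v5):

/*
 * Convert a cached int8 visibility result back to HTSV_Result, asserting
 * that a verdict was actually computed for this item (i.e. the value is
 * not the -1 "not yet checked" sentinel).
 */
static inline HTSV_Result
htsv_get_valid_status(int status)
{
	Assert(status >= HEAPTUPLE_DEAD &&
		   status <= HEAPTUPLE_DELETE_IN_PROGRESS);
	return (HTSV_Result) status;
}

so that the switch in lazy_scan_prune() can read:

	switch (htsv_get_valid_status(presult.htsv[offnum]))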
On Thu, Sep 28, 2023 at 8:46 AM Melanie Plageman <melanieplageman@gmail.com> wrote: > Once I started writing the function comment, however, I felt a bit > awkward. In order to make the function available to both pruneheap.c > and vacuumlazy.c, I had to put it in a header file. Writing a > function, available to anyone including heapam.h, which takes an int > and returns an HTSV_Result feels a bit odd. Do we want it to be common > practice to use an int value outside the valid enum range to store > "status not yet computed" for HTSV_Results? I noticed the awkwardness of that return convention when I reviewed the first version of this patch, but I decided it wasn't worth spending time discussing. To avoid it, we'd either need to add a new HTSV_Result that is only used here, or add a new type HTSV_Result_With_An_Extra_Value and translate between the two, or pass back a boolean + an enum instead of an array of int8. And all of those seem to me to suck -- the first two are messy and the third would make the return value much wider. So, no, I don't really like this, but also, what would actually be any better? Also, IMV at least, it's more of an issue of it being sort of ugly than of anything becoming common practice, because how many callers of heap_page_prune() are there ever going to be? AFAIK, we've only ever had two since forever, and even if we grow one or two more at some point, that's still not that many. I went ahead and committed 0001. If Andres still wants to push for more renaming there, that can be a follow-up patch. And we can see if he or anyone else has any comments on this new version of 0002. To me we're down into the level of details that probably don't matter very much one way or the other, but others may disagree. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2023-09-28 11:25:04 -0400, Robert Haas wrote: > I went ahead and committed 0001. If Andres still wants to push for > more renaming there, that can be a follow-up patch. Agreed. > And we can see if he or anyone else has any comments on this new version of > 0002. To me we're down into the level of details that probably don't matter > very much one way or the other, but others may disagree. The only thought I have is that it might be worth to amend the comment in lazy_scan_prune() to mention that such a tuple won't need to be frozen, because it was visible to another session when vacuum started. Greetings, Andres Freund
On Sat, Sep 30, 2023 at 1:02 PM Andres Freund <andres@anarazel.de> wrote: > The only thought I have is that it might be worth to amend the comment in > lazy_scan_prune() to mention that such a tuple won't need to be frozen, > because it was visible to another session when vacuum started. I revised the comment a bit, incorporating that language, and committed. -- Robert Haas EDB: http://www.enterprisedb.com