Re: Eliminate redundant tuple visibility check in vacuum - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Eliminate redundant tuple visibility check in vacuum |
Date | |
Msg-id | CAAKRu_a9RSogxOSdT4Zjv64h2fJmeBcuM8TRLw1zOzOwFCWucg@mail.gmail.com Whole thread Raw |
In response to | Re: Eliminate redundant tuple visibility check in vacuum (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Eliminate redundant tuple visibility check in vacuum
|
List | pgsql-hackers |
On Thu, Sep 7, 2023 at 1:37 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > Yeah, I think this is a fair concern. I have addressed it in the > > attached patches. > > > > I thought a lot about whether or not adding a PruneResult which > > contains only the output parameters and result of heap_page_prune() is > > annoying since we have so many other *Prune* data structures. I > > decided it's not annoying. In some cases, four outputs don't merit a > > new structure. In this case, I think it declutters the code a bit -- > > independent of any other patches I may be writing :) > > I took a look at 0001 and I think that it's incorrect. In the existing > code, *off_loc gets updated before each call to > heap_prune_satisfies_vacuum(). This means that if an error occurs in > heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain > the relevant offset number. In your version, the relevant offset > number will only be stored in some local structure to which the caller > doesn't yet have access. The difference is meaningful. lazy_scan_prune > passes off_loc as vacrel->offnum, which means that if an error > happens, vacrel->offnum will have the right value, and so when the > error context callback installed by heap_vacuum_rel() fires, namely > vacuum_error_callback(), it can look at vacrel->offnum and know where > the error happened. With your patch, I think that would no longer > work. You are right. That is a major problem. Thank you for finding that. I was able to confirm the breakage by patching in an error to heap_page_prune() after we have set off_loc and confirming that the error context has the offset in master and is missing it with my patch applied. I can fix it by changing the type of PruneResult->off_loc to be an OffsetNumber pointer. This does mean that PruneResult will be initialized partially by heap_page_prune() callers. I wonder if you think that undermines the case for making a new struct. I still want to eliminate the NULL check of off_loc in heap_page_prune() by making it a required parameter. Even though on-access pruning does not have an error callback mechanism which uses the offset, it seems better to have a pointless local variable in heap_page_prune_opt() than to do a NULL check for every tuple pruned. > I haven't run the regression suite with 0001 applied. I'm guessing > that you did, and that they passed, which if true means that we don't > have a test for this, which is a shame, although writing such a test > might be a bit tricky. If there is a test case for this and you didn't > run it, woops. There is no test coverage for the vacuum error callback context currently (tests passed for me). I looked into how we might add such a test. First, I investigated what kind of errors might occur during heap_prune_satisfies_vacuum(). Some of the multixact functions called by HeapTupleSatisfiesVacuumHorizon() could error out -- for example GetMultiXactIdMembers(). It seems difficult to trigger the errors in GetMultiXactIdMembers(), as we would have to cause wraparound. It would be even more difficult to ensure that we hit those specific errors from a call stack containing heap_prune_satisfies_vacuum(). As such, I'm not sure I can think of a way to protect future developers from repeating my mistake--apart from improving the comment like you mentioned. - Melanie
pgsql-hackers by date: