Re: Eliminate redundant tuple visibility check in vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Eliminate redundant tuple visibility check in vacuum
Date
Msg-id CA+TgmoZ8E7DXzAZkV=RF9tfJkQFH=a3KkjQhYNY9W26Jh7AN+w@mail.gmail.com
Whole thread Raw
In response to Re: Eliminate redundant tuple visibility check in vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Eliminate redundant tuple visibility check in vacuum
List pgsql-hackers
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Yeah, I think this is a fair concern. I have addressed it in the
> attached patches.
>
> I thought a lot about whether or not adding a PruneResult which
> contains only the output parameters and result of heap_page_prune() is
> annoying since we have so many other *Prune* data structures. I
> decided it's not annoying. In some cases, four outputs don't merit a
> new structure. In this case, I think it declutters the code a bit --
> independent of any other patches I may be writing :)

I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.

I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops. This is also why I think it's *extremely* important for
the header comment of a function that takes pointer parameters to
document the semantics of those pointers. Normally they are in
parameters or out parameters or in-out parameters, but here it's
something even more complicated. The existing header comment says
"off_loc is the offset location required by the caller to use in error
callback," which I didn't really understand until I actually looked at
what the code is doing, so I consider that somebody could've done a
better job writing this comment, but in theory you could've also
noticed that, at least AFAICS, there's no way for the function to
return with *off_loc set to anything other than InvalidOffsetNumber.
That means that the statements which set *off_loc to other values must
have some other purpose.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PATCH] pgrowlocks: Make mode names consistent with docs
Next
From: Markus Winand
Date:
Subject: Re: Wrong command name in writeable-CTE related error messages