On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I actually think we are going to want to stop referring to these steps
> as pruning and vacuuming. It is confusing because vacuuming refers to
> the whole process of doing garbage collection on the table and also to
> the specific step of setting dead line pointers unused. If we called
> these steps say, pruning and reaping, that may be more clear.
I agree that there's some terminological confusion here, but if we
leave all of the current naming unchanged and start using new naming
conventions in new code, I don't think that's going to clear things
up. Getting rid of the weird naming with pruning, "scanning" (that is
part of vacuum), and "vacuuming" (that is another part of vacuum) is
gonna take some work.
> Vacuuming consists of three phases -- the first pass, index vacuuming,
> and the second pass. I don't think we should dictate what happens in
> each pass. That is, we shouldn't expect only pruning to happen in the
> first pass and only reaping to happen in the second pass. For example,
> I think Andres has previously proposed doing another round of pruning
> after index vacuuming. The second pass/third phase is distinguished
> primarily by being after index vacuuming.
>
> If we think about it this way, that frees us up to set dead line
> pointers unused in the first pass when the table has no indexes. For
> clarity, I could add a block comment explaining that doing this is an
> optimization and not a logical requirement. One way to make this even
> more clear would be to set the dead line pointers unused in a separate
> loop after heap_prune_chain() as I proposed upthread.
I don't really disagree with any of this, but I think the question is
whether the code is cleaner with mark-LP-as-unused pushed down into
pruning or whether it's better the way it is. Andres seems confident
that the change won't suck for performance, which is good, but I'm not
quite convinced that it's the right direction to go with the code, and
he doesn't seem to be either. Perhaps this all turns on this point:
MP> I am planning to add a VM update into the freeze record, at which point
MP> I will move the VM update code into lazy_scan_prune(). This will then
MP> allow us to consolidate the freespace map update code for the prune and
MP> noprune cases and make lazy_scan_heap() short and sweet.
Can we see what that looks like on top of this change?
--
Robert Haas
EDB: http://www.enterprisedb.com