Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id 20240115210324.ryxxppdjqosjqrgw@liskov
Whole thread Raw
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Emit fewer vacuum records by reaping removable tuples during pruning
List pgsql-hackers
On Mon, Jan 15, 2024 at 12:29:57PM -0500, Robert Haas wrote:
> On Fri, Jan 12, 2024 at 4:05 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > Yea, that works for now. I mean, I think the way we should do it is
> > update the FSM in lazy_scan_noprune(), but, for the purposes of this
> > patch, yes. has_lpdead_items output parameter seems fine to me.
> 
> Here's v2.
> 
> It's not exactly clear to me why you want to update the FSM in
> lazy_scan_[no]prune(). When I first looked at v7-0004, I said to
> myself "well, this is dumb, because now we're just duplicating
> something that is common to both cases". But then I realized that the
> logic was *already* duplicated in lazy_scan_heap(), and that by
> pushing the duplication down to lazy_scan_[no]prune(), you made the
> two copies identical rather than, as at present, having two copies
> that differ from each other. Perhaps that's a sufficient reason to
> make the change all by itself, but it seems like what would be really
> good is if we only needed one copy of the logic. I don't know if
> that's achievable, though.

Upthread in v4-0004, I do have a version which combines the two FSM
updates for lazy_scan_prune() and lazy_scan_noprune().

If you move the VM updates from lazy_scan_heap() into lazy_scan_prune(),
then there is little difference between the prune and no prune cases in
lazy_scan_heap(). The main difference is that, when lazy_scan_noprune()
returns true, you are supposed to avoid calling lazy_scan_new_or_empty()
again and have to avoid calling lazy_scan_prune(). I solved this with a
local variable "do_prune" and checked it before calling
lazy_scan_new_or_empty() and lazy_scan_prune().

I moved away from this approach because it felt odd to test "do_prune"
before calling lazy_scan_new_or_empty(). Though, perhaps it doesn't
matter if we just call lazy_scan_new_or_empty() again. We do that when
lazy_scan_noprune() returns false anyway.

I thought perhaps we could circumvent this issue by moving
lazy_scan_new_or_empty() into lazy_scan_prune(). But, this seemed wrong
to me because, as it stands now, if lazy_scan_new_or_empty() returns
true, we would want to skip the VM update code and FSM update code in
lazy_scan_heap() after lazy_scan_prune(). That would mean
lazy_scan_prune() would have to return something to indicate that.

> More generally, I somewhat question whether it's really right to push
> things from lazy_scan_heap() and into lazy_scan_[no]prune(), mostly
> because of the risk of having to duplicate logic. I'm not even really
> convinced that it's good for us to have both of those functions.
> There's an awful lot of code duplication between them already. Each
> has a loop that tests the status of each item and then, for LP_USED,
> switches on htsv_get_valid_status or HeapTupleSatisfiesVacuum. It
> doesn't seem trivial to unify all that code, but it doesn't seem very
> nice to have two copies of it, either.

I agree that the duplicated logic in both places is undesirable.
I think the biggest issue with combining them would be that when
lazy_scan_noprune() returns false, it needs to go get a cleanup lock and
then invoke the logic of lazy_scan_prune() for all the tuples on the
page. This seems a little hard to get right in a single function.

Then there are more trivial-to-solve differences like invoking
heap_tuple_should_freeze() and not bothering with the whole
heap_prepare_freeze_tuple() if we only have the share lock.

I am willing to try and write a version of this to see if it is better.
I will say, though, my agenda was to eventually push the actions taken
in the loop in lazy_scan_prune() into heap_page_prune() and
heap_prune_chain().

> From 32684f41d1dd50f726aa0dfe8a5d816aa5c42d64 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas@postgresql.org>
> Date: Mon, 15 Jan 2024 12:05:52 -0500
> Subject: [PATCH v2] Be more consistent about whether to update the FSM while
>  vacuuming.

Few small review comments below, but, overall, LGTM.

> Previously, when lazy_scan_noprune() was called and returned true, we would
> update the FSM immediately if the relation had no indexes or if the page
> contained no dead items. On the other hand, when lazy_scan_prune() was
> called, we would update the FSM if either of those things was true or
> if index vacuuming was disabled. Eliminate that behavioral difference by
> considering vacrel->do_index_vacuuming in both cases.
> 
> Also, instead of having lazy_scan_noprune() make the decision
> internally and pass it back to the caller via *recordfreespace, just
> have it pass the number of LP_DEAD items back to the caller, and then

It doesn't pass the number of LP_DEAD items back to the caller. It
passes a boolean.

> let the caller make the decision.  That seems less confusing, since
> the caller also decides in the lazy_scan_prune() case; moreover, this
> way, the whole test is in one place, instead of spread out.

Perhaps it isn't important, but I find this wording confusing. You
mention lazy_scan_prune() and then mention that "the whole test is in
one place instead of spread out" -- which kind of makes it sound like
you are consolidating FSM updates for both the lazy_scan_noprune() and
lazy_scan_prune() cases. Perhaps simply flipping the order of the "since
the caller" and "moreover, this way" conjunctions would solve it. I
defer to your judgment.

> ---
>  src/backend/access/heap/vacuumlazy.c | 58 ++++++++++++++--------------
>  1 file changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index b63cad1335..f17816b81d 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1960,7 +1974,6 @@ lazy_scan_noprune(LVRelState *vacrel,
>      Assert(BufferGetBlockNumber(buf) == blkno);
>  
>      hastup = false;                /* for now */
> -    *recordfreespace = false;    /* for now */
>  
>      lpdead_items = 0;
>      live_tuples = 0;
> @@ -2102,18 +2115,8 @@ lazy_scan_noprune(LVRelState *vacrel,
>              hastup = true;
>              missed_dead_tuples += lpdead_items;
>          }
> -
> -        *recordfreespace = true;
> -    }
> -    else if (lpdead_items == 0)
> -    {
> -        /*
> -         * Won't be vacuuming this page later, so record page's freespace in
> -         * the FSM now
> -         */
> -        *recordfreespace = true;
>      }
> -    else
> +    else if (lpdead_items != 0)

It stuck out to me a bit that this test is if lpdead_items != 0 and the
one above it:

    /* Save any LP_DEAD items found on the page in dead_items array */
    if (vacrel->nindexes == 0)
    {
        /* Using one-pass strategy (since table has no indexes) */
        if (lpdead_items > 0)
        {

tests if lpdead_items > 0. It is more the inconsistency that bothered me
than the fact that lpdead_items is signed.

> @@ -2159,6 +2156,9 @@ lazy_scan_noprune(LVRelState *vacrel,
>      if (hastup)
>          vacrel->nonempty_pages = blkno + 1;
>  
> +    /* Did we find LP_DEAD items? */
> +    *has_lpdead_items = (lpdead_items > 0);

I would drop this comment. The code doesn't really need it, and the
reason we care if there are LP_DEAD items is not because they are dead
but because we want to know if we'll touch this page again. You don't
need to rehash all that here, so I think omitting the comment is enough.

- Melanie



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [17] CREATE SUBSCRIPTION ... SERVER
Next
From: Robert Haas
Date:
Subject: Re: Fixing backslash dot for COPY FROM...CSV