Re: Confine vacuum skip logic to lazy_scan_skip - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Confine vacuum skip logic to lazy_scan_skip
Date
Msg-id 20240308173410.xetnhsiwx2kaljfa@liskov
In response to Re: Confine vacuum skip logic to lazy_scan_skip  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote:
> On 08/03/2024 02:46, Melanie Plageman wrote:
> > On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
> > > On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
> > I will say that now all of the variable names are *very* long. I didn't
> > want to remove the "state" from LVRelState->next_block_state. (In fact, I
> > kind of miss the "get". But I had to draw the line somewhere.) I think
> > without "state" in the name, next_block sounds too much like a function.
> > 
> > Any ideas for shortening the names of next_block_state and its members
> > or are you fine with them?
> 
> Hmm, we can remove the inner struct and add the fields directly into
> LVRelState. LVRelState already contains many groups of variables, like
> "Error reporting state", with no inner structs. I did it that way in the
> attached patch. I also used local variables more.

+1; I like the result of this.

> > However, by adding a vmbuffer to next_block_state, the callback may be
> > able to avoid extra VM fetches from one invocation to the next.
> 
> That's a good idea, holding separate VM buffer pins for the next-unskippable
> block and the block we're processing. I adopted that approach.

Cool. It can't be avoided with streaming read vacuum, but I wonder if
there would ever be adverse effects to doing it on master? Maybe if we
are doing a lot of skipping, the VM block for the heap blocks we are
processing would end up changing each time, when we would have had the
right VM block if we had used the one from heap_vac_scan_next_block()?

Frankly, I'm in favor of just doing it now because it makes
lazy_scan_heap() less confusing.
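
To spell out what I understand the shape to be (my own sketch, not the
patch code itself):

    /*
     * Two VM pages can be pinned at the same time:
     *
     * - vacrel->next_unskippable_vmbuffer is pinned by
     *   heap_vac_scan_next_block() while it walks forward through the
     *   VM looking for the next unskippable block, and
     *
     * - the vmbuffer local to lazy_scan_heap() is pinned for the VM
     *   page covering the heap block currently being processed.
     *
     * When those two heap blocks fall on different VM pages, neither
     * lookup forces the other's buffer to be released and re-read.
     */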

> My compiler caught one small bug when I was playing with various
> refactorings of this: heap_vac_scan_next_block() must set *blkno to
> rel_pages, not InvalidBlockNumber, after the last block. The caller uses the
> 'blkno' variable also after the loop, and assumes that it's set to
> rel_pages.

Oops! Thanks for catching that.

> I'm pretty happy with the attached patches now. The first one fixes the
> existing bug I mentioned in the other email (based on the ongoing
> discussion that might not be how we want to fix it though).

ISTM we should still do the fix you mentioned -- seems like it has more
upsides than downsides?

> From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Fri, 8 Mar 2024 16:00:22 +0200
> Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with
>  DISABLE_PAGE_SKIPPING
> 
> It's important for 'all_visible_according_to_vm' to correctly reflect
> whether the VM bit is set or not, even when we are not trusting the VM
> to skip pages, because contrary to what the comment said,
> lazy_scan_prune() relies on it.
> 
> If it's incorrectly set to 'false', when the VM bit is in fact set,
> lazy_scan_prune() will try to set the VM bit again and dirty the page
> unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
> heap pages were dirtied, even if there were no changes. We would also
> fail to clear any VM bits that were set incorrectly.
> 
> This was broken in commit 980ae17310, so backpatch to v16.

LGTM.

> From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Fri, 8 Mar 2024 17:32:19 +0200
> Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip()
> ---
>  src/backend/access/heap/vacuumlazy.c | 256 +++++++++++++++------------
>  1 file changed, 141 insertions(+), 115 deletions(-)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index ac55ebd2ae5..0aa08762015 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -204,6 +204,12 @@ typedef struct LVRelState
>      int64        live_tuples;    /* # live tuples remaining */
>      int64        recently_dead_tuples;    /* # dead, but not yet removable */
>      int64        missed_dead_tuples; /* # removable, but not removed */

Perhaps we should add a comment to the blkno member of LVRelState
indicating that it is used for error reporting and logging?
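
Something like this is all I have in mind (a sketch of the comment, not
run through pgindent):

    BlockNumber blkno;            /* used for error reporting and
                                   * logging of the block currently
                                   * being processed */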

> +    /* State maintained by heap_vac_scan_next_block() */
> +    BlockNumber current_block;    /* last block returned */
> +    BlockNumber next_unskippable_block; /* next unskippable block */
> +    bool        next_unskippable_allvis;    /* its visibility status */
> +    Buffer        next_unskippable_vmbuffer;    /* buffer containing its VM bit */
>  } LVRelState;

>  /*
> -static BlockNumber
> -lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
> -               bool *next_unskippable_allvis, bool *skipping_current_range)
> +static bool
> +heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
> +                         bool *all_visible_according_to_vm)
>  {
> -    BlockNumber rel_pages = vacrel->rel_pages,
> -                next_unskippable_block = next_block,
> -                nskippable_blocks = 0;
> +    BlockNumber next_block;
>      bool        skipsallvis = false;
> +    BlockNumber rel_pages = vacrel->rel_pages;
> +    BlockNumber next_unskippable_block;
> +    bool        next_unskippable_allvis;
> +    Buffer        next_unskippable_vmbuffer;
>  
> -    *next_unskippable_allvis = true;
> -    while (next_unskippable_block < rel_pages)
> -    {
> -        uint8        mapbits = visibilitymap_get_status(vacrel->rel,
> -                                                       next_unskippable_block,
> -                                                       vmbuffer);
> +    /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
> +    next_block = vacrel->current_block + 1;
>  
> -        if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> +    /* Have we reached the end of the relation? */
> +    if (next_block >= rel_pages)
> +    {
> +        if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
>          {
> -            Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
> -            *next_unskippable_allvis = false;
> -            break;
> +            ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
> +            vacrel->next_unskippable_vmbuffer = InvalidBuffer;
>          }

Good catch here. Also, I noticed that I had set current_block to
InvalidBlockNumber too, which seems strictly worse than leaving it as
rel_pages + 1: if a future dev made a change that accidentally caused
heap_vac_scan_next_block() to be called again, InvalidBlockNumber + 1
would wrap back around to 0. So this all looks correct to me.
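
The wraparound is easy to demonstrate with a toy standalone program (my
own sketch; BlockNumber and InvalidBlockNumber here just mirror
PostgreSQL's definitions):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t BlockNumber;
    #define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

    int
    main(void)
    {
        BlockNumber current_block = InvalidBlockNumber;

        /* unsigned arithmetic wraps: this prints 0, the first block */
        printf("%u\n", (unsigned) (current_block + 1));
        return 0;
    }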

> +        *blkno = rel_pages;
> +        return false;
> +    }

> +    next_unskippable_block = vacrel->next_unskippable_block;
> +    next_unskippable_allvis = vacrel->next_unskippable_allvis;

Wish there were a newline here.

I see why you removed my treatise-level comment that was here about
unskipped skippable blocks. However, when I was trying to understand
this code, I did wish there was some comment that explained to me why we
needed all of the variables next_unskippable_block,
next_unskippable_allvis, all_visible_according_to_vm, and current_block.

The idea that we would choose not to skip a skippable block because of
kernel readahead makes sense. The part that I had trouble wrapping my
head around was that we also want to keep the visibility status of both
the first and last blocks of the skippable range, and then use those to
infer the visibility status of the intervening blocks without another
VM lookup if we decide not to skip them.
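
Maybe a short comment capturing the invariant would be enough, something
like (my own wording, assuming I have understood the code correctly):

    /*
     * Every block in the range [current_block, next_unskippable_block)
     * was all-visible according to the VM when we last looked it up.
     * So if we decide to scan one of those blocks anyway (e.g. to keep
     * kernel readahead working), we can set
     * all_visible_according_to_vm = true without another VM lookup.
     * Only when we reach next_unskippable_block itself do we need the
     * saved next_unskippable_allvis.
     */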

> +    if (next_unskippable_block == InvalidBlockNumber ||
> +        next_block > next_unskippable_block)
> +    {
>          /*
> -         * Caller must scan the last page to determine whether it has tuples
> -         * (caller must have the opportunity to set vacrel->nonempty_pages).
> -         * This rule avoids having lazy_truncate_heap() take access-exclusive
> -         * lock on rel to attempt a truncation that fails anyway, just because
> -         * there are tuples on the last page (it is likely that there will be
> -         * tuples on other nearby pages as well, but those can be skipped).
> -         *
> -         * Implement this by always treating the last block as unsafe to skip.
> +         * Find the next unskippable block using the visibility map.
>           */
> -        if (next_unskippable_block == rel_pages - 1)
> -            break;
> +        next_unskippable_block = next_block;
> +        next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
> +        for (;;)

Ah yes, my old loop condition was redundant with the break when
next_unskippable_block == rel_pages - 1. This is better.
> +        {
> +            uint8        mapbits = visibilitymap_get_status(vacrel->rel,
> +                                                           next_unskippable_block,
> +                                                           &next_unskippable_vmbuffer);
>  
> -        /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
> -        if (!vacrel->skipwithvm)

...
> +            }
> +
> +            vacuum_delay_point();
> +            next_unskippable_block++;
>          }

Would love a newline here.

> +        /* write the local variables back to vacrel */
> +        vacrel->next_unskippable_block = next_unskippable_block;
> +        vacrel->next_unskippable_allvis = next_unskippable_allvis;
> +        vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
>  
...

> -    if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
> -        *skipping_current_range = false;
> +    if (next_block == next_unskippable_block)
> +        *all_visible_according_to_vm = next_unskippable_allvis;
>      else
> -    {
> -        *skipping_current_range = true;
> -        if (skipsallvis)
> -            vacrel->skippedallvis = true;
> -    }
> -
> -    return next_unskippable_block;
> +        *all_visible_according_to_vm = true;

Also a newline here.

> +    *blkno = vacrel->current_block = next_block;
> +    return true;
>  }
>  

> From 941ae7522ab6ac24ca5981303e4e7f6e2cba7458 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Sun, 31 Dec 2023 12:49:56 -0500
> Subject: [PATCH v8 3/3] Remove unneeded vacuum_delay_point from
>  heap_vac_scan_get_next_block
> 
> heap_vac_scan_get_next_block() does relatively little work, so there is
> no need to call vacuum_delay_point(). A future commit will call
> heap_vac_scan_get_next_block() from a callback, and we would like to
> avoid calling vacuum_delay_point() in that callback.
> ---
>  src/backend/access/heap/vacuumlazy.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 0aa08762015..e1657ef4f9b 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1172,7 +1172,6 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
>                  skipsallvis = true;
>              }
>  
> -            vacuum_delay_point();
>              next_unskippable_block++;
>          }
>          /* write the local variables back to vacrel */
> -- 
> 2.39.2
> 

LGTM

- Melanie


