Re: Confine vacuum skip logic to lazy_scan_skip - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Confine vacuum skip logic to lazy_scan_skip
Date
Msg-id 20240308004614.xn7x3k6qz7jw7ljx@liskov
Whole thread Raw
In response to Re: Confine vacuum skip logic to lazy_scan_skip  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Confine vacuum skip logic to lazy_scan_skip
Re: Confine vacuum skip logic to lazy_scan_skip
List pgsql-hackers
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
> On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
> > I made some further changes. I kept them as separate commits for easier
> > review, see the commit messages for details. Any thoughts on those changes?
> 
> I've given some inline feedback on most of the extra patches you added.
> Short answer is they all seem fine to me except I have a reservations
> about 0008 because of the number of blkno variables flying around. I
> didn't have a chance to rebase these into my existing changes today, so
> either I will do it tomorrow or, if you are feeling like you're on a
> roll and want to do it, that also works!

Attached v7 contains all of the changes that you suggested plus some
additional cleanups here and there.

> > I feel heap_vac_scan_get_next_block() function could use some love. Maybe
> > just some rewording of the comments, or maybe some other refactoring; not
> > sure. But I'm pretty happy with the function signature and how it's called.

I've cleaned up the comments on heap_vac_scan_next_block() in the first
couple patches (not so much in the streaming read user). Let me know if
it addresses your feelings or if I should look for other things I could
change.

I will say that now all of the variable names are *very* long. I didn't
want to remove the "state" from LVRelState->next_block_state. (In fact, I
kind of miss the "get". But I had to draw the line somewhere.) I think
without "state" in the name, next_block sounds too much like a function.

Any ideas for shortening the names of next_block_state and its members
or are you fine with them?

> I was wondering if we should remove the "get" and just go with
> heap_vac_scan_next_block(). I didn't do that originally because I didn't
> want to imply that the next block was literally the sequentially next
> block, but I think maybe I was overthinking it.
> 
> Another idea is to call it heap_scan_vac_next_block() and then the order
> of the words is more like the table AM functions that get the next block
> (e.g. heapam_scan_bitmap_next_block()). Though maybe we don't want it to
> be too similar to those since this isn't a table AM callback.

I've done a version of this.

> > From 27e431e8dc69bbf09d831cb1cf2903d16f177d74 Mon Sep 17 00:00:00 2001
> > From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> > Date: Wed, 6 Mar 2024 20:58:57 +0200
> > Subject: [PATCH v6 6/9] Move vmbuffer back to a local varible in
> >  lazy_scan_heap()
> > 
> > It felt confusing that we passed around the current block, 'blkno', as
> > an argument to lazy_scan_new_or_empty() and lazy_scan_prune(), but
> > 'vmbuffer' was accessed directly in the 'scan_state'.
> > 
> > It was also a bit vague, when exactly 'vmbuffer' was valid. Calling
> > heap_vac_scan_get_next_block() set it, sometimes, to a buffer that
> > might or might not contain the VM bit for 'blkno'. But other
> > functions, like lazy_scan_prune(), assumed it to contain the correct
> > buffer. That was fixed up visibilitymap_pin(). But clearly it was not
> > "owned" by heap_vac_scan_get_next_block(), like the other 'scan_state'
> > fields.
> > 
> > I moved it back to a local variable, like it was. Maybe there would be
> > even better ways to handle it, but at least this is not worse than
> > what we have in master currently.
> 
> I'm fine with this. I did it the way I did (grouping it with the
> "next_unskippable_block" in the skip struct), because I think that this
> vmbuffer is always the buffer containing the VM bit for the next
> unskippable block -- which sometimes is the block returned by
> heap_vac_scan_get_next_block() and sometimes isn't.
> 
> I agree it might be best as a local variable but perhaps we could retain
> the comment about it being the block of the VM containing the bit for the
> next unskippable block. (Honestly, the whole thing is very confusing).

In 0001-0004 I've stuck with only having the local variable vmbuffer in
lazy_scan_heap().

In 0006 (introducing pass 1 vacuum streaming read user) I added a
vmbuffer back to the next_block_state (while also keeping the local
variable vmbuffer in lazy_scan_heap()). The vmbuffer in lazy_scan_heap()
contains the block of the VM containing visi information for the next
unskippable block or for the current block if its visi information
happens to be in the same block of the VM as either 1) the next
unskippable block or 2) the most recently processed heap block.

Streaming read vacuum separates this visibility check in
heap_vac_scan_next_block() from the main loop of lazy_scan_heap(), so we
can't just use a local variable anymore. Now the local variable vmbuffer
in lazy_scan_heap() will only already contain the block with the visi
information for the to-be-processed block if it happens to be in the
same VM block as the most recently processed heap block. That means
potentially more VM fetches.

However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.

Note that next_block->current_block in the streaming read vacuum context
is actually the prefetch block.


- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: improve ssl error code, 2147483650
Next
From: Jeff Davis
Date:
Subject: Re: Built-in CTYPE provider