Thread: Re: read stream on amcheck

Re: read stream on amcheck

From
Kirill Reshke
Date:
On Thu, 2 Jan 2025 at 20:29, Matheus Alcantara <matheusssilv97@gmail.com> wrote:
>
> Hi,
>
> While reviewing some other patches implementing stream API for core subsystems,
> I noticed that the amcheck extension could also benefit from that.
>
> Notice the refactor when handling the "skip" parameter; The logic was moved to
> the heapam_read_stream_next_block callback so that verify_heapam don't need to
> touch any private field of heapam_read_stream_next_block_private struct.
>
> One other think to mention is that the test cases of "skip" parameter
> that I've seen just test when the first page is corrupted, so I think
> that a carefully review on callback logic would be good to ensure that
> we don't accidentally skip a page when doing p->current_blocknum++;
>
> This patch doesn't show any performance improvements (or regression)
> but I think that it would be good to replace the ReadBufferExtended
> usage with the read stream API, so in the future it could be benefit
> from the AIO project.
>
> --
> Matheus Alcantara

Hi!
+1 on idea

However, this:

>- if (skip_option == SKIP_PAGES_ALL_FROZEN)
>- {
>- if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
>- continue;
>- }
>-
>- if (skip_option == SKIP_PAGES_ALL_VISIBLE)
>- {
>- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
>- continue;
>- }

changes to this
(in heapam_read_stream_next_block)
>+
>+ if ((p->skip_option == SKIP_PAGES_ALL_FROZEN && (mapbts & VISIBILITYMAP_ALL_FROZEN) != 0) ||
>+ (p->skip_option == SKIP_PAGES_ALL_VISIBLE && (mapbts & VISIBILITYMAP_ALL_VISIBLE) != 0))
> + continue;

I don't understand this change. The patch aims to be purely applying
streaming API, not refactoring. And if we refactor this code, this is
less readable than it was.

Other than that, LGTM.

-- 
Best regards,
Kirill Reshke



Re: read stream on amcheck

From
jian he
Date:
On Fri, Jan 3, 2025 at 1:53 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
>
> Yeap, I agree. Attached a v2 fixed.
>

hi. some minor issue i found.

+#include "storage/block.h"
no need, since "#include "storage/bufmgr.h" already included it.

do we need to add ``CHECK_FOR_INTERRUPTS()`` in heapam_read_stream_next_block?

heapam_read_stream_next_block_private need add to
src/tools/pgindent/typedefs.list

overall, it looks good to me.



Re: read stream on amcheck

From
Nazir Bilal Yavuz
Date:
Hi,

Thank you for working on this!

On Tue, 11 Feb 2025 at 21:41, Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Thanks for the review! v3 with the fixes attached.

I have a small comment.

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 827312306f6..8c83870db7d 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c

+    stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+                                        ctx.bstrategy,
+                                        ctx.rel,

I think we do not always want to disable prefetching by explicitly
setting the READ_STREAM_SEQUENTIAL flag. If skip_option !=
SKIP_PAGES_NONE, then the blocks might not always be sequential. So,
what do you think about setting READ_STREAM_SEQUENTIAL when the
skip_option == SKIP_PAGES_NONE and
READ_STREAM_DEFAULT otherwise.

Other than that, LGTM.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: read stream on amcheck

From
vignesh C
Date:
On Wed, 12 Feb 2025 at 00:11, Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Hi,
>
> Em ter., 11 de fev. de 2025 às 03:39, jian he
> <jian.universality@gmail.com> escreveu:
> > hi. some minor issue i found.
> >
> > +#include "storage/block.h"
> > no need, since "#include "storage/bufmgr.h" already included it.
> >
> Fixed
>
> > do we need to add ``CHECK_FOR_INTERRUPTS()`` in heapam_read_stream_next_block?
> >
> The current code on master don't CHECK_FOR_INTERRUPTS, so I would prefer to not
> change this behaviour, but I think that is considerable.
>
> > heapam_read_stream_next_block_private need add to
> > src/tools/pgindent/typedefs.list
> >
> Fixed
>
> > overall, it looks good to me.
>
> Thanks for the review! v3 with the fixes attached.

I noticed that Nazir Bilal Yavuz's comment from [1] is not yet
addressed, i have changed the status of commitfest entry to Waiting on
Author, please address them and update it to Needs review.
[1] - https://www.postgresql.org/message-id/CAN55FZ2CGxcqTk_LLRPAi2aFNqtR4o=JPfjN0=yT0yObfQ2h2g@mail.gmail.com

Regards,
Vignesh



Re: read stream on amcheck

From
Matheus Alcantara
Date:
Hi,

On Sun, Mar 16, 2025 at 10:30 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 12 Feb 2025 at 00:11, Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > Hi,
> >
> > Em ter., 11 de fev. de 2025 às 03:39, jian he
> > <jian.universality@gmail.com> escreveu:
> > > hi. some minor issue i found.
> > >
> > > +#include "storage/block.h"
> > > no need, since "#include "storage/bufmgr.h" already included it.
> > >
> > Fixed
> >
> > > do we need to add ``CHECK_FOR_INTERRUPTS()`` in heapam_read_stream_next_block?
> > >
> > The current code on master don't CHECK_FOR_INTERRUPTS, so I would prefer to not
> > change this behaviour, but I think that is considerable.
> >
> > > heapam_read_stream_next_block_private need add to
> > > src/tools/pgindent/typedefs.list
> > >
> > Fixed
> >
> > > overall, it looks good to me.
> >
> > Thanks for the review! v3 with the fixes attached.
>
> I noticed that Nazir Bilal Yavuz's comment from [1] is not yet
> addressed, i have changed the status of commitfest entry to Waiting on
> Author, please address them and update it to Needs review.
> [1] - https://www.postgresql.org/message-id/CAN55FZ2CGxcqTk_LLRPAi2aFNqtR4o=JPfjN0=yT0yObfQ2h2g@mail.gmail.com
>
Sorry for the delay, attached v4 with the remaining fixes.

--
Matheus Alcantara

Attachment

Re: read stream on amcheck

From
Melanie Plageman
Date:
On Mon, Mar 17, 2025 at 9:47 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Sorry for the delay, attached v4 with the remaining fixes.

Thanks for the patch.

I started reviewing this with the intent to commit it. But, I decided
while studying it that I want to separate the SKIP_PAGES_NONE case and
the other cases into two callbacks. I think it is easier to read the
skip pages callback this way. The SKIP_PAGES_NONE case is just read
all blocks in the range, so we can use the existing default callback,
block_range_read_cb(). Then the callback for the
SKIP_PAGES_ALL_VISIBLE and SKIP_PAGES_ALL_FROZEN options can be clear
and simple.

I've attached two versions with this proposed structure.

amcheck-readsteram-1callback.patch implements this with one callback
and has the amcheck specific callback private data struct subclass
BlockRangeReadStreamPrivate (I called it
heapamcheck_rs_perblock_data).

amcheck-readstream-2callbacks.patch wraps block_range_read_cb() in an
amcheck specific callback and creates a BlockRangeReadStreamPrivate
and fills it in from the heapamcheck_rs_perblock_data to pass as
callback_private_data. Because this version is more explicit, it is
more safe. We don't have any type checking facilities that will alert
us if someone adds a member above the BlockRangeReadStreamPrivate in
heapamcheck_rs_perblock_data. But, I'm open to feedback.

- Melanie

Attachment

Re: read stream on amcheck

From
Nazir Bilal Yavuz
Date:
Hi,

On Thu, 27 Mar 2025 at 03:48, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Mon, Mar 17, 2025 at 9:47 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > Sorry for the delay, attached v4 with the remaining fixes.
>
> Thanks for the patch.
>
> I started reviewing this with the intent to commit it. But, I decided
> while studying it that I want to separate the SKIP_PAGES_NONE case and
> the other cases into two callbacks. I think it is easier to read the
> skip pages callback this way. The SKIP_PAGES_NONE case is just read
> all blocks in the range, so we can use the existing default callback,
> block_range_read_cb(). Then the callback for the
> SKIP_PAGES_ALL_VISIBLE and SKIP_PAGES_ALL_FROZEN options can be clear
> and simple.
>
> I've attached two versions with this proposed structure.
>
> amcheck-readsteram-1callback.patch implements this with one callback
> and has the amcheck specific callback private data struct subclass
> BlockRangeReadStreamPrivate (I called it
> heapamcheck_rs_perblock_data).
>
> amcheck-readstream-2callbacks.patch wraps block_range_read_cb() in an
> amcheck specific callback and creates a BlockRangeReadStreamPrivate
> and fills it in from the heapamcheck_rs_perblock_data to pass as
> callback_private_data. Because this version is more explicit, it is
> more safe. We don't have any type checking facilities that will alert
> us if someone adds a member above the BlockRangeReadStreamPrivate in
> heapamcheck_rs_perblock_data. But, I'm open to feedback.

I liked the first approach more. We can solve the first approach's
problems by introducing a void pointer to pass to
read_stream_begin_relation(). We can set it to &rsdata.range for the
SKIP_PAGES_NONE case and &rsdata for the rest.

Example patch is attached, heapamcheck_rs_perblock_data is added to
typedefs.list too.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: read stream on amcheck

From
Melanie Plageman
Date:
On Thu, Mar 27, 2025 at 4:45 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> I liked the first approach more. We can solve the first approach's
> problems by introducing a void pointer to pass to
> read_stream_begin_relation(). We can set it to &rsdata.range for the
> SKIP_PAGES_NONE case and &rsdata for the rest.

Cool. I've gone with this approach but have renamed all the functions
and structs to try and be more consistent and clear.
Committed in 043799fa08c2c and I marked the commitfest entry as such.

- Melanie



Re: read stream on amcheck

From
Matheus Alcantara
Date:
On Thu, Mar 27, 2025 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Thu, Mar 27, 2025 at 4:45 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > I liked the first approach more. We can solve the first approach's
> > problems by introducing a void pointer to pass to
> > read_stream_begin_relation(). We can set it to &rsdata.range for the
> > SKIP_PAGES_NONE case and &rsdata for the rest.
>
> Cool. I've gone with this approach but have renamed all the functions
> and structs to try and be more consistent and clear.
> Committed in 043799fa08c2c and I marked the commitfest entry as such.

Just my 0.2 cents. I also like the first approach even though I prefer
the v4 version, but anyway, thanks very much for reviewing and
committing!

(I was sending my anwer just when I received your reply)

--
Matheus Alcantara



Re: read stream on amcheck

From
Melanie Plageman
Date:
On Thu, Mar 27, 2025 at 2:46 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Just my 0.2 cents. I also like the first approach even though I prefer
> the v4 version, but anyway, thanks very much for reviewing and
> committing!

Thanks for the patch!

FWIW, I strongly disliked about v4 that two separate parts of the
callback were responsible for advancing current_blocknum, one to
advance it past blocks we chose to skip and the other to advance it to
the next block.

   for (; p->current_blocknum < p->last_exclusive; p->current_blocknum++)
and
     if (p->current_blocknum < p->last_exclusive)
          return p->current_blocknum++;

I found that alone to be undesirable, but once you add in
SKIP_PAGES_NONE, I think it was very hard to understand.

Besides that, when we implemented streaming read sequential scan, we
ended up making dedicated callbacks for the parallel and non-parallel
cases (see heap_scan_stream_read_next_parallel and
heap_scan_stream_read_next_serial) because it performed better than a
single combined callback with a branch. I didn't validate that amcheck
got the same performance benefit from the dedicated callbacks, but I
don't see why it would be any different.

- Melanie



Re: read stream on amcheck

From
Matheus Alcantara
Date:
On Thu, Mar 27, 2025 at 4:42 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Thu, Mar 27, 2025 at 2:46 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > Just my 0.2 cents. I also like the first approach even though I prefer
> > the v4 version, but anyway, thanks very much for reviewing and
> > committing!
>
> Thanks for the patch!
>
> FWIW, I strongly disliked about v4 that two separate parts of the
> callback were responsible for advancing current_blocknum, one to
> advance it past blocks we chose to skip and the other to advance it to
> the next block.
>
>    for (; p->current_blocknum < p->last_exclusive; p->current_blocknum++)
> and
>      if (p->current_blocknum < p->last_exclusive)
>           return p->current_blocknum++;
>
> I found that alone to be undesirable, but once you add in
> SKIP_PAGES_NONE, I think it was very hard to understand.
>
> Besides that, when we implemented streaming read sequential scan, we
> ended up making dedicated callbacks for the parallel and non-parallel
> cases (see heap_scan_stream_read_next_parallel and
> heap_scan_stream_read_next_serial) because it performed better than a
> single combined callback with a branch. I didn't validate that amcheck
> got the same performance benefit from the dedicated callbacks, but I
> don't see why it would be any different.

Yeah, it totally makes sense to me now, thanks very much for the details!

--
Matheus Alcantara