Re: Using read_stream in index vacuum - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Using read_stream in index vacuum
Date
Msg-id CAAKRu_aMiTqCB3PvUeCT46xSv6tkeKfqcnfpnvzLx4Yj_b+KDw@mail.gmail.com
Whole thread Raw
In response to Re: Using read_stream in index vacuum  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Responses Re: Using read_stream in index vacuum
List pgsql-hackers
On Sun, Oct 20, 2024 at 10:19 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 20 Oct 2024, at 15:16, Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > I'm not sure if I did not express myself correctly, I didn't mean to
> > restart the stream,
> > I mean we can create a new stream for each outer loop, I attached a
> > refactor 0002
> > based on your 0001, correct me if I'm wrong.
>
> I really like how the code looks with this refactoring. But I think we need some input from Thomas.
> Is it OK if we start handful of streams for 1 page at the end of vacuum scan? How costly is to start a new scan?

You shouldn't need either loop in btvacuumscan().

For the inner loop:

for (; scanblkno < num_pages; scanblkno++)
{
Buffer buf = read_stream_next_buffer(stream, NULL);
btvacuumpage(&vstate, buf);
if (info->report_progress)
pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
scanblkno);
}

you should just be able to be do something like

Buffer buf = read_stream_next_buffer(stream, NULL);
if (BufferIsValid(buf))
      btvacuumpage(&vstate, buf);

Obviously I am eliding some details and clean-up and such. But your
read stream callback should be responsible for advancing the block
number and thus you shouldn't need to loop like this in
btvacuumscan().

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be contained there
(read_stream_get_block() is an exception to this).

For the outer loop,  I feel like we have options. For example, maybe
the read stream callback can call RelationGetNumberOfBlocks(). I mean
maybe we don't want to have to take a relation extension lock in a
callback.

So, alternatively, we could add some kind of restartable flag to the
read stream API. So, after the callback returns InvalidBlockNumber and
the read_stream_next_buffer() returns NULL, we could call something
like read_stream_update() or read_stream_restart() or something. We
would have updated the BlockRangeReadStreamPrivate->last_exclusive
value. In your case it might not be substantially different
operationally than making new read streams (since you are not
allocating memory for a per-buffer data structure). But, I think the
code would read much better than making new read stream objects in a
loop.

- Melanie



pgsql-hackers by date:

Previous
From: Michel Pelletier
Date:
Subject: Re: Using Expanded Objects other than Arrays from plpgsql
Next
From: Melanie Plageman
Date:
Subject: Re: Using read_stream in index vacuum