
Re: Using read_stream in index vacuum - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: Using read_stream in index vacuum
Date	October 22, 2024 16:42:41
Msg-id	CAAKRu_bW1UOyup=jdFw+kOF9bCaAm=9UpiyZtbPMn8n_vnP+ig@mail.gmail.com
In response to	Re: Using read_stream in index vacuum ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List	pgsql-hackers


On Tue, Oct 22, 2024 at 2:30 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 22 Oct 2024, at 00:05, Melanie Plageman <melanieplageman@gmail.com> wrote:
> >
> > I was suggesting you call RelationGetNumberOfBlocks() once
> > current_block == last_exclusive in the callback itself.
>
> Consider the following sequence of events:
>
> 0. We schedule some buffers for IO
> 1. We call RelationGetNumberOfBlocks() in callback when current_block == last_exclusive and return InvalidBlockNumber to signal EOF
> After this:
> 2. Some page is split, creating a new page with number last_exclusive
> 3. Buffers from IO are returned and vacuumed, but not the one with number last_exclusive, because it was not scheduled

Ah, right, the callback might return InvalidBlockNumber long before
we've actually read (and vacuumed) the blocks it is specifying.
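
To make the sequence above concrete, here is a minimal, self-contained
sketch in C. It is a toy simulation, not PostgreSQL code: ToyIndex,
ToyScan, toy_next_block() and toy_count_missed_blocks() are
hypothetical names, and reading index->nblocks merely stands in for
RelationGetNumberOfBlocks(). It assumes the stream's look-ahead drains
the callback all the way to EOF before any block is vacuumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define TOY_MAX_BLOCKS 16

/* Hypothetical stand-ins for the index and the scan state. */
typedef struct ToyIndex
{
	BlockNumber nblocks;		/* current size of the index in blocks */
} ToyIndex;

typedef struct ToyScan
{
	ToyIndex   *index;
	BlockNumber current_block;
	BlockNumber last_exclusive;
} ToyScan;

/*
 * Toy callback: hand out the next block. On reaching last_exclusive,
 * re-check the index size once; if it has not grown, signal EOF.
 */
static BlockNumber
toy_next_block(ToyScan *scan)
{
	if (scan->current_block == scan->last_exclusive)
	{
		/* stand-in for RelationGetNumberOfBlocks() */
		scan->last_exclusive = scan->index->nblocks;
		if (scan->current_block == scan->last_exclusive)
			return InvalidBlockNumber;
	}
	return scan->current_block++;
}

/* Replay steps 0-3 and count index blocks that were never vacuumed. */
int
toy_count_missed_blocks(void)
{
	ToyIndex	index = {.nblocks = 4};
	ToyScan		scan = {.index = &index, .current_block = 0, .last_exclusive = 4};
	BlockNumber queued[TOY_MAX_BLOCKS];
	bool		vacuumed[TOY_MAX_BLOCKS] = {false};
	int			nqueued = 0;
	int			missed = 0;
	BlockNumber blkno;

	/* Steps 0 and 1: look-ahead schedules blocks 0..3, then sees EOF. */
	while ((blkno = toy_next_block(&scan)) != InvalidBlockNumber)
	{
		assert(nqueued < TOY_MAX_BLOCKS);
		queued[nqueued++] = blkno;
	}

	/* Step 2: a concurrent split creates block 4 (== last_exclusive). */
	index.nblocks++;

	/* Step 3: the scheduled buffers come back and are vacuumed. */
	for (int i = 0; i < nqueued; i++)
		vacuumed[queued[i]] = true;

	for (BlockNumber b = 0; b < index.nblocks; b++)
		if (!vacuumed[b])
			missed++;

	return missed;
}
```

In this toy run the callback has already returned InvalidBlockNumber by
the time the split happens, so the new block is never scheduled and
toy_count_missed_blocks() reports one missed block.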

I ran into something similar when trying to use the read stream API
for index prefetching. I added TIDs from the index to a queue that was
passed to the read stream and available in the callback. When the
queue was empty, I needed to check if there were more index entries
and, if so, add more TIDs to the queue (instead of ending the read
stream). So, I wanted some way for the callback to return
InvalidBlockNumber when there might actually be more blocks to
request. This is a kind of "restarting" behavior.
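
A rough sketch of what I mean by "restarting", again as a toy C
simulation rather than the real read stream API. All names here
(ToyQueue, toy_queue_next(), toy_refill(), toy_consume_all()) are
hypothetical, and plain block numbers stand in for the blocks that
queued TIDs would point to. The callback can only see the queue, so it
returns InvalidBlockNumber when the queue is empty; the layer above
then refills the queue and keeps pulling instead of treating that as
EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define TOY_QUEUE_CAP 4

/* Hypothetical queue shared between the upper layer and the callback. */
typedef struct ToyQueue
{
	BlockNumber items[TOY_QUEUE_CAP];
	int			head;			/* next item to hand out */
	int			count;			/* number of valid items */
} ToyQueue;

/* Toy callback: "empty for now" looks exactly like end-of-stream. */
static BlockNumber
toy_queue_next(ToyQueue *q)
{
	if (q->head == q->count)
		return InvalidBlockNumber;
	return q->items[q->head++];
}

/*
 * Upper layer: refill the queue from a source of block numbers
 * [*next_source, source_end). Returns false once the source is exhausted.
 */
static bool
toy_refill(ToyQueue *q, BlockNumber *next_source, BlockNumber source_end)
{
	q->head = 0;
	q->count = 0;
	while (q->count < TOY_QUEUE_CAP && *next_source < source_end)
		q->items[q->count++] = (*next_source)++;
	return q->count > 0;
}

/*
 * Consume every block, refilling whenever the callback reports
 * InvalidBlockNumber. Returns the number of blocks consumed and stores
 * the number of refills in *refills.
 */
int
toy_consume_all(BlockNumber source_end, int *refills)
{
	ToyQueue	q = {.head = 0, .count = 0};
	BlockNumber next_source = 0;
	int			consumed = 0;

	*refills = 0;
	for (;;)
	{
		BlockNumber blkno = toy_queue_next(&q);

		if (blkno != InvalidBlockNumber)
		{
			consumed++;
			continue;
		}

		/* Not necessarily EOF: ask the upper layer for more work. */
		if (!toy_refill(&q, &next_source, source_end))
			break;
		(*refills)++;
	}

	assert(consumed == (int) source_end);
	return consumed;
}
```

With ten source blocks and a queue capacity of four, the loop refills
three times and consumes all ten blocks. The point is only that
InvalidBlockNumber is treated as "pause" rather than "done"; how that
would map onto the actual read stream code is an open question.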

In that case, though, the callback couldn't get more TIDs when the
queue was empty because of layering violations and not, as in the
case of btree vacuum, because the index might be in a different state
after vacuuming the "last" block. Perhaps there is a way to make the
read stream restartable, though.

I just can't help wondering if there is a way to refactor the code
(potentially in a more invasive way) to make it more natural to use
the read stream API here. I usually hate when people give me such
unhelpful review feedback, though. So, carry on.

- Melanie
