Re: Parallel heap vacuum - Mailing list pgsql-hackers

From: Melanie Plageman
Subject: Re: Parallel heap vacuum
Date:
Msg-id: CAAKRu_aa-bTWs5Pi6ypZzVOy+-qCJXR7Ja5zDg2oiUvjeA8yYQ@mail.gmail.com
In response to: Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> If we use ParallelBlockTableScanDesc with streaming read like the
> patch did, we would also need to somehow rewind the number of blocks
> allocated to workers. The problem I had with such usage was that a
> parallel vacuum worker allocated a new chunk of blocks when doing
> look-ahead reading and therefore advanced
> ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> we unpin the remaining buffers in the queue by a new functionality and
> a parallel worker resumes the phase 1 from the last processed block,
> we would lose some blocks in already allocated chunks unless we rewind
> ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> data. However, since a worker might have already allocated multiple
> chunks it would not be easy to rewind these scan state data.

Ah, I didn't realize rewinding the state would be difficult. It seems
like the easiest way to make sure those blocks get processed is to add
them back to the counter somehow. And I don't suppose there is some way
to save these not-yet-processed block assignments somewhere and hand
them to the workers that restart phase I for the second pass?
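To make that hand-back idea concrete, here is a standalone toy sketch (this is not PostgreSQL code: CHUNK_SIZE, claim_range, return_tail, and the leftover stack are all invented for illustration, only next_block mirrors the role of phs_nallocated, and a real version would need an LWLock around the leftover list in shared memory):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SIZE   8
#define MAX_LEFTOVER 16

typedef struct { uint32_t start, end; } BlockRange;   /* [start, end) */

static _Atomic uint64_t next_block;     /* stand-in for phs_nallocated */
static BlockRange leftovers[MAX_LEFTOVER];
static int n_leftover;                  /* real code: protect with a lock */

/* A worker stopping mid-chunk hands back the unprocessed tail. */
static void
return_tail(uint32_t next_unprocessed, uint32_t end)
{
    if (next_unprocessed < end)
    {
        assert(n_leftover < MAX_LEFTOVER);
        leftovers[n_leftover++] = (BlockRange){ next_unprocessed, end };
    }
}

/* On the next phase-I pass, drain leftovers before taking fresh chunks
 * from the atomic allocator, so no allocated block is ever skipped. */
static bool
claim_range(uint32_t nblocks, BlockRange *out)
{
    if (n_leftover > 0)
    {
        *out = leftovers[--n_leftover];
        return true;
    }
    uint64_t start = atomic_fetch_add(&next_block, CHUNK_SIZE);
    if (start >= nblocks)
        return false;
    out->start = (uint32_t) start;
    out->end = (uint32_t) (start + CHUNK_SIZE < nblocks
                           ? start + CHUNK_SIZE : nblocks);
    return true;
}
```

The point of the sketch is that the allocator counter never has to be rewound: the unconsumed tail of a chunk is simply re-queued as a first-class assignment.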

> Another idea is that parallel workers don't exit phase 1 until it
> consumes all pinned buffers in the queue, even if the memory usage of
> TidStore exceeds the limit. It would need to add new functionality to
> the read stream to disable the look-ahead reading. Since we could use
> much memory while processing these buffers, exceeding the memory
> limit, we can trigger this mode when the memory usage of TidStore
> reaches 70% of the limit or so. On the other hand, it means that we
> would not use the streaming read for the blocks in this mode, which is
> not efficient.

That might work. And/or maybe you could start decreasing the size of
the block-assignment chunks once the memory usage of TidStore reaches a
certain level. I don't know how much that would help or how fiddly it
would be.
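As a rough illustration of the shrinking-chunk idea (again a hypothetical sketch, not PostgreSQL behavior: the halving schedule, MAX_CHUNK, and choose_chunk_size are invented; the 70% threshold is just the figure floated upthread):

```c
#include <stdint.h>

#define MAX_CHUNK 64
#define MIN_CHUNK 1

/* Scale the parallel-scan chunk size down as TidStore memory usage
 * approaches the limit, so less look-ahead work is outstanding when
 * the scan has to stop for an index-vacuum pass. */
static uint32_t
choose_chunk_size(uint64_t tidstore_bytes, uint64_t limit_bytes)
{
    /* Full-size chunks until usage crosses 70% of the limit. */
    if (tidstore_bytes * 10 < limit_bytes * 7)
        return MAX_CHUNK;

    /* Then halve the chunk size for each further 10% consumed. */
    uint32_t chunk = MAX_CHUNK;
    uint64_t pct = tidstore_bytes * 100 / limit_bytes;
    for (uint64_t p = 70; p < pct && chunk > MIN_CHUNK; p += 10)
        chunk /= 2;
    return chunk < MIN_CHUNK ? MIN_CHUNK : chunk;
}
```

The "fiddly" part would be picking the schedule: shrink too early and the scan loses prefetch efficiency well before the limit, too late and workers still hold large half-finished chunks when phase I stops.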

> So we would need to
> invent a way to stop and resume the read stream in the middle during
> parallel scan.

As for needing to add new read stream functionality, we probably don't
actually have to. If you use read_stream_end() ->
read_stream_reset(), it resets the distance to 0, so
read_stream_next_buffer() should just end up unpinning the buffers and
freeing the per-buffer data. I think the easiest way to implement this
is to think of it as ending a read stream and starting a new one the
next time you begin phase I, rather than as pausing and resuming the
read stream. And anyway, maybe it's better not to keep a bunch of
pinned buffers and allocated memory hanging around while doing what
could be very long index scans.
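The drain-after-reset behavior can be pictured with a toy model (this is not the actual read_stream implementation; ToyStream and its fields are invented, with "distance" standing in for the stream's look-ahead window):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    int      pinned;       /* buffers currently pinned in the queue */
    int      distance;     /* look-ahead target; 0 disables new reads */
    uint32_t next_block;   /* next block a read would be started for */
    uint32_t nblocks;
} ToyStream;

/* Issue "reads" only while the queue is below the look-ahead target. */
static void
toy_fill(ToyStream *s)
{
    while (s->pinned < s->distance && s->next_block < s->nblocks)
    {
        s->next_block++;   /* start a read and pin the buffer */
        s->pinned++;
    }
}

/* Loosely mirrors read_stream_next_buffer(): returns false at end. */
static bool
toy_next_buffer(ToyStream *s)
{
    toy_fill(s);           /* no-op once distance is 0 */
    if (s->pinned == 0)
        return false;
    s->pinned--;           /* unpin the buffer being handed back */
    return true;
}

/* Loosely mirrors the reset idea: with distance 0, further next-buffer
 * calls only drain what is already pinned, never reading ahead. */
static void
toy_reset(ToyStream *s)
{
    s->distance = 0;
}
```

So "ending" the stream is just resetting and draining it; the next phase-I pass builds a fresh stream starting from the last processed block.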

- Melanie


