Re: Parallel heap vacuum - Mailing list pgsql-hackers

From: Melanie Plageman
Subject: Re: Parallel heap vacuum
Msg-id: CAAKRu_aa-bTWs5Pi6ypZzVOy+-qCJXR7Ja5zDg2oiUvjeA8yYQ@mail.gmail.com
In response to: Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Sun, Mar 23, 2025 at 4:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> If we use ParallelBlockTableScanDesc with streaming read like the
> patch did, we would also need to somehow rewind the number of blocks
> allocated to workers. The problem I had with such usage was that a
> parallel vacuum worker allocated a new chunk of blocks when doing
> look-ahead reading and therefore advanced
> ParallelBlockTableScanDescData.phs_nallocated. In this case, even if
> we unpin the remaining buffers in the queue by a new functionality and
> a parallel worker resumes the phase 1 from the last processed block,
> we would lose some blocks in already allocated chunks unless we rewind
> ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData
> data. However, since a worker might have already allocated multiple
> chunks it would not be easy to rewind these scan state data.

Ah, I didn't realize rewinding the state would be difficult. It seems like the easiest way to make sure those blocks get processed is to add them back to the counter somehow. And I don't suppose there is some way to save these not-yet-processed block assignments somewhere and hand them to the workers that restart phase I for the second pass?

> Another idea is that parallel workers don't exit phase 1 until it
> consumes all pinned buffers in the queue, even if the memory usage of
> TidStore exceeds the limit. It would need to add new functionality to
> the read stream to disable the look-ahead reading. Since we could use
> much memory while processing these buffers, exceeding the memory
> limit, we can trigger this mode when the memory usage of TidStore
> reaches 70% of the limit or so. On the other hand, it means that we
> would not use the streaming read for the blocks in this mode, which is
> not efficient.

That might work. And/or maybe you could start decreasing the size of block-assignment chunks once the memory usage of TidStore reaches a certain level.
I don't know how much that would help or how fiddly it would be.

> So we would need to
> invent a way to stop and resume the read stream in the middle during
> parallel scan.

As for needing to add new read stream functionality, we actually probably don't have to. If you use read_stream_end() -> read_stream_reset(), it resets the distance to 0, so then read_stream_next_buffer() should just end up unpinning the buffers and freeing the per-buffer data. I think the easiest way to implement this is to think of it as ending one read stream and starting a new one the next time you begin phase I, rather than as pausing and resuming the same stream. And anyway, maybe it's better not to keep a bunch of pinned buffers and allocated memory hanging around during what could be very long index scans.

- Melanie