Re: Parallel heap vacuum - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: Parallel heap vacuum
Date:
Msg-id: CAD21AoBRskqtyhfZYgrw4ZJ7Va05bF+5zJu=hD9X4Pz+bDjqFA@mail.gmail.com
In response to: Re: Parallel heap vacuum (Melanie Plageman <melanieplageman@gmail.com>)
Responses: Re: Parallel heap vacuum
List: pgsql-hackers

On Wed, Mar 26, 2025 at 1:00 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Mon, Mar 24, 2025 at 7:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > You're right. I've studied the read stream code and figured out how to
> > use it. In the attached patch, we end the read stream at the end of
> > phase 1 and start a new read stream, as you suggested.
>
> I've started looking at this patch set some more.

Thank you for reviewing the patch!

> In heap_vac_scan_next_block() if we are in the SKIP_PAGES_THRESHOLD
> codepath and run out of unskippable blocks in the current chunk and
> then go back to get another chunk (goto retry) but we are near the
> memory limit so we can't get another block
> (!dead_items_check_memory_limit()), could we get an infinite loop?
>
> Or even incorrectly encroach on another worker's block? Asking that
> because of this math
>
>     end_block = next_block +
>         vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining + 1;

You're right. We should make sure that next_block is reset to
InvalidBlockNumber at the beginning of the retry loop.

> if vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining is 0
> and we are in the goto retry loop, it seems like we could keep
> incrementing next_block even when we shouldn't be.

Right. Will fix.
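
To be concrete, the shape I have in mind is roughly the following. This
is just a sketch reusing the identifiers quoted above, not the actual
patch code, so please read it as pseudocode:

retry:
    /* Never carry a stale block number into a new attempt. */
    next_block = InvalidBlockNumber;

    /* Near the memory limit, stop claiming blocks instead of looping. */
    if (!dead_items_check_memory_limit(vacrel))
        return InvalidBlockNumber;

    next_block = table_block_parallelscan_nextpage(...);
    if (!BlockNumberIsValid(next_block))
        return InvalidBlockNumber;

    /* Only now is the chunk-based end_block computation meaningful. */
    end_block = next_block +
        vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining + 1;

That way a stale next_block can never leak into the end_block arithmetic,
regardless of how we leave the previous iteration.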
> I just want to make sure that the skip pages optimization works with
> the parallel block assignment and the low memory read stream
> wind-down.
>
> I also think you do not need to expose
> table_block_parallelscan_skip_pages_in_chunk() in the table AM. It is
> only called in heap-specific code and the logic seems very
> heap-related. If table AMs want to skip some blocks, they could
> easily implement it.

Agreed. Will remove it.

> On another topic, I think it would be good to have a comment above
> this code in parallel_lazy_scan_gather_scan_results(), stating why we
> are very sure it is correct.
>
>     Assert(TransactionIdIsValid(data->NewRelfrozenXid));
>     Assert(MultiXactIdIsValid(data->NewRelminMxid));
>
>     if (TransactionIdPrecedes(data->NewRelfrozenXid,
>                               vacrel->scan_data->NewRelfrozenXid))
>         vacrel->scan_data->NewRelfrozenXid = data->NewRelfrozenXid;
>
>     if (MultiXactIdPrecedesOrEquals(data->NewRelminMxid,
>                                     vacrel->scan_data->NewRelminMxid))
>         vacrel->scan_data->NewRelminMxid = data->NewRelminMxid;
>
>     if (data->nonempty_pages < vacrel->scan_data->nonempty_pages)
>         vacrel->scan_data->nonempty_pages = data->nonempty_pages;
>
>     vacrel->scan_data->skippedallvis |= data->skippedallvis;
>
> Parallel relfrozenxid advancement sounds scary, and scary things are
> best with comments. Even though the way this works is intuitive, I
> think it is worth pointing out that this part is important to get
> right so future programmers know how important it is.
>
> One thing I was wondering about is if there are any implications of
> different workers having different values in their GlobalVisState.
> GlobalVisState can be updated during vacuum, so even if they start out
> with the same values, that could diverge. It is probably okay since it
> just controls what tuples are removable. Some workers may remove fewer
> tuples than they absolutely could, and this is probably okay.

Good point.

> And if it is okay for each worker to have different GlobalVisState
> then maybe you shouldn't have a GlobalVisState in shared memory. If
> you look at GlobalVisTestFor() it just returns the memory address of
> that global variable in the backend. So, it seems like it might be
> better to just let each parallel worker use their own backend local
> GlobalVisState and not try to put it in shared memory and copy it from
> one worker to the other workers when initializing them. I'm not sure.
> At the very least, there should be a comment explaining why you've
> done it the way you have done it.

Agreed. IIUC it's not a problem even if parallel workers use their own
GlobalVisState. I'll make that change and remove the 0004 patch, which
exposes GlobalVisState.

I'll send the updated patch soon.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com