Re: Parallel heap vacuum - Mailing list pgsql-hackers
From:           Melanie Plageman
Subject:        Re: Parallel heap vacuum
Date:
Msg-id:         CAAKRu_aS_k751daj0QfO+5ZgLDz8LP1Wo_b5=RRKt6=f6OStLg@mail.gmail.com
In response to: Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses:      Re: Parallel heap vacuum
List:           pgsql-hackers
On Mon, Mar 24, 2025 at 7:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> You're right. I've studied the read stream code and figured out how to
> use it. In the attached patch, we end the read stream at the end of
> phase 1 and start a new read stream, as you suggested.

I've started looking at this patch set some more.

In heap_vac_scan_next_block(), if we are in the SKIP_PAGES_THRESHOLD
codepath and run out of unskippable blocks in the current chunk, then go
back to get another chunk (goto retry) but are near the memory limit so we
can't get another block (!dead_items_check_memory_limit()), could we get an
infinite loop? Or even incorrectly encroach on another worker's blocks?

I ask because of this math:

    end_block = next_block +
        vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining + 1;

If vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining is 0 and
we are in the goto retry loop, it seems like we could keep incrementing
next_block even when we shouldn't be. I just want to make sure that the
skip-pages optimization works with the parallel block assignment and the
low-memory read stream wind-down.

I also think you do not need to expose
table_block_parallelscan_skip_pages_in_chunk() in the table AM. It is only
called in heap-specific code and the logic seems very heap-related. If a
table AM wants to skip some blocks, it could easily implement that itself.

On another topic, I think it would be good to have a comment above this
code in parallel_lazy_scan_gather_scan_results(), stating why we are very
sure it is correct:

    Assert(TransactionIdIsValid(data->NewRelfrozenXid));
    Assert(MultiXactIdIsValid(data->NewRelminMxid));

    if (TransactionIdPrecedes(data->NewRelfrozenXid,
                              vacrel->scan_data->NewRelfrozenXid))
        vacrel->scan_data->NewRelfrozenXid = data->NewRelfrozenXid;

    if (MultiXactIdPrecedesOrEquals(data->NewRelminMxid,
                                    vacrel->scan_data->NewRelminMxid))
        vacrel->scan_data->NewRelminMxid = data->NewRelminMxid;

    if (data->nonempty_pages < vacrel->scan_data->nonempty_pages)
        vacrel->scan_data->nonempty_pages = data->nonempty_pages;

    vacrel->scan_data->skippedallvis |= data->skippedallvis;

Parallel relfrozenxid advancement sounds scary, and scary things are best
with comments. Even though the way this works is intuitive, it is worth
pointing out that this part is important to get right, so future
programmers know how important it is.

One thing I was wondering about is whether there are any implications of
different workers having different values in their GlobalVisState.
GlobalVisState can be updated during vacuum, so even if the workers start
out with the same values, those could diverge. It is probably okay, since
it just controls which tuples are removable: some workers may remove fewer
tuples than they absolutely could, and that is probably fine.

And if it is okay for each worker to have a different GlobalVisState, then
maybe you shouldn't have a GlobalVisState in shared memory at all. If you
look at GlobalVisTestFor(), it just returns the memory address of that
global variable in the backend. So it seems like it might be better to let
each parallel worker use its own backend-local GlobalVisState rather than
putting it in shared memory and copying it from one worker to the other
workers when initializing them. I'm not sure. At the very least, there
should be a comment explaining why you've done it the way you have.

- Melanie
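
To make the retry-loop concern above concrete, here is a minimal,
self-contained sketch of the pattern being questioned. It is not the
patch's actual code: get_next_chunk(), memory_limited, and chunk_remaining
are hypothetical stand-ins for the parallel chunk assignment,
dead_items_check_memory_limit(), and phsw_chunk_remaining, and the block
numbers are made up. It only illustrates how recomputing end_block while
chunk_remaining is 0 lets next_block creep forward on every retry.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for the patch's state; all names here are hypothetical. */
    static unsigned chunk_remaining = 0;    /* analogue of phsw_chunk_remaining */
    static bool     memory_limited = true;  /* analogue of the memory-limit check */

    /* Pretend chunk allocator: under memory pressure, no new chunk is granted. */
    static bool
    get_next_chunk(void)
    {
        if (memory_limited)
            return false;
        chunk_remaining = 32;
        return true;
    }

    int
    main(void)
    {
        unsigned next_block = 100;  /* last block this worker was assigned */
        int      retries = 0;

        for (;;)
        {
            /* "goto retry" path: chunk exhausted, try to get another one */
            if (chunk_remaining == 0 && !get_next_chunk())
            {
                /*
                 * The questionable arithmetic: with chunk_remaining == 0 this
                 * is just next_block + 1, so each pass through the retry loop
                 * creeps one block forward even though no chunk was granted,
                 * potentially into blocks assigned to another worker.
                 */
                unsigned end_block = next_block + chunk_remaining + 1;

                next_block = end_block;

                if (++retries > 5)  /* bail out so the sketch terminates */
                {
                    printf("crept forward to block %u with no chunk granted\n",
                           next_block);
                    return 0;
                }
                continue;
            }
            break;                  /* normally we would scan the chunk here */
        }
        return 0;
    }

Run as written, the sketch never breaks out of the loop on its own; the
artificial retry cap is what stops it, which is the shape of the possible
infinite loop (or block-range overrun) described above.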