Re: Parallel heap vacuum - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Parallel heap vacuum |
Date | |
Msg-id | CAHut+PtnyLVkgg7BsfXy0ciVeyCBaXNRSSi0h8AVdx9cTL9_ug@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel heap vacuum (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
Hi Sawada-San. FWIW, here are the remainder of my review comments for the patch v4-0001 ====== src/backend/access/heap/vacuumlazy.c lazy_scan_heap: 2.1. + /* + * Do the actual work. If parallel heap vacuum is active, we scan and + * vacuum heap with parallel workers. + */ /with/using/ ~~~ 2.2. + if (ParallelHeapVacuumIsActive(vacrel)) + do_parallel_lazy_scan_heap(vacrel); + else + do_lazy_scan_heap(vacrel); The do_lazy_scan_heap() returns a boolean and according to that function comment it should always be true if it is not using the parallel heap scan. So should we get the function return value here and Assert that it is true? ~~~ 2.3. Start uppercase even for all the single line comments for consistency with exasiting code. e.g. + /* report that everything is now scanned */ e.g + /* now we can compute the new value for pg_class.reltuples */ e.g. + /* report all blocks vacuumed */ ~~~ heap_vac_scan_next_block_parallel: 2.4. +/* + * A parallel scan variant of heap_vac_scan_next_block. + * + * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization. + */ +static bool +heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno, + bool *all_visible_according_to_vm) The function comment should explain the return value. ~~~ 2.5. + if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0) + { + + if (vacrel->aggressive) + break; Unnecessary whitespace. ~~~ dead_items_alloc: 2.6. + /* + * We initialize parallel heap scan/vacuuming or index vacuuming + * or both based on the table size and the number of indexes. Note + * that only one worker can be used for an index, we invoke + * parallelism for index vacuuming only if there are at least two + * indexes on a table. + */ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels, vacrel->nindexes, nworkers, vac_work_mem, vacrel->verbose ? INFO : DEBUG2, - vacrel->bstrategy); + vacrel->bstrategy, (void *) vacrel); Is this information misplaced? Why describe here "only one worker" and "at least two indexes on a table" I don't see anything here checking those conditions. ~~~ heap_parallel_vacuum_compute_workers: 2.7. + /* + * Select the number of workers based on the log of the size of the + * relation. This probably needs to be a good deal more + * sophisticated, but we need something here for now. Note that the + * upper limit of the min_parallel_table_scan_size GUC is chosen to + * prevent overflow here. + */ The "This probably needs to..." part maybe should have an "XXX" marker in the comment which AFAIK is used to highlight current decisions and potential for future changes. ~~~ heap_parallel_vacuum_initialize: 2.8. There is inconsistent capitalization of the single-line comments in this function. The same occurs in many functions in this file. but it is just a bit more obvious in this one. Please see all the others too. ~~~ parallel_heap_complete_unfinised_scan: 2.9. +static void +parallel_heap_complete_unfinised_scan(LVRelState *vacrel) TYPO in function name /unfinised/unfinished/ ~~~ 2.10. + if (!wstate->maybe_have_blocks) + + continue; Unnecessary blank line. ~~~ 2.11. + + /* Attache the worker's scan state and do heap scan */ + vacrel->phvstate->myscanstate = wstate; + scan_done = do_lazy_scan_heap(vacrel); /Attache/Attach/ ~~~ 2.12. + /* + * We don't need to gather the scan statistics here because statistics + * have already been accumulated the leaders statistics directly. + */ "have already been accumulated the leaders" -- missing word there somewhere? ~~~ do_parallel_lazy_scan_heap: 2.13. + /* + * If the heap scan paused in the middle of the table due to full of + * dead_items TIDs, perform a round of index and heap vacuuming. + */ + if (!scan_done) + { + /* Perform a round of index and heap vacuuming */ + vacrel->consider_bypass_optimization = false; + lazy_vacuum(vacrel); + + /* + * Vacuum the Free Space Map to make newly-freed space visible on + * upper-level FSM pages. + */ + if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum) + { + /* + * min_blkno should have already been updated when gathering + * statistics + */ + FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum, + vacrel->phvstate->min_blkno + 1); + vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno; + } + + /* Report that we are once again scanning the heap */ + pgstat_progress_update_param(PROGRESS_VACUUM_PHASE, + PROGRESS_VACUUM_PHASE_SCAN_HEAP); + + /* re-launcher workers */ + vacrel->phvstate->nworkers_launched = + parallel_vacuum_table_scan_begin(vacrel->pvs); + + continue; + } + + /* We reach the end of the table */ + break; Instead of: if (!scan_done) { <other code ...> continue; } break; Won't it be better to refactor like: SUGGESTION if (scan_done) break; <other code...> ~~~ 2.14. + /* + * The parallel heap vacuum finished, but it's possible that some workers + * have allocated blocks but not processed yet. This can happen for + * example when workers exit because of full of dead_items TIDs and the + * leader process could launch fewer workers in the next cycle. + */ There seem to be some missing words: e.g. /not processed yet./not processed them yet./ e.g. /because of full of dead_items/because they are full of dead_items/ ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: