Re: Parallel heap vacuum - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Parallel heap vacuum |
Date | |
Msg-id | CAHut+PuxF2LbHT-aYPd3SA7wi1S0tk6D6c40AJW9ONv9J1GF2Q@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
Hi Sawada-San. Here are some review comments for patch v10-0002. ====== src/backend/access/heap/heapam_handler.c 1. .scan_bitmap_next_block = heapam_scan_bitmap_next_block, .scan_bitmap_next_tuple = heapam_scan_bitmap_next_tuple, .scan_sample_next_block = heapam_scan_sample_next_block, - .scan_sample_next_tuple = heapam_scan_sample_next_tuple + .scan_sample_next_tuple = heapam_scan_sample_next_tuple, + + .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers, Pretty much every other heam AM method here has a 'heapam_' prefix. Is it better to be consistent and call the new function 'heapam_parallel_vacuum_compute_workers' instead of 'heap_parallel_vacuum_compute_workers'? ====== src/backend/access/heap/vacuumlazy.c 2. +/* + * Compute the number of workers for parallel heap vacuum. + * + * Disabled so far. + */ +int +heap_parallel_vacuum_compute_workers(Relation rel, int max_workers) +{ + return 0; +} + Instead of saying "Disabled so far", the function comment maybe should say: "Return 0 means parallel heap vacuum is disabled." Then the comment doesn't need to churn later when the function gets implemented in later patches. ====== src/backend/commands/vacuumparallel.c 3. +/* The kind of parallel vacuum work */ +typedef enum +{ + PV_WORK_ITEM_PROCESS_INDEXES, /* index vacuuming or cleanup */ + PV_WORK_ITEM_COLLECT_DEAD_ITEMS, /* collect dead tuples */ +} PVWorkItem; + Isn't this more like a PVWorkPhase instead of PVWorkItem? Ditto for the field name: 'work_phase' seems more appropriate. ~~~ 4. + /* + * Processing indexes or removing dead tuples from the table. + */ + PVWorkItem work_item; Missing question mark for this comment? ~~~ 5. + /* + * The number of workers for parallel table vacuuming. If > 0, the + * parallel table vacuum is enabled. + */ + int nworkers_for_table; + I guess this field will never be negative. So is it simpler to modify the comment to say: "If 0, parallel table vacuum is disabled." ~~~ parallel_vacuum_init: 6. + /* A parallel vacuum must be requested */ Assert(nrequested_workers >= 0); It's not very intuitive to say the user requested a parallel vacuum when the 'nrequested_workers' is 0. I felt some more commentary is needed here or in the function header; it seems nrequested_workers == 0 has a special meaning of having the system decide the number of workers. ~~~ parallel_vacuum_compute_workers: 7. static int -parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested, +parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes, + int nrequested, int *nworkers_table_p, bool *will_parallel_vacuum) { int nindexes_parallel = 0; int nindexes_parallel_bulkdel = 0; int nindexes_parallel_cleanup = 0; - int parallel_workers; + int nworkers_table = 0; + int nworkers_index = 0; + + *nworkers_table_p = 0; 7a. AFAICT you can just remove the variable 'nworkers_table', and instead call the 'nworkers_table_p' parameter as 'nworkers_table' ~ 7b. IIUC this 'will_parallel_vacuum' only has meaning for indexes, but now that the patch introduces parallel table vacuuming it makes this existing 'will_parallel_vacuum' name generic/misleading. Maybe it now needs to be called 'will_parallel_index_vacuum' or similar (in all places). ~~~ 8. + if (nrequested > 0) + { + /* + * If the parallel degree is specified, accept that as the number of + * parallel workers for table vacuum (though still cap at + * max_parallel_maintenance_workers). + */ + nworkers_table = Min(nrequested, max_parallel_maintenance_workers); + } + else + { + /* Compute the number of workers for parallel table scan */ + nworkers_table = table_parallel_vacuum_compute_workers(rel, + max_parallel_maintenance_workers); + + Assert(nworkers_table <= max_parallel_maintenance_workers); + } + The Assert can be outside of the if/else because it is the same for both. ~~~ 9. /* No index supports parallel vacuum */ - if (nindexes_parallel <= 0) - return 0; - - /* Compute the parallel degree */ - parallel_workers = (nrequested > 0) ? - Min(nrequested, nindexes_parallel) : nindexes_parallel; + if (nindexes_parallel > 0) + { + /* Take into account the requested number of workers */ + nworkers_index = (nrequested > 0) ? + Min(nrequested, nindexes_parallel) : nindexes_parallel; - /* Cap by max_parallel_maintenance_workers */ - parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers); + /* Cap by max_parallel_maintenance_workers */ + nworkers_index = Min(nworkers_index, max_parallel_maintenance_workers); + } "No index supports..." seems to be an old comment that is not correct for this new code block. ~~~ 10. + pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1); +} + + Double blank line after function 'parallel_vacuum_process_table' ~~~ main: 11. + /* Initialize AM-specific vacuum state for parallel table vacuuming */ + if (shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS) + { + ParallelWorkerContext pwcxt; + + pwcxt.toc = toc; + pwcxt.seg = seg; + table_parallel_vacuum_initialize_worker(rel, &pvs, &pwcxt, + &state); + } + Wondering if this code can be done in the same if block later: "if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)" ~~~ 12. + if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS) + { + /* Scan the table to collect dead items */ + parallel_vacuum_process_table(&pvs, state); + } + else + { + Assert(pvs.shared->work_item == PV_WORK_ITEM_PROCESS_INDEXES); + + /* Process indexes to perform vacuum/cleanup */ + parallel_vacuum_process_safe_indexes(&pvs); + } Would this if/else be better implemented as a 'switch' for the possible phases? ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: