Re: Parallel heap vacuum - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Parallel heap vacuum
Msg-id: CAD21AoAAF52NDLOWU5ZA5=5rWcVFsLnUnU+O6KQzLH8V-tc20g@mail.gmail.com
In response to: Re: Parallel heap vacuum (Peter Smith <smithpb2250@gmail.com>)
Responses: Re: Parallel heap vacuum
List: pgsql-hackers
On Thu, Mar 6, 2025 at 10:56 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Sawada-San. Here are some review comments for patch v10-0002.

Thank you for reviewing the patch.

> ======
> src/backend/access/heap/heapam_handler.c
>
> 1.
>   .scan_bitmap_next_block = heapam_scan_bitmap_next_block,
>   .scan_bitmap_next_tuple = heapam_scan_bitmap_next_tuple,
>   .scan_sample_next_block = heapam_scan_sample_next_block,
> - .scan_sample_next_tuple = heapam_scan_sample_next_tuple
> + .scan_sample_next_tuple = heapam_scan_sample_next_tuple,
> +
> + .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
>
> Pretty much every other heap AM method here has a 'heapam_' prefix. Is
> it better to be consistent and call the new function
> 'heapam_parallel_vacuum_compute_workers' instead of
> 'heap_parallel_vacuum_compute_workers'?

Hmm, given that the existing vacuum-related callback, heap_vacuum_rel,
uses the 'heap_' prefix, I think it's okay to use 'heap_' for the
parallel vacuum callbacks as well.

> ======
> src/backend/access/heap/vacuumlazy.c
>
> 2.
> +/*
> + * Compute the number of workers for parallel heap vacuum.
> + *
> + * Disabled so far.
> + */
> +int
> +heap_parallel_vacuum_compute_workers(Relation rel, int max_workers)
> +{
> +    return 0;
> +}
> +
>
> Instead of saying "Disabled so far", the function comment maybe should say:
> "Returning 0 means parallel heap vacuum is disabled."
>
> Then the comment doesn't need to churn later when the function gets
> implemented in later patches.

Updated.

> ======
> src/backend/commands/vacuumparallel.c
>
> 3.
> +/* The kind of parallel vacuum work */
> +typedef enum
> +{
> +    PV_WORK_ITEM_PROCESS_INDEXES,    /* index vacuuming or cleanup */
> +    PV_WORK_ITEM_COLLECT_DEAD_ITEMS, /* collect dead tuples */
> +} PVWorkItem;
> +
>
> Isn't this more like a PVWorkPhase instead of PVWorkItem? Ditto for
> the field name: 'work_phase' seems more appropriate.

Agreed.

> ~~~
>
> 4.
> +    /*
> +     * Processing indexes or removing dead tuples from the table.
> +     */
> +    PVWorkItem  work_item;
>
> Missing question mark for this comment?

Updated.

> ~~~
>
> 5.
> +    /*
> +     * The number of workers for parallel table vacuuming. If > 0, the
> +     * parallel table vacuum is enabled.
> +     */
> +    int         nworkers_for_table;
> +
>
> I guess this field will never be negative. So is it simpler to modify
> the comment to say:
> "If 0, parallel table vacuum is disabled."

Agreed.

> ~~~
>
> parallel_vacuum_init:
>
> 6.
> +    /* A parallel vacuum must be requested */
>     Assert(nrequested_workers >= 0);
>
> It's not very intuitive to say the user requested a parallel vacuum
> when the 'nrequested_workers' is 0. I felt some more commentary is
> needed here or in the function header; it seems nrequested_workers ==
> 0 has a special meaning of having the system decide the number of
> workers.

Added some comments.

> ~~~
>
> parallel_vacuum_compute_workers:
>
> 7.
> static int
> -parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
> +parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
> +                                int nrequested, int *nworkers_table_p,
>                                  bool *will_parallel_vacuum)
> {
>     int         nindexes_parallel = 0;
>     int         nindexes_parallel_bulkdel = 0;
>     int         nindexes_parallel_cleanup = 0;
> -   int         parallel_workers;
> +   int         nworkers_table = 0;
> +   int         nworkers_index = 0;
> +
> +   *nworkers_table_p = 0;
>
> 7a.
> AFAICT you can just remove the variable 'nworkers_table', and instead
> call the 'nworkers_table_p' parameter as 'nworkers_table'.

Yes, but I wanted to avoid modifying the output parameter multiple times.

> ~
>
> 7b.
> IIUC this 'will_parallel_vacuum' only has meaning for indexes, but now
> that the patch introduces parallel table vacuuming it makes this
> existing 'will_parallel_vacuum' name generic/misleading. Maybe it now
> needs to be called 'will_parallel_index_vacuum' or similar (in all
> places).

Agreed.

> ~~~
>
> 8.
> +   if (nrequested > 0)
> +   {
> +       /*
> +        * If the parallel degree is specified, accept that as the number of
> +        * parallel workers for table vacuum (though still cap at
> +        * max_parallel_maintenance_workers).
> +        */
> +       nworkers_table = Min(nrequested, max_parallel_maintenance_workers);
> +   }
> +   else
> +   {
> +       /* Compute the number of workers for parallel table scan */
> +       nworkers_table = table_parallel_vacuum_compute_workers(rel,
> +                                                              max_parallel_maintenance_workers);
> +
> +       Assert(nworkers_table <= max_parallel_maintenance_workers);
> +   }
> +
>
> The Assert can be outside of the if/else because it is the same for both.

This part has been removed by modifying the parallel degree
computation logic.

> ~~~
>
> 9.
>     /* No index supports parallel vacuum */
> -   if (nindexes_parallel <= 0)
> -       return 0;
> -
> -   /* Compute the parallel degree */
> -   parallel_workers = (nrequested > 0) ?
> -       Min(nrequested, nindexes_parallel) : nindexes_parallel;
> +   if (nindexes_parallel > 0)
> +   {
> +       /* Take into account the requested number of workers */
> +       nworkers_index = (nrequested > 0) ?
> +           Min(nrequested, nindexes_parallel) : nindexes_parallel;
>
> -   /* Cap by max_parallel_maintenance_workers */
> -   parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
> +       /* Cap by max_parallel_maintenance_workers */
> +       nworkers_index = Min(nworkers_index, max_parallel_maintenance_workers);
> +   }
>
> "No index supports..." seems to be an old comment that is not correct
> for this new code block.

Removed.

> ~~~
>
> 10.
> +   pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
> +}
> +
> +
>
> Double blank line after function 'parallel_vacuum_process_table'.

Removed.

> ~~~
>
> main:
>
> 11.
> +   /* Initialize AM-specific vacuum state for parallel table vacuuming */
> +   if (shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)
> +   {
> +       ParallelWorkerContext pwcxt;
> +
> +       pwcxt.toc = toc;
> +       pwcxt.seg = seg;
> +       table_parallel_vacuum_initialize_worker(rel, &pvs, &pwcxt,
> +                                               &state);
> +   }
> +
>
> Wondering if this code can be done in the same if block later: "if
> (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)"

If we did that, we would end up calling InstrStartParallelQuery()
before the worker initialization. I guess we want to avoid counting
this usage during the initialization.

> ~~~
>
> 12.
> +   if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)
> +   {
> +       /* Scan the table to collect dead items */
> +       parallel_vacuum_process_table(&pvs, state);
> +   }
> +   else
> +   {
> +       Assert(pvs.shared->work_item == PV_WORK_ITEM_PROCESS_INDEXES);
> +
> +       /* Process indexes to perform vacuum/cleanup */
> +       parallel_vacuum_process_safe_indexes(&pvs);
> +   }
>
> Would this if/else be better implemented as a 'switch' for the possible phases?

Okay, changed.

I've attached the updated version patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
- v11-0001-Introduces-table-AM-APIs-for-parallel-table-vacu.patch
- v11-0004-Move-GlobalVisState-definition-to-snapmgr_intern.patch
- v11-0005-Support-parallelism-for-collecting-dead-items-du.patch
- v11-0003-Move-lazy-heap-scan-related-variables-to-new-str.patch
- v11-0002-vacuumparallel.c-Support-parallel-table-vacuumin.patch