Re: Parallel heap vacuum - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: Parallel heap vacuum
Msg-id: CAFiTN-uz-5xF=2ZG2DJDs0EmnKURyRAvmnTaAfT7ihN4n6TdYA@mail.gmail.com
In response to: Re: Parallel heap vacuum (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Wed, Mar 12, 2025 at 11:24 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Mar 12, 2025 at 3:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>

Some more specific comments on the patch set

-- v11-0001

1. This introduces functions like parallel_vacuum_estimate(),
parallel_vacuum_initialize(), etc., but these functions haven't been
assigned to the respective members in TableAmRoutine. As shown in the
hunk below, only parallel_vacuum_compute_workers is initialized, while
parallel_vacuum_estimate and the others are not. They are assigned in
later patches, which makes the patch division look a bit unclean.

+++ b/src/backend/access/heap/heapam_handler.c
@@ -2688,7 +2688,9 @@ static const TableAmRoutine heapam_methods = {
     .scan_bitmap_next_block = heapam_scan_bitmap_next_block,
     .scan_bitmap_next_tuple = heapam_scan_bitmap_next_tuple,
     .scan_sample_next_block = heapam_scan_sample_next_block,
-    .scan_sample_next_tuple = heapam_scan_sample_next_tuple
+    .scan_sample_next_tuple = heapam_scan_sample_next_tuple,
+
+    .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
 };

-- v11-0002

2. In the commit message you mention a function named
"parallel_vacuum_remove_dead_items_begin()", but no such function is
introduced in the patch. I think you meant
parallel_vacuum_collect_dead_items_begin()?

3. I would suggest this rewrite:
s/launched for the table vacuuming and index processing./launched for
the table pruning phase and index vacuuming./

4. These comments talk about DSA and DSM, but we had better clarify
what we are using the DSM and DSA for, or give a reference to the
place where that is explained.

+ * In a parallel vacuum, we perform table scan or both index bulk deletion and
+ * index cleanup or all of them with parallel worker processes. Different
+ * numbers of workers are launched for the table vacuuming and index processing.
+ * ParallelVacuumState contains shared information as well as the memory space
+ * for storing dead items allocated in the DSA area.
+ *
+ * When initializing parallel table vacuum scan, we invoke table AM routines for
+ * estimating DSM sizes and initializing DSM memory. Parallel table vacuum
+ * workers invoke the table AM routine for vacuuming the table.

5. Is there any particular reason why marking dead TIDs as reusable is
not done in parallel? Is it because parallelizing that phase wouldn't
make sense given the work involved, or is there another reason?

+/* The kind of parallel vacuum phases */
+typedef enum
+{
+    PV_WORK_PHASE_PROCESS_INDEXES,    /* index vacuuming or cleanup */
+    PV_WORK_PHASE_COLLECT_DEAD_ITEMS, /* collect dead tuples */
+} PVWorkPhase;

6. In the hunk below, the sentence "nrequested_workers is >= 0 number
and the requested parallel degree." doesn't make sense to me; can you
rephrase it?

+ * nrequested_workers is >= 0 number and the requested parallel degree. 0
+ * means that the parallel degrees for table and indexes vacuum are decided
+ * differently. See the comments of parallel_vacuum_compute_workers() for
+ * details.

That's what I have got so far; I will continue reviewing 0002 and the
remaining patches from the thread.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com