Re: Parallel heap vacuum - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Parallel heap vacuum |
| Date | |
| Msg-id | CAD21AoDSU88Z1dHUOgLSHks=F9EPaH41PkWpzM-cTz3Xnxpyig@mail.gmail.com |
| In response to | RE: Parallel heap vacuum ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
| List | pgsql-hackers |
On Fri, Apr 18, 2025 at 2:49 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Thanks for updating the patch. I have been reviewing it, and below are
> my comments so far.

Thank you for reviewing the patch!

> 01.
> Sorry if I forgot something, but is there a reason that
> parallel_vacuum_compute_workers() is mandatory? My fresh eye felt that
> the caller could check for NULL and assume zero workers if the API is
> NULL.

I thought that making this function mandatory would be a good way for
table AM authors to indicate their intentions in terms of parallel
vacuum. On the other hand, I see your point; it would not bother table
AMs that are not going to support parallel vacuum. I'd like to hear
further opinions on this.

> 02. table_parallel_vacuum_estimate
> We can assert that state is not NULL.

I think the state can be NULL depending on the caller. It should not
be mandatory.

> 03. table_parallel_vacuum_initialize
> Same as 02. Also, I think we should ensure here that
> table_parallel_vacuum_estimate() has already been called before, and
> the current assertion might not be enough because others can set a
> random value here. Other functions like table_rescan_tidrange() check
> an internal flag of the data structure, which is good for me. How do
> you feel about checking that pcxt->estimator has a non-zero value?
>
> 04. table_parallel_vacuum_initialize_worker
> Same as 02. Also, can we ensure that table_parallel_vacuum_initialize()
> has been done?
>
> 05. table_parallel_vacuum_collect_dead_items
> Same as 02. Also, can we ensure that
> table_parallel_vacuum_initialize_worker() has been done?

IIUC table_parallel_vacuum_initialize() is called only by
vacuumparallel.c, which is not controllable from outside it. How can
others set a random value to nworkers?

> 06. table_parallel_vacuum_initialize_worker
> The comment atop the function needs to be updated.

Did you refer to the following comment?

/*
 * Initialize AM-specific vacuum state for worker processes.
 */
static inline void
table_parallel_vacuum_initialize_worker(Relation rel,
                                        struct ParallelVacuumState *pvs,
                                        struct ParallelWorkerContext *pwcxt,
                                        void **state_out)

Which part of the comment do you think we need to update?

> 07.
> While playing, I found that more parallel vacuum workers can be
> launched than there are pages:
>
> ```
> postgres=# CREATE TABLE var (id int);
> CREATE TABLE
> postgres=# INSERT INTO var VALUES (generate_series(1, 10000));
> INSERT 0 10000
> postgres=# VACUUM (parallel 80, verbose) var ;
> INFO: vacuuming "postgres.public.var"
> INFO: launched 80 parallel vacuum workers for collecting dead tuples (planned: 80)
> INFO: finished vacuuming "postgres.public.var": index scans: 0
> pages: 0 removed, 45 remain, 45 scanned (100.00% of total), 0 eagerly scanned
> ...
> ```
>
> I assume the minimum chunk size is a page, so in this situation the
> extra workers can only reduce performance. How do you feel about
> capping the value with rel->rd_rel->relpages in
> heap_parallel_vacuum_compute_workers()? This value is not always
> up-to-date but seems like a good candidate.

Thank you for testing the patch. Yes, we should avoid that. I think it
would be better to cap the number of workers based on the number of
chunks of blocks. I'm going to propose a new parallel scan method for
parallel lazy scan, so it would make sense to choose a saner value
based on it.
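For illustration, a cap along those lines might look like the sketch
below (PARALLEL_VACUUM_CHUNK_SIZE, the exact signature, and the use of
relpages as the size estimate are assumptions for the sake of the
example, not what the patch does):

```
/*
 * Sketch only: cap the number of parallel vacuum workers by the number
 * of block chunks, so we never launch more workers than there is work.
 */
#define PARALLEL_VACUUM_CHUNK_SIZE  8192    /* blocks per chunk (assumed) */

static int
heap_parallel_vacuum_compute_workers(Relation rel, int nworkers_requested)
{
    /* relpages may be stale, but it is good enough for a cap */
    BlockNumber relpages = (BlockNumber) rel->rd_rel->relpages;
    int         nchunks;

    nchunks = (relpages + PARALLEL_VACUUM_CHUNK_SIZE - 1) /
        PARALLEL_VACUUM_CHUNK_SIZE;

    return Min(nworkers_requested, nchunks);
}
```

With a cap like this, VACUUM (parallel 80) on a 45-page table would
launch at most one worker rather than 80.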
Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com