Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CA+fd4k5Gr94EWr+wGRgjiKMk7kgmsnZ2y7d5+Vp+Sxb2bXuA_g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, 26 Nov 2019 at 13:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 25, 2019 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2.
> > lazy_parallel_vacuum_or_cleanup_indexes()
> > {
> > ..
> > ..
> > }
> >
> > Here, it seems that we can increment/decrement the
> > VacuumActiveNWorkers even when there is no work performed by the
> > leader backend.  How about moving increment/decrement inside function
> > vacuum_or_cleanup_indexes_worker?  In that case, we need to do it in
> > this function when we are actually doing an index vacuum or cleanup.
> > After doing that the other usage of increment/decrement of
> > VacuumActiveNWorkers in other function heap_parallel_vacuum_main can
> > be removed.

Yeah we can move it inside vacuum_or_cleanup_indexes_worker but we
still need to increment the count before processing the indexes that
have skipped parallel operations because some workers might still be
running yet.

> >
>
> One of my colleague Mahendra who was testing this patch found that
> stats for index reported by view pg_statio_all_tables are wrong for
> parallel vacuum.  I debugged the issue and found that there were two
> problems in the stats related code.
> 1. The function get_indstats seem to be computing the wrong value of
> stats for the last index.
> 2. The function lazy_parallel_vacuum_or_cleanup_indexes() was not
> pointing to the computed stats when the parallel index scan is
> skipped.
>
> Find the above two fixes in the attached patch.  This is on top of the
> patches I sent yesterday [1].

Thank you! During testing the current patch by myself I also found this bug.

>
> Some more comments on v33-0002-Add-parallel-option-to-VACUUM-command
> -------------------------------------------------------------------------------------------------------------
> 1.  The code in function lazy_parallel_vacuum_or_cleanup_indexes()
> that processes the indexes that have skipped parallel processing can
> be moved to a separate function.  Further, the newly added code by the
> attached patch can also be moved to a separate function as the same
> code is used in function vacuum_or_cleanup_indexes_worker().
>
> 2.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> {
> ..
> + stats = (IndexBulkDeleteResult **)
> + palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
> ..
> }
>
> It would be neat if we free this memory once it is used.
>
> 3.
> + /*
> + * Compute the number of indexes that can participate to parallel index
> + * vacuuming.
> + */
>
> /to/in
>
> 4.  The function lazy_parallel_vacuum_or_cleanup_indexes() launches
> workers without checking whether it needs to do the same or not.  For
> ex. in cleanup phase, it is possible that we don't need to launch any
> worker, so it will be waste.  It might be that you are already
> planning to handle it based on the previous comments/discussion in
> which case you can ignore this.

I've incorporated the comments I got so far including the above and
the memory alignment issue. Therefore the attached v34 patch includes
that changes and changes in v33-0002-delta-amit.patch and
v33-0002-delta2-fix-stats-issue.patch. In this version I add an extra
argument to LaunchParallelWorkers function and make the leader process
launch the parallel workers as much as the particular phase needs.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pglz performance
Next
From: Tomas Vondra
Date:
Subject: Re: pglz performance