Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id 20191227022401.ikepifssdes6apaw@development
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: [HACKERS] Block level parallel vacuum
List pgsql-hackers
Hi,

On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote:
>On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada
><masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada
>> > <masahiko.sawada@2ndquadrant.com> wrote:
>> > >
>> > >
>> > > The first patches look good to me. I'm reviewing other patches and
>> > > will post comments if there is.
>> > >
>>
>> Oops I meant first "two" patches look good to me.
>>
>> >
>> > Okay, feel free to address few comments raised by Mahendra along with
>> > whatever you find.
>>
>> Thanks!
>>
>
>I've attached updated patch set as the previous version patch set
>conflicts to the current HEAD. This patch set incorporated the review
>comments, a few fix and the patch for
>PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same
>as previous version.
>

I've been reviewing the updated patches over the past couple of days, so
let me share some initial review comments. I initially started to read
the thread, but then I realized it's futile - the thread is massive, and
the patch changed so much re-reading the whole thread is a waste of time.

It might be useful write a summary of the current design, but AFAICS the
original plan to parallelize the heap scan is abandoned and we now do
just the steps that vacuum indexes in parallel. Which is fine, but it
means the subject "block level parallel vacuum" is a bit misleading.

Anyway, most of the logic is implemented in part 0002, which actually
does all the parallel worker stuff. The remaining parts 0001, 0003 and
0004 are either preparing infrastructure or not directlyrelated to the
primary feature.


v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
-----------------------------------------------------------

I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe
it should be called just 'amvacuumoptions' or something like that? The
'parallel' part is actually encoded in names of the options.

Also, why do we need a separate amusemaintenanceworkmem option? Why
don't we simply track it using a separate flag in 'amvacuumoptions'
(or whatever we end up calling it)?

Would it make sense to track m_w_m usage separately for the two index
cleanup phases? Or is that unnecessary / pointless?


v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch
----------------------------------------------------------

I haven't found any issues yet, but I've only started with the code
review. I'll continue with the review. It seems in a fairly good shape
though, I think, I only have two minor comments at the moment:

- The SizeOfLVDeadTuples macro seems rather weird. It does include space
   for one ItemPointerData, but we really need an array of them. But then
   all the places where the macro is used explicitly add space for the
   pointers, so the sizeof(ItemPointerData) seems unnecessary. So it
   should be either

#define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs))

   or 

#define SizeOfLVDeadTuples(cnt) \
   (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))

   in which case the callers can be simplified.

- It's not quite clear to me why we need the new nworkers_to_launch
   field in ParallelContext.


v40-0003-Add-FAST-option-to-vacuum-command.patch
------------------------------------------------

I do have a bit of an issue with this part - I'm not quite convinved we
actually need a FAST option, and I actually suspect we'll come to regret
it sooner than later. AFAIK it pretty much does exactly the same thing
as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
multiple ways to do the same thing - I do expect reports from confused
users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
sufficient solution?

The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
we need a separate VACUUM option, instead of just using the existing
max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX
so why not here?

Maybe it's explained somewhere deep in the thread, of course ...


v40-0004-Add-ability-to-disable-leader-participation-in-p.patch
---------------------------------------------------------------

IMHO this should be simply merged into 0002.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: Avoid full GIN index scan when possible
Next
From: Justin Pryzby
Date:
Subject: doc: vacuum full, fillfactor, and "extra space"