Re: parallel vacuum comments - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: parallel vacuum comments
Date
Msg-id CAD21AoAR34jZFix3Mi-M4tnXU8k-TcooEoVtvbVD6y3Om+kmSw@mail.gmail.com
Whole thread Raw
In response to Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Dec 13, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
> > > > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
> > > > related code. And I found a few things that I think could stand improvement:
> >
> > Thank you for the comments.
> >
> > >
> > > While working on the fix for #17255 (more specifically some cleanup that Peter
> > > suggested in the context), I noticed another thing: Initializing parallelism
> > > as part of dead_items_alloc() is a bad idea. Even if there are comments noting
> > > that oddity.
> > >
> > > I don't really see why we should do it this way? There's no "no-parallelism"
> > > path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
> > > it's not like we might just discover the inability to do parallelism during
> > > parallel initialization?
> >
> > Right. Also, in parallel vacuum case, it allocates the space not only
> > for dead items but also other data required to do parallelism like
> > shared bulkdeletion results etc. Originally, in PG13,
> > begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
> > became part of dead_items_alloc() (see b4af70cb2). I agree to change
> > this part so that lazy_scan_heap() calls begin_parallel_vacuum()
> > (whatever we rename it). I'll incorporate this change in the
> > refactoring patch barring any objections.
> >
> > >
> > > It's also not particularly helpful to have a begin_parallel_vacuum() that
> > > might not actually begin a parallel vacuum...
> >
> > During the development, I found that we have some begin_* functions
> > that don't start the actual parallel job but prepare state data for
> > starting parallel job and referred to  _bt_begin_parallel() so I named
> > begin_parallel_vacuum(). But I admit that considering what the
> > function actually does, something like
> > create_parallel_vacuum_context() would be clearer.
> >
>
> How about if we name it as parallel_vacuum_init() which will be
> similar InitializeParallelDSM, ExecInitParallelPlan().

parallel_vacuum_init() sounds better.

> Now, I see
> there is some reasoning to keep it in dead_items_alloc as both
> primarily allocate memory for vacuum but maybe we should name the
> function vacuum_space_alloc instead of dead_items_alloc and similarly
> rename dead_items_cleanup to vacuum_space_free. The other idea could
> be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I
> personally prefer the idea to keep it where it is but improve function
> names. Will it be better to do this as a separate patch as 0002
> because this might require some change in the vacuum code path?

Yeah, if we do just renaming functions, I think we can do that in 0001
patch. On the other hand, if we need to change the logic, it's better
to do that in a separate patch.

>
> > >
> > > Minor nit:
> > >
> > > begin_parallel_vacuum()'s comment says:
> > >  * On success (when we can launch one or more workers), will set dead_items and
> > >  * lps in vacrel for caller.
> > >
> > > But it actually doesn't know whether we can start workers. It just checks
> > > max_parallel_maintenance_workers, no?
> >
> > Yes, we cannot know whether we can actually start workers when
> > starting parallel index vacuuming. It returns non-NULL if we request
> > one or more workers.
> >
>
> So can we adjust the comments? I think the part of the sentence "when
> we can launch one or more workers" seems to be the cause of confusion,
> can we remove it?

Yes, we can remove it. Or replace "can launch" with "request".

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Add client connection check during the execution of the query
Next
From: Masahiko Sawada
Date:
Subject: Re: parallel vacuum comments