Re: parallel vacuum comments - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: parallel vacuum comments
Date
Msg-id CAA4eK1LKNtohNxbsjyU0YYCfXDJ0UzyJ-yXds2QURXuZ=eBzcg@mail.gmail.com
Whole thread Raw
In response to Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: parallel vacuum comments
List pgsql-hackers
On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
> > > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
> > > related code. And I found a few things that I think could stand improvement:
>
> Thank you for the comments.
>
> >
> > While working on the fix for #17255 (more specifically some cleanup that Peter
> > suggested in the context), I noticed another thing: Initializing parallelism
> > as part of dead_items_alloc() is a bad idea. Even if there are comments noting
> > that oddity.
> >
> > I don't really see why we should do it this way? There's no "no-parallelism"
> > path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
> > it's not like we might just discover the inability to do parallelism during
> > parallel initialization?
>
> Right. Also, in parallel vacuum case, it allocates the space not only
> for dead items but also other data required to do parallelism like
> shared bulkdeletion results etc. Originally, in PG13,
> begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
> became part of dead_items_alloc() (see b4af70cb2). I agree to change
> this part so that lazy_scan_heap() calls begin_parallel_vacuum()
> (whatever we rename it). I'll incorporate this change in the
> refactoring patch barring any objections.
>
> >
> > It's also not particularly helpful to have a begin_parallel_vacuum() that
> > might not actually begin a parallel vacuum...
>
> During the development, I found that we have some begin_* functions
> that don't start the actual parallel job but prepare state data for
> starting parallel job and referred to  _bt_begin_parallel() so I named
> begin_parallel_vacuum(). But I admit that considering what the
> function actually does, something like
> create_parallel_vacuum_context() would be clearer.
>

How about if we name it as parallel_vacuum_init() which will be
similar InitializeParallelDSM, ExecInitParallelPlan(). Now, I see
there is some reasoning to keep it in dead_items_alloc as both
primarily allocate memory for vacuum but maybe we should name the
function vacuum_space_alloc instead of dead_items_alloc and similarly
rename dead_items_cleanup to vacuum_space_free. The other idea could
be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I
personally prefer the idea to keep it where it is but improve function
names. Will it be better to do this as a separate patch as 0002
because this might require some change in the vacuum code path?

> >
> > Minor nit:
> >
> > begin_parallel_vacuum()'s comment says:
> >  * On success (when we can launch one or more workers), will set dead_items and
> >  * lps in vacrel for caller.
> >
> > But it actually doesn't know whether we can start workers. It just checks
> > max_parallel_maintenance_workers, no?
>
> Yes, we cannot know whether we can actually start workers when
> starting parallel index vacuuming. It returns non-NULL if we request
> one or more workers.
>

So can we adjust the comments? I think the part of the sentence "when
we can launch one or more workers" seems to be the cause of confusion,
can we remove it?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Ashutosh Sharma
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints