Re: parallel vacuum comments - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: parallel vacuum comments
Date
Msg-id CAD21AoATpmazwdYge7rN_GGM_mk3v_u0T=-DdQ-kAp1eJkjf3A@mail.gmail.com
Whole thread Raw
In response to Re: parallel vacuum comments  (Andres Freund <andres@anarazel.de>)
Responses Re: parallel vacuum comments  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
> > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
> > related code. And I found a few things that I think could stand improvement:

Thank you for the comments.

>
> While working on the fix for #17255 (more specifically some cleanup that Peter
> suggested in the context), I noticed another thing: Initializing parallelism
> as part of dead_items_alloc() is a bad idea. Even if there are comments noting
> that oddity.
>
> I don't really see why we should do it this way? There's no "no-parallelism"
> path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
> it's not like we might just discover the inability to do parallelism during
> parallel initialization?

Right. Also, in parallel vacuum case, it allocates the space not only
for dead items but also other data required to do parallelism like
shared bulkdeletion results etc. Originally, in PG13,
begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
became part of dead_items_alloc() (see b4af70cb2). I agree to change
this part so that lazy_scan_heap() calls begin_parallel_vacuum()
(whatever we rename it). I'll incorporate this change in the
refactoring patch barring any objections.

>
> It's also not particularly helpful to have a begin_parallel_vacuum() that
> might not actually begin a parallel vacuum...

During the development, I found that we have some begin_* functions
that don't start the actual parallel job but prepare state data for
starting parallel job and referred to  _bt_begin_parallel() so I named
begin_parallel_vacuum(). But I admit that considering what the
function actually does, something like
create_parallel_vacuum_context() would be clearer.

>
> Minor nit:
>
> begin_parallel_vacuum()'s comment says:
>  * On success (when we can launch one or more workers), will set dead_items and
>  * lps in vacrel for caller.
>
> But it actually doesn't know whether we can start workers. It just checks
> max_parallel_maintenance_workers, no?

Yes, we cannot know whether we can actually start workers when
starting parallel index vacuuming. It returns non-NULL if we request
one or more workers.

Regards

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: isolationtester: add session name to application name
Next
From: Andrew Dunstan
Date:
Subject: Re: pg_dump versus ancient server versions