Re: parallel vacuum comments - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: parallel vacuum comments |
Date | |
Msg-id | CAA4eK1LKNtohNxbsjyU0YYCfXDJ0UzyJ-yXds2QURXuZ=eBzcg@mail.gmail.com Whole thread Raw |
In response to | Re: parallel vacuum comments (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: parallel vacuum comments
|
List | pgsql-hackers |
On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2021-10-30 14:21:01 -0700, Andres Freund wrote: > > > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum > > > related code. And I found a few things that I think could stand improvement: > > Thank you for the comments. > > > > > While working on the fix for #17255 (more specifically some cleanup that Peter > > suggested in the context), I noticed another thing: Initializing parallelism > > as part of dead_items_alloc() is a bad idea. Even if there are comments noting > > that oddity. > > > > I don't really see why we should do it this way? There's no "no-parallelism" > > path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So > > it's not like we might just discover the inability to do parallelism during > > parallel initialization? > > Right. Also, in parallel vacuum case, it allocates the space not only > for dead items but also other data required to do parallelism like > shared bulkdeletion results etc. Originally, in PG13, > begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it > became part of dead_items_alloc() (see b4af70cb2). I agree to change > this part so that lazy_scan_heap() calls begin_parallel_vacuum() > (whatever we rename it). I'll incorporate this change in the > refactoring patch barring any objections. > > > > > It's also not particularly helpful to have a begin_parallel_vacuum() that > > might not actually begin a parallel vacuum... > > During the development, I found that we have some begin_* functions > that don't start the actual parallel job but prepare state data for > starting parallel job and referred to _bt_begin_parallel() so I named > begin_parallel_vacuum(). But I admit that considering what the > function actually does, something like > create_parallel_vacuum_context() would be clearer. > How about if we name it as parallel_vacuum_init() which will be similar InitializeParallelDSM, ExecInitParallelPlan(). Now, I see there is some reasoning to keep it in dead_items_alloc as both primarily allocate memory for vacuum but maybe we should name the function vacuum_space_alloc instead of dead_items_alloc and similarly rename dead_items_cleanup to vacuum_space_free. The other idea could be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I personally prefer the idea to keep it where it is but improve function names. Will it be better to do this as a separate patch as 0002 because this might require some change in the vacuum code path? > > > > Minor nit: > > > > begin_parallel_vacuum()'s comment says: > > * On success (when we can launch one or more workers), will set dead_items and > > * lps in vacrel for caller. > > > > But it actually doesn't know whether we can start workers. It just checks > > max_parallel_maintenance_workers, no? > > Yes, we cannot know whether we can actually start workers when > starting parallel index vacuuming. It returns non-NULL if we request > one or more workers. > So can we adjust the comments? I think the part of the sentence "when we can launch one or more workers" seems to be the cause of confusion, can we remove it? -- With Regards, Amit Kapila.
pgsql-hackers by date: