Re: parallel vacuum comments - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: parallel vacuum comments |
Date | |
Msg-id | CAD21AoAR34jZFix3Mi-M4tnXU8k-TcooEoVtvbVD6y3Om+kmSw@mail.gmail.com Whole thread Raw |
In response to | Re: parallel vacuum comments (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Mon, Dec 13, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote: > > > > > > Hi, > > > > > > On 2021-10-30 14:21:01 -0700, Andres Freund wrote: > > > > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum > > > > related code. And I found a few things that I think could stand improvement: > > > > Thank you for the comments. > > > > > > > > While working on the fix for #17255 (more specifically some cleanup that Peter > > > suggested in the context), I noticed another thing: Initializing parallelism > > > as part of dead_items_alloc() is a bad idea. Even if there are comments noting > > > that oddity. > > > > > > I don't really see why we should do it this way? There's no "no-parallelism" > > > path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So > > > it's not like we might just discover the inability to do parallelism during > > > parallel initialization? > > > > Right. Also, in parallel vacuum case, it allocates the space not only > > for dead items but also other data required to do parallelism like > > shared bulkdeletion results etc. Originally, in PG13, > > begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it > > became part of dead_items_alloc() (see b4af70cb2). I agree to change > > this part so that lazy_scan_heap() calls begin_parallel_vacuum() > > (whatever we rename it). I'll incorporate this change in the > > refactoring patch barring any objections. > > > > > > > > It's also not particularly helpful to have a begin_parallel_vacuum() that > > > might not actually begin a parallel vacuum... > > > > During the development, I found that we have some begin_* functions > > that don't start the actual parallel job but prepare state data for > > starting parallel job and referred to _bt_begin_parallel() so I named > > begin_parallel_vacuum(). But I admit that considering what the > > function actually does, something like > > create_parallel_vacuum_context() would be clearer. > > > > How about if we name it as parallel_vacuum_init() which will be > similar InitializeParallelDSM, ExecInitParallelPlan(). parallel_vacuum_init() sounds better. > Now, I see > there is some reasoning to keep it in dead_items_alloc as both > primarily allocate memory for vacuum but maybe we should name the > function vacuum_space_alloc instead of dead_items_alloc and similarly > rename dead_items_cleanup to vacuum_space_free. The other idea could > be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I > personally prefer the idea to keep it where it is but improve function > names. Will it be better to do this as a separate patch as 0002 > because this might require some change in the vacuum code path? Yeah, if we do just renaming functions, I think we can do that in 0001 patch. On the other hand, if we need to change the logic, it's better to do that in a separate patch. > > > > > > > Minor nit: > > > > > > begin_parallel_vacuum()'s comment says: > > > * On success (when we can launch one or more workers), will set dead_items and > > > * lps in vacrel for caller. > > > > > > But it actually doesn't know whether we can start workers. It just checks > > > max_parallel_maintenance_workers, no? > > > > Yes, we cannot know whether we can actually start workers when > > starting parallel index vacuuming. It returns non-NULL if we request > > one or more workers. > > > > So can we adjust the comments? I think the part of the sentence "when > we can launch one or more workers" seems to be the cause of confusion, > can we remove it? Yes, we can remove it. Or replace "can launch" with "request". Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: