Re: parallel vacuum comments - Mailing list pgsql-hackers

From Andres Freund
Subject Re: parallel vacuum comments
Date
Msg-id 20211104190003.l5547auul46k7d5v@alap3.anarazel.de
Whole thread Raw
In response to Re: parallel vacuum comments  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: parallel vacuum comments
List pgsql-hackers
Hi,

On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
> On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
> >   But even though we have this space optimized bitmap thing, we actually need
> >   more memory allocated for each index, making this whole thing pointless.
> 
> Right. But is better to change to use booleans?

It seems very clearly better to me. We shouldn't use complicated stuff like

#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
    ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
    (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))

when there's reason / benefit.


> > - Imo it's pretty confusing to have functions like
> >   lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
> >   vacuum or index cleanup with parallel workers.", based on
> >   lps->lvshared->for_cleanup.
> 
> Okay. We need to set lps->lvshared->for_cleanup to tell worker do
> either index vacuum or index cleanup. So it might be better to pass
> for_cleanup flag down to the functions in addition to setting
> lps->lvshared->for_cleanup.

Yup.


> > - I don't like some of the new names introduced in 14. E.g.
> >   "do_parallel_processing" is way too generic.
> 
> I listed the function names that probably needs to be renamed from
> that perspecti:
> 
> * do_parallel_processing
> * do_serial_processing_for_unsafe_indexes
> * parallel_process_one_index
> 
> Is there any other function that should be renamed?

parallel_processing_is_safe().

I don't like that there's three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_, and there's
(begin|end|compute)_parallel_*.


> > - On a higher level, a lot of this actually doesn't seem to belong into
> >   vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
> >   code is heap specific.  And vacuumlazy.c is large enough without the parallel
> >   code.
> 
> I don't come with an idea to make them more generic. Could you
> elaborate on that?

To me the the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.

Based on a quick look
  parallel_vacuum_main(), parallel_processing_is_safe(),
  parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
  compute_parallel_vacuum_workers(), parallel_process_one_index(),
  do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
  do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
  do_parallel_lazy_vacuum_all_indexes(),

don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Time to drop plpython2?
Next
From: Peter Geoghegan
Date:
Subject: Re: parallel vacuum comments