Re: Parallel heap vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Parallel heap vacuum
Date
Msg-id CAD21AoBoMmgTEcg=ttJ56O7tj8Pt6ZXaQDpvah1tqmni7ZJ=vg@mail.gmail.com
In response to Re: Parallel heap vacuum  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Parallel heap vacuum
List pgsql-hackers
On Mon, Jul 21, 2025 at 7:28 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Jul 18, 2025 at 10:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 17, 2025 at 1:39 PM Melanie Plageman
> > <melanieplageman@gmail.com> wrote:
> >
> > > I think there is a fundamental tension here related to whether or not
> > > vacuumparallel.c should be table-AM agnostic. All of its code is
> > > invoked below heap_vacuum_rel(). I would argue that vacuumparallel.c
> > > is index AM-agnostic but heap-specific.
> > >
> > > I think this is what makes your table AM callbacks feel odd. Other
> > > table AM callbacks are invoked from general code -- for example from
> > > executor code that is agnostic to the underlying table AMs. But, all
> > > of the functions you added and the associated callbacks are invoked
> > > below heap_vacuum_rel() (as is all of vacuumparallel.c).
> >
> > What does it exactly mean to make vacuumparallel.c index AM-agnostic
> > but heap-specific? I imagine we change the vacuumparallel.c so that it
> > calls heap's functions for parallel vacuuming such as initializing DSM
> > and parallel vacuum worker's entry function etc. But I'm not sure how
> > it works with non-heap table AMs that use vacuumparallel.c.
>
> I don't understand what you mean by non-heap table AMs using
> vacuumparallel.c. Do you mean that they are using it like a library? I
> don't understand why the existing parallel index vacuuming code added
> table AM callbacks either. They are only invoked from below
> heap_vacuum_rel(). Unless I am missing something? From looking at
> tableam.h, the parallel vacuum callbacks seem to be the only table AM
> callbacks that are not invoked from table AM agnostic code. What was
> the point of adding them?

Given your suggestion that we should make vacuumparallel.c index
AM-agnostic but heap-specific, I imagined that parallel_vacuum_init()
would call a function defined in vacuumlazy.c that initializes the DSM
space for parallel heap vacuum. However, since parallel_vacuum_init()
is exposed, IIUC every table AM can use it. So if, for example, a
non-heap table AM calls parallel_vacuum_init() in its own
relation_vacuum callback in order to vacuum its indexes in parallel,
it would end up calling heap's DSM-initialization function, which
won't work.
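
To illustrate the problem, here is a minimal sketch of such a non-heap
AM's relation_vacuum callback. The AM name and the body are
hypothetical, and the vacuum.h functions are used with their current
signatures as far as I recall them, so please take this as a rough
sketch rather than working code:

#include "postgres.h"
#include "access/genam.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

static void
myam_vacuum_rel(Relation rel, VacuumParams *params,
                BufferAccessStrategy bstrategy)
{
    Relation   *indrels;
    int         nindexes;
    IndexBulkDeleteResult **indstats;
    ParallelVacuumState *pvs;

    /* open the indexes, as heap's lazy vacuum does */
    vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
    indstats = (IndexBulkDeleteResult **)
        palloc0(nindexes * sizeof(IndexBulkDeleteResult *));

    /*
     * If parallel_vacuum_init() unconditionally set up heap-only DSM
     * state here, this AM could not use it, even though it only wants
     * the shared index-vacuuming machinery.
     */
    pvs = parallel_vacuum_init(rel, indrels, nindexes,
                               params->nworkers, maintenance_work_mem,
                               DEBUG2, bstrategy);

    if (pvs != NULL)
    {
        /* ... AM-specific table pass, then index vacuuming via pvs ... */
        parallel_vacuum_end(pvs, indstats);
    }

    vac_close_indexes(nindexes, indrels, NoLock);
}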

> I can see how other table AMs implementing vacuum may want to use some
> of the vacuumparallel.c functions exposed in the vacuum.h header file
> to save time and code for the index vacuuming portion of vacuuming.
> But why have table AM callbacks that aren't invoked unless you are
> using heap?
>
> More generally, vacuuming is a very table AM-specific operation. You
> often wouldn't need any of the same data structures. Think about an
> undo-based table AM -- you don't need dead items or anything like
> that. Or the phase ordering -- other table AMs may not need to vacuum
> the table then the indexes then the table. Something like a scan, it
> just has to yield tuples to the executor. But vacuum seems
> inextricably linked with the on-disk layout. Most of the vacuum.c code
> is just concerned with parsing options, checking permissions,
> determining precedence with GUCs/table options, and doing catalog
> operations.

I guess it ultimately depends on how other table AMs define what to do
in the VACUUM command (or autovacuum). For instance, the columnar
table AM simply updates the relation statistics during the VACUUM
command[1]. Since it can efficiently collect statistics such as the
number of live tuples, it completes this process quickly. If an AM
(not limited to the columnar table AM) needs to gather more
comprehensive statistics that might require scanning the entire table,
the feature introduced by this patch -- multiple workers scanning the
table while collecting data -- could prove beneficial, and the AM
might want to use it in its relation_vacuum callback. The patch
introduces the parallel_vacuum_collect_dead_items callback, but it's
not restricted to collecting only dead tuples; it can be used for
gathering various types of data.
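
As a purely illustrative example (this is not the callback signature
the patch actually uses, and MyAMSharedStats is made up), a
collect-style callback could accumulate per-worker statistics into
shared memory instead of dead item pointers:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* hypothetical AM-specific state placed in DSM by the leader */
typedef struct MyAMSharedStats
{
    pg_atomic_uint64 live_tuples;   /* accumulated across workers */
} MyAMSharedStats;

/*
 * Hypothetical worker routine: scan the block range this worker was
 * handed and publish whatever per-table data the AM cares about.
 */
static void
myam_parallel_collect_stats(Relation rel, MyAMSharedStats *shared,
                            BlockNumber start, BlockNumber end)
{
    uint64      nlive = 0;

    for (BlockNumber blkno = start; blkno < end; blkno++)
    {
        Buffer      buf = ReadBuffer(rel, blkno);

        LockBuffer(buf, BUFFER_LOCK_SHARE);
        /* ... inspect the page and count live tuples into nlive ... */
        UnlockReleaseBuffer(buf);
    }

    pg_atomic_fetch_add_u64(&shared->live_tuples, nlive);
}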

Regarding phase ordering, both the existing vacuumparallel.c features
and the new parallel table vacuuming feature are flexible in their
execution sequence. If a table AM needs to invoke parallel index
vacuuming before parallel table vacuuming, it can call the
corresponding functions in that order.

>
> > One possible change would be to have vacuumlazy.c pass the set of
> > callbacks to parallel_vacuum_init() instead of having them as table AM
> > callbacks. This removes the weirdness of the associated table AM
> > callbacks being invoked below heap_vacuum_rel() but it doesn't address
> > your point "vacuumparallel.c is index AM-agnostic but heap-specific".
>
> I do agree you are in a tough spot if we keep the index parallel
> vacuuming callbacks in tableam.h. It is weird to call a heap-specific
> function from within a table AM callback. I guess my first question is
> which table AMs are using the existing callbacks and how are they
> doing that since I don't see places where they are called from non
> heap code. If they use the vacuum.h exposed vacuumparallel.c functions
> as a library, they could do that without the callbacks in tableam.h.
> But I think I must just be missing something.

Citus columnar table and orioledb are the open-source table AM
implementations I know of, and it seems that orioledb's vacuum code is
somewhat similar to heap's[2] (I don't know the details of orioledb,
though); it initializes parallel vacuum by calling
parallel_vacuum_init() in its relation_vacuum callback,
orioledb_vacuum_rel().

The reason why I added some callbacks as table AM callbacks in the
patch is that I could not find a better place for them. Currently,
vacuumparallel.c handles several critical operations for parallel
vacuuming: allocating and initializing DSM space for parallel index
vacuuming, initializing and launching parallel workers, and managing
various auxiliary tasks such as configuring vacuum delays and setting
maintenance_work_mem for workers. Given these existing
functionalities, I aimed to implement the parallel heap vacuum worker
launch through the same vacuumparallel.c codebase, maintaining a
consistent interface. To achieve this integration, vacuumparallel.c
requires access to heap-specific functions, and defining them as table
AM callbacks emerged as the cleanest solution. My personal vision is
to keep vacuumparallel.c neutral with regard to both index AM and
table AM implementations, while providing a unified interface for
table AMs that wish to leverage parallel operations -- whether for
table vacuuming, index vacuuming, or both.
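
Concretely, the set of hooks I have in mind looks roughly like the
sketch below, whether they end up in TableAmRoutine or get passed to
parallel_vacuum_init() as a struct. Only
parallel_vacuum_collect_dead_items is a name from the patch; the other
member names and every signature here are just illustrative:

#include "postgres.h"
#include "utils/rel.h"

/* illustrative only; not the struct the patch actually defines */
typedef struct ParallelTableVacuumCallbacks
{
    /* how much DSM the AM needs for its parallel table-vacuum state */
    Size        (*estimate) (Relation rel, int nworkers);

    /* initialize that DSM in the leader / attach to it in a worker */
    void        (*initialize) (Relation rel, void *shared);
    void        (*initialize_worker) (Relation rel, void *shared);

    /* the per-worker table pass, e.g. collecting dead items for heap */
    void        (*parallel_vacuum_collect_dead_items) (Relation rel,
                                                       void *shared);
} ParallelTableVacuumCallbacks;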

>
> > > And even for cases where the information has to be passed from the
> > > leader to the worker, there is no reason you can't have the same
> > > struct but in shared memory for the parallel case and local memory for
> > > the serial case. For example with the struct members "aggressive",
> > > "skipwithvm", and "cutoffs" in LVRelState and ParallelLVShared. Or
> > > ParallelLVScanDesc->nblocks and LVRelState->rel_pages.
> > >
> > > Ideally the entire part of the LVRelState that is an unchanging
> > > input/reference data is in a struct which is in local memory for the
> > > serial and local parallel case and a single read-only location in the
> > > shared parallel case. Or, for the shared case, if there is a reason
> > > not to read from the shared memory, we copy them over to a local
> > > instance of the same struct. Maybe it is called HeapVacInput or
> > > similar.
> >
> > ParallelLVShared is created to pass information to parallel vacuum
> > workers while keeping LVRelStates able to work locally. Suppose that
> > we create HeapVacInput including "aggressive", "cutoff", "skipwithvm",
> > and "rel_pages", LVRelState would have to have a pointer to a
> > HeapVacInput instance that is either on local memory or shared memory.
> > Since we also need to pass other information such as
> > initial_chunk_size and eager scanning related information to the
> > parallel vacuum worker, we would have to create something like
> > ParallelLVShared as the patch does. As a result, we would have two
> > structs that need to be shared on the shared buffer. Is that kind of
> > what you meant? or did you mean that we include parallel vacuum
> > related fields too to HeapVacInput struct?
>
> Yes, I think generally this is what I am saying.
>
> Honestly, I think the two biggest problems right now are that 1) the
> existing LVRelState mixes too many different types of data (e.g.
> output stats and input data) and that 2) the structs your patch
> introduces don't have names making it clear enough what they are.
>
> What is ParallelLVScanDesc? We don't have the convention of a scan
> descriptor for vacuum, so is the point here that it is only used for a
> single "scan" of the heap table? That sounds like all of phase I to
> me.

Yes, ParallelLVScanDesc is added to control the working state during
do_parallel_lazy_scan_heap(), a parallel variant of phase I.

> Or why is the static input data structure just called
> ParallelLVShared? Nothing about that indicates read-only or static.
>
> When I was imagining how to make it more clear, I was thinking
> something like what you are saying above. There are two types of input
> read-only data passed from the leader to the workers. One is
> parallel-only and one is needed in both the parallel and serial cases.
> I was suggesting having the common input read-only fields in the same
> type of struct. That would mean having two types of input data
> structures in the parallel-case -- one parallel only and one common.
> It's possible that that is more confusing.
>
> I think it also depends on how the workers end up accessing the data.
> For input read-only data you can either have workers access a single
> copy in shared memory (and have a pointer that points to shared memory
> in the parallel case and local memory in the serial case) or you can
> copy the data over to local memory so that the parallel and serial
> cases use it in the same data structure. You are doing mostly the
> latter but the setup is spread around enough that it isn't immediately
> clear that is what is happening.
>
> One thing that might be worth trying is moving the setup steps in
> heap_vacuum_rel() into some kind of helper function that can be called
> both in heap_vacuum_rel() by the leader and also in the worker setup.
> Then it might make it clear what only the leader does (e.g. call
> vacuum_get_cutoffs()) and what everyone does. I'm just brainstorming
> here, though.
>
> I think if you just make a new version of the patch with the eager
> scanning separated out and the names of the structs clarified, I could
> try to be more specific. I found the spread of members across the
> different structs hard to parse, but I'm not 100% sure yet what would
> make it easier.

Okay, I'll try to clarify the struct names and try out the above idea
in the next version of the patch.

>
> > I wanted to keep ParallelLVShared mostly read-only and aimed to
> > share the information from the leader to the workers, whereas
> > ParallelLVScanDesc is a pure scan state maintained by each worker
> > (and the leader).
>
> I'm not sure what scan state means here. nallocated seems to be a
> coordination member. But nblocks seems to be used in places that you
> have access to the LVRelState, which has rel_pages, so why wouldn't
> you just use that?

Right, we can use rel_pages instead.

>
> > chunk_size is used when allocating the new chunk (see
> > parallel_lazy_scan_get_nextpage()). It's initialized by
> > vacrel->plvstate->shared->initial_chunk_size since the first chunk
> > size could vary depending on eager scanning state, and is updated to
> > the fixed size PARALLEL_LV_CHUNK_SIZE from the next chunk.
>
> I see that ParallelLVScanWorkerData->chunk_size is used in that way,
> but I don't see where ParallelLVScanDesc->chunk_size is used.

It seems that ParallelLVScanDesc->chunk_size is not used at all, so
I'll remove it.

> Also,
> why does ParallelLVScanWorkerData->chunk_size need to be in shared
> memory? Isn't the worker just using it locally?

I put it in shared memory for the case where the
ParallelLVScanWorkerData is reused by another worker after resuming
phase I. But the worker can simply use PARALLEL_LV_CHUNK_SIZE in that
case, as the first chunk should already have been allocated. I'll
remove it.
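
For reference, the chunk hand-out in parallel_lazy_scan_get_nextpage()
boils down to something like the sketch below. The struct and field
names are simplified stand-ins for ParallelLVScanDesc and
ParallelLVScanWorkerData, and the PARALLEL_LV_CHUNK_SIZE value shown
is just a placeholder:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/block.h"

#define PARALLEL_LV_CHUNK_SIZE  8192    /* placeholder value */

typedef struct ScanShared
{
    pg_atomic_uint64 nallocated;        /* next block to hand out */
    uint32      initial_chunk_size;     /* first chunk, set by leader */
} ScanShared;

typedef struct ScanWorker
{
    BlockNumber next_block;     /* next block within my current chunk */
    BlockNumber chunk_end;      /* one past the end of my chunk */
    bool        first_chunk;
} ScanWorker;

static BlockNumber
scan_get_nextpage(ScanShared *shared, ScanWorker *w, BlockNumber rel_pages)
{
    if (w->next_block >= w->chunk_end)
    {
        /* claim a new chunk of blocks from the shared cursor */
        uint32      size = w->first_chunk ? shared->initial_chunk_size
                                          : PARALLEL_LV_CHUNK_SIZE;
        uint64      start = pg_atomic_fetch_add_u64(&shared->nallocated,
                                                    size);

        w->first_chunk = false;
        w->next_block = (BlockNumber) start;
        w->chunk_end = (BlockNumber) Min(start + size, rel_pages);
    }

    if (w->next_block >= rel_pages)
        return InvalidBlockNumber;      /* no blocks left */

    return w->next_block++;
}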

Regards,

[1] https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_tableam.c#L1058
[2] https://github.com/orioledb/orioledb/blob/main/src/tableam/vacuum.c#L353



--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


