Re: Summarizing indexes allowing single-phase VACUUM? - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Summarizing indexes allowing single-phase VACUUM?
Date
Msg-id CAD21AoBta=4GHkc5a3QNyHUWU-47SvC6SpV7c3u0Dxaajo6rHg@mail.gmail.com
Whole thread Raw
In response to Summarizing indexes allowing single-phase VACUUM?  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Wed, Apr 9, 2025 at 2:47 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> Hi,
>
> With PG16, we got Index AM -level indications for summarizing indexes,
> improving HOT applicability.
>
> Given that these indexes explicitly never store TIDs and thus don't
> really need the cleanup part of vacuum's index cleanup, we could well
> decide to not count those indexes to the indexes we need to clean up
> in the index scan phase of vacuum, thus allowing tables with only
> summarizing indexes to use the single-scan option of vacuum.
>
> As a summarizing index may still want to do some housekeeping during
> vacuum, it'd still need to get its cleanup functions called, but it'd
> at least save the io and WAL of the cleanup scan.
>
> Any comments/suggestions? POC patch attached.

I like this idea.

Here are some comments on the PoC patch:

-       if (vacrel->nindexes == 0
+       if (vacrel->indallsummarizing

I'm not a fan of this change as it isn't obvious from the name
indallsummarizing that the table doesn't have any indexes. I think we
can keep nindexes==0 or rename indallsummarizing to something like
use_onepass_strategy if we want to use merge them.

---
@@ -2536,8 +2544,12 @@ lazy_vacuum(LVRelState *vacrel)
        /*
         * We successfully completed a round of index vacuuming.  Do related
         * heap vacuuming now.
+        *
+        * If all valid indexes are summarizing, then the TIDs have already
+        * been reclaimed, requiring us to skip that last phase.
         */
-       lazy_vacuum_heap_rel(vacrel);
+       if (!vacrel->indallsummarizing)
+           lazy_vacuum_heap_rel(vacrel);

IIUC if indallsummarizing is true we should never reach lazy_vacuum().

---
@@ -241,8 +242,9 @@ static void parallel_vacuum_error_callback(void *arg);
  */
 ParallelVacuumState *
 parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
-                    int nrequested_workers, int vac_work_mem,
-                    int elevel, BufferAccessStrategy bstrategy)
+                    bool indallsummarizing, int nrequested_workers,
+                    int vac_work_mem, int elevel,
+                    BufferAccessStrategy bstrategy)
 {
    ParallelVacuumState *pvs;
    ParallelContext *pcxt;
@@ -282,6 +284,7 @@ parallel_vacuum_init(Relation rel, Relation
*indrels, int nindexes,
    pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
    pvs->indrels = indrels;
    pvs->nindexes = nindexes;
+   pvs->indallsummarizing = nindexes;
    pvs->will_parallel_vacuum = will_parallel_vacuum;
    pvs->bstrategy = bstrategy;
    pvs->heaprel = rel;

We should set pvs->indallsummarizing to indallsummarizing, not
nindexes, but ISTM we don't use pvs->indallsummazing anywhere in
vacuumparallel.c. Do we need to pass indallsummarizing to
parallel_vacuum_init() in the first place?

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: AIO v2.5
Next
From: Tom Lane
Date:
Subject: Re: gcc 15.1 warnings - jsonb_util.c