Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAD21AoAkBG3vJ2-Mqe-feq4CkHyg5wTGFjpAs4yMPq9HxuAmAA@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: New IndexAM API controlling index vacuum strategies
List pgsql-hackers
On Sun, Apr 4, 2021 at 11:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Apr 1, 2021 at 8:25 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I've also found a way to further simplify the table-without-indexes
> > case: make it behave like a regular two-pass/has-indexes VACUUM with
> > regard to visibility map stuff when the page doesn't need a call to
> > lazy_vacuum_heap() (because there are no LP_DEAD items to set
> > LP_UNUSED on the page following pruning). But when it does call
> > lazy_vacuum_heap(), the call takes care of everything for
> > lazy_scan_heap(), which just continues to the next page due to
> > considering prunestate to have been "invalidated" by the call to
> > lazy_vacuum_heap(). So there is absolutely minimal special case code
> > for the table-without-indexes case now.
>
> Attached is v10, which simplifies the one-pass/table-without-indexes
> VACUUM as described.
>

Thank you for updating the patch.

>
> * I now include a modified version of Matthias van de Meent's line
> pointer truncation patch [1].
>
> Matthias' patch seems very much in scope here. The broader patch
> series establishes the principle that we can leave LP_DEAD line
> pointers in an unreclaimed state indefinitely, without consequence
> (beyond the obvious).  We had better avoid line pointer bloat that
> cannot be reversed when VACUUM does eventually get around to doing a
> second pass over the heap. This is another case where it seems prudent
> to keep the costs understandable/linear -- page-level line pointer
> bloat seems like a cost that increases in a non-linear fashion, which
> undermines the whole idea of modelling when it's okay to skip
> index/heap vacuuming. (Also, line pointer bloat sucks.)
>
> Line pointer truncation doesn't happen during pruning, as it did in
> Matthias' original patch. In this revised version, line pointer
> truncation occurs during the second phase of VACUUM. There are several
> reasons to prefer this approach. It seems both safer and more useful
> that way (compared to the approach of doing line pointer truncation
> during pruning). It also makes intuitive sense to do it this way, at
> least to me -- the second pass over the heap is supposed to be for
> "freeing" LP_DEAD line pointers.

+1

0002, 0003, and 0004 patches look good to me. 0001 and 0005 also look
good to me but I have some trivial review comments on them.

0001 patch:

                /*
-                * Now that stats[idx] points to the DSM segment, we
don't need the
-                * locally allocated results.
+                * Now that top-level indstats[idx] points to the DSM
segment, we
+                * don't need the locally allocated results.
                 */
-               pfree(*stats);
-               *stats = bulkdelete_res;
+               pfree(istat);
+               istat = bulkdelete_res;

Did you try the change around parallel_process_one_index() that I
suggested in the previous reply[1]? If we don't change the logic, we
need to update the above comment. Previously, we update stats[idx] in
vacuum_one_index() (renamed to parallel_process_one_index()) but with
your patch, where we update it is its caller.

---
+lazy_vacuum_all_indexes(LVRelState *vacrel)
 {
-       Assert(!IsParallelWorker());
-       Assert(nindexes > 0);
+       Assert(vacrel->nindexes > 0);
+       Assert(TransactionIdIsNormal(vacrel->relfrozenxid));
+       Assert(MultiXactIdIsValid(vacrel->relminmxid));

and

-       Assert(!IsParallelWorker());
-       Assert(nindexes > 0);
+       Assert(vacrel->nindexes > 0);

We removed two Assert(!IsParallelWorker()) at two places. It seems to
me that those assertions are still valid. Do we really need to remove
them?

---
0004 patch:

src/backend/access/heap/heapam.c:638: trailing whitespace.
+           /*

I found a whitespace issue.

---
0005 patch:

+ * Caller is expected to call here before and after vacuuming each index in
+ * the case of two-pass VACUUM, or every BYPASS_EMERGENCY_MIN_PAGES blocks in
+ * the case of no-indexes/one-pass VACUUM.

I think it should be "every VACUUM_FSM_EVERY_PAGES blocks" instead of
"every BYPASS_EMERGENCY_MIN_PAGES blocks".

---
+/*
+ * Threshold that controls whether we bypass index vacuuming and heap
+ * vacuuming.  When we're under the threshold they're deemed unnecessary.
+ * BYPASS_THRESHOLD_PAGES is applied as a multiplier on the table's rel_pages
+ * for those pages known to contain one or more LP_DEAD items.
+ */
+#define BYPASS_THRESHOLD_PAGES 0.02    /* i.e. 2% of rel_pages */
+
+#define BYPASS_EMERGENCY_MIN_PAGES \
+   ((BlockNumber) (((uint64) 4 * 1024 * 1024 * 1024) / BLCKSZ))
+

I think we need a description for BYPASS_EMERGENCY_MIN_PAGES.

---
        for (int idx = 0; idx < vacrel->nindexes; idx++)
        {
            Relation    indrel = vacrel->indrels[idx];
            IndexBulkDeleteResult *istat = vacrel->indstats[idx];

            vacrel->indstats[idx] =
                lazy_vacuum_one_index(indrel, istat, vacrel->old_live_tuples,
                                      vacrel);
+
+           if (should_speedup_failsafe(vacrel))
+           {
+               /* Wraparound emergency -- end current index scan */
+               allindexes = false;
+               break;
+           }

allindexes can be false even if we process all indexes, which is fine
with me because setting allindexes = false disables the subsequent
heap vacuuming. I think it's appropriate behavior in emergency cases.
In that sense, can we do should_speedup_failsafe() check also after
parallel index vacuuming? And we can also check it at the beginning of
lazy vacuum.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDOWo4H6vmtLZoJ2SznMp_zOej2Kww%2BJBkVRPXs%2Bj48uw%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Next
From: Masahiko Sawada
Date:
Subject: Re: Flaky vacuum truncate test in reloptions.sql