Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAH2-Wzk=4a_BkYyY-b6_A9BpnS3x_YyfTxptfn0pgSL9EzimnA@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After more thought, I think that ambulkdelete needs to be able to
> refer the answer to amvacuumstrategy. That way, the index can skip
> bulk-deletion when lazy vacuum doesn't vacuum heap and it also doesn’t
> want to do that.

Makes sense.

BTW, your patch has bitrot already. Peter E's recent pageinspect
commit happens to conflict with this patch. It might make sense to
produce a new version that just fixes the bitrot, so that other people
don't have to deal with it each time.

> I’ve attached the updated version patch that includes the following changes:

Looks good. I'll give this version a review now. I will do a lot more
soon. I need to come up with a good benchmark for this, that I can
return to again and again during review as needed.

Some feedback on the first patch:

* Just so you know: I agree with you about handling
VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
think that it's better to do that there, even though this choice may
have some downsides.

* Can you add some "stub" sgml doc changes for this? Doesn't have to
be complete in any way. Just a placeholder for later, that has the
correct general "shape" to orientate the reader of the patch. It can
just be a FIXME comment, plus basic mechanical stuff -- details of the
new amvacuumstrategy_function routine and its signature.

Some feedback on the second patch:

* Why do you move around IndexVacuumStrategy in the second patch?
Looks like a rebasing oversight.

* Actually, do we really need the first and second patches to be
separate patches? I agree that the nbtree patch should be a separate
patch, but dividing the first two sets of changes doesn't seem like it
adds much. Did I miss some something?

* Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
MaxHeapTuplesPerPage appropriate? Here is the relevant section from
the patch:

diff --git a/src/include/access/htup_details.h
b/src/include/access/htup_details.h
index 7c62852e7f..038e7cd580 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -563,17 +563,18 @@ do { \
 /*
  * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
  * fit on one heap page.  (Note that indexes could have more, because they
- * use a smaller tuple header.)  We arrive at the divisor because each tuple
- * must be maxaligned, and it must have an associated line pointer.
+ * use a smaller tuple header.)  We arrive at the divisor because each line
+ * pointer must be maxaligned.
*** SNIP ***
 #define MaxHeapTuplesPerPage    \
-    ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
-            (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
+    ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))

It's true that ItemIdData structs (line pointers) are aligned, but
they're not MAXALIGN()'d. If they were then the on-disk size of line
pointers would generally be 8 bytes, not 4 bytes.

* Maybe it would be better if you just changed the definition such
that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
no other changes? (Some variant of this suggestion might be better,
not sure.)

For some reason that feels a bit safer: we still have an "imaginary
tuple header", but it's just 1 MAXALIGN() quantum now. This is still
much less than the current 3 MAXALIGN() quantums (i.e. what
MaxHeapTuplesPerPage treats as the tuple header size). Do you think
that this alternative approach will be noticeably less effective
within vacuumlazy.c?

Note that you probably understand the issue with MaxHeapTuplesPerPage
for vacuumlazy.c better than I do currently. I'm still trying to
understand your choices, and to understand what is really important
here.

* Maybe add a #define for the value 0.7? (I refer to the value used in
choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
line pointers that we consider too many" cut off point, which is to be
applied throughout lazy_scan_heap() processing.)

* I notice that your new lazy_vacuum_table_and_indexes() function is
the only place that calls lazy_vacuum_table_and_indexes(). I think
that you should merge them together -- replace the only remaining call
to lazy_vacuum_table_and_indexes() with the body of the function
itself. Having a separate lazy_vacuum_table_and_indexes() function
doesn't seem useful to me -- it doesn't actually hide complexity, and
might even be harder to maintain.

* I suggest thinking about what the last item will mean for the
reporting that currently takes place in
lazy_vacuum_table_and_indexes(), but will now go in an expanded
lazy_vacuum_table_and_indexes() -- how do we count the total number of
index scans now?

I don't actually believe that the logic needs to change, but some kind
of consolidation and streamlining seems like it might be helpful.
Maybe just a comment that says "note that all index scans might just
be no-ops because..." -- stuff like that.

* Any idea about how hard it will be to teach is_wraparound VACUUMs to
skip index cleanup automatically, based on some practical/sensible
criteria?

It would be nice to have a basic PoC for that, even if it remains a
PoC for the foreseeable future (i.e. even if it cannot be committed to
Postgres 14). This feature should definitely be something that your
patch series *enables*. I'd feel good about having covered that
question as part of this basic design work if there was a PoC. That
alone should make it 100% clear that it's easy to do (or no harder
than it should be -- it should ideally be compatible with your basic
design). The exact criteria that we use for deciding whether or not to
skip index cleanup (which probably should not just be "this VACUUM is
is_wraparound, good enough" in the final version) may need to be
debated at length on pgsql-hackers. Even still, it is "just a detail"
in the code. Whereas being *able* to do that with your design (now or
in the future) seems essential now.

> * Store the answers to amvacuumstrategy into either the local memory
> or DSM (in parallel vacuum case) so that ambulkdelete can refer the
> answer to amvacuumstrategy.
> * Fix regression failures.
> * Update the documentation and commments.
>
> Note that 0003 patch is still PoC quality, lacking the btree meta page
> version upgrade.

This patch is not the hard part, of course -- there really isn't that
much needed here compared to vacuumlazy.c. So this patch seems like
the simplest 1 out of the 3 (at least to me).

Some feedback on the third patch:

* The new btm_last_deletion_nblocks metapage field should use P_NONE
(which is 0) to indicate never having been vacuumed -- not
InvalidBlockNumber (which is 0xFFFFFFFF).

This is more idiomatic in nbtree, which is nice, but it has a very
significant practical advantage: it ensures that every heapkeyspace
nbtree index (i.e. those on recent nbtree versions) can be treated as
if it has the new btm_last_deletion_nblocks field all along, even when
it actually built on Postgres 12 or 13. This trick will let you avoid
dealing with the headache of bumping BTREE_VERSION, which is a huge
advantage.

Note that this is the same trick I used to avoid bumping BTREE_VERSION
when the btm_allequalimage field needed to be added (for the nbtree
deduplication feature added to Postgres 13).

* Forgot to do this in the third patch (think I made this same mistake
once myself):

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 8bb180bbbe..88dfea9af3 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
     BTMetaPageData *metad;
     TupleDesc   tupleDesc;
     int         j;
-    char       *values[9];
+    char       *values[10];
     Buffer      buffer;
     Page        page;
     HeapTuple   tuple;
@@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)

That's all I have for now...
--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: James Hilliard
Date:
Subject: Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
Next
From: James Hilliard
Date:
Subject: Re: [PATCH 1/1] Fix detection of pwritev support for OSX.