Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAD21AoBPw7FpMxNe++FwrT-oykTn6omGqTnamtQX8oWGoQvj=A@mail.gmail.com
In response to Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Jan 25, 2021 at 5:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 20, 2021 at 7:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > >
> > > On Sun, Jan 17, 2021 at 9:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > After more thought, I think that ambulkdelete needs to be able to
> > > > refer to the answer from amvacuumstrategy. That way, the index can
> > > > skip bulk-deletion when lazy vacuum doesn't vacuum the heap and the
> > > > index also doesn't want to do that.
> > >
> > > Makes sense.
> > >
> > > BTW, your patch has bitrot already. Peter E's recent pageinspect
> > > commit happens to conflict with this patch. It might make sense to
> > > produce a new version that just fixes the bitrot, so that other people
> > > don't have to deal with it each time.
> > >
> > > > I’ve attached the updated version patch that includes the following changes:
> > >
> > > Looks good. I'll give this version a review now. I will do a lot more
> > > soon. I need to come up with a good benchmark for this, that I can
> > > return to again and again during review as needed.
> >
> > Thank you for reviewing the patches.
> >
> > >
> > > Some feedback on the first patch:
> > >
> > > * Just so you know: I agree with you about handling
> > > VACOPT_TERNARY_DISABLED in the index AM's amvacuumstrategy routine. I
> > > think that it's better to do that there, even though this choice may
> > > have some downsides.
> > >
> > > * Can you add some "stub" sgml doc changes for this? Doesn't have to
> > > be complete in any way. Just a placeholder for later, that has the
> > > correct general "shape" to orientate the reader of the patch. It can
> > > just be a FIXME comment, plus basic mechanical stuff -- details of the
> > > new amvacuumstrategy_function routine and its signature.
> > >
> >
> > The 0002 patch had the doc update (I mistakenly included it in the
> > 0002 patch). Is that update what you meant?
> >
> > > Some feedback on the second patch:
> > >
> > > * Why do you move around IndexVacuumStrategy in the second patch?
> > > Looks like a rebasing oversight.
> >
> > Check.
> >
> > >
> > > * Actually, do we really need the first and second patches to be
> > > separate patches? I agree that the nbtree patch should be a separate
> > > patch, but dividing the first two sets of changes doesn't seem like it
> > > adds much. Did I miss something?
> >
> > I separated the patches since I used to have separate patches when
> > adding other index AM options required by parallel vacuum. But I
> > agreed to merge the first two patches.
> >
> > >
> > > * Is the "MAXALIGN(sizeof(ItemIdData)))" change in the definition of
> > > MaxHeapTuplesPerPage appropriate? Here is the relevant section from
> > > the patch:
> > >
> > > diff --git a/src/include/access/htup_details.h
> > > b/src/include/access/htup_details.h
> > > index 7c62852e7f..038e7cd580 100644
> > > --- a/src/include/access/htup_details.h
> > > +++ b/src/include/access/htup_details.h
> > > @@ -563,17 +563,18 @@ do { \
> > >  /*
> > >   * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
> > >   * fit on one heap page.  (Note that indexes could have more, because they
> > > - * use a smaller tuple header.)  We arrive at the divisor because each tuple
> > > - * must be maxaligned, and it must have an associated line pointer.
> > > + * use a smaller tuple header.)  We arrive at the divisor because each line
> > > + * pointer must be maxaligned.
> > > *** SNIP ***
> > >  #define MaxHeapTuplesPerPage    \
> > > -    ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
> > > -            (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
> > > +    ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(ItemIdData)))))
> > >
> > > It's true that ItemIdData structs (line pointers) are aligned, but
> > > they're not MAXALIGN()'d. If they were then the on-disk size of line
> > > pointers would generally be 8 bytes, not 4 bytes.
> >
> > You're right. Will fix.
> >
> > >
> > > * Maybe it would be better if you just changed the definition such
> > > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > > no other changes? (Some variant of this suggestion might be better,
> > > not sure.)
> > >
> > > For some reason that feels a bit safer: we still have an "imaginary
> > > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > > much less than the current 3 MAXALIGN() quantums (i.e. what
> > > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > > that this alternative approach will be noticeably less effective
> > > within vacuumlazy.c?
> > >
> > > Note that you probably understand the issue with MaxHeapTuplesPerPage
> > > for vacuumlazy.c better than I do currently. I'm still trying to
> > > understand your choices, and to understand what is really important
> > > here.
> >
> > Yeah, using MAXIMUM_ALIGNOF seems better for safety. I shared my
> > thoughts on the issue with MaxHeapTuplesPerPage yesterday. I think we
> > need to discuss how to deal with that.
> >
> > >
> > > * Maybe add a #define for the value 0.7? (I refer to the value used in
> > > choose_vacuum_strategy() to calculate a "this is the number of LP_DEAD
> > > line pointers that we consider too many" cut off point, which is to be
> > > applied throughout lazy_scan_heap() processing.)
> > >
> >
> > Agreed.
> >
> > > * I notice that your new lazy_vacuum_table_and_indexes() function is
> > > the only place that calls lazy_vacuum_table_and_indexes(). I think
> > > that you should merge them together -- replace the only remaining call
> > > to lazy_vacuum_table_and_indexes() with the body of the function
> > > itself. Having a separate lazy_vacuum_table_and_indexes() function
> > > doesn't seem useful to me -- it doesn't actually hide complexity, and
> > > might even be harder to maintain.
> >
> > lazy_vacuum_table_and_indexes() is called in two places: after
> > maintenance_work_mem runs out (around L1097) and at the end of
> > lazy_scan_heap() (around L1726). I defined this function to pack
> > together the operations of choosing a strategy, vacuuming indexes,
> > and vacuuming the heap. Without this function, wouldn't we end up
> > writing the same code twice there?
> >
> > >
> > > * I suggest thinking about what the last item will mean for the
> > > reporting that currently takes place in
> > > lazy_vacuum_table_and_indexes(), but will now go in an expanded
> > > lazy_vacuum_table_and_indexes() -- how do we count the total number of
> > > index scans now?
> > >
> > > I don't actually believe that the logic needs to change, but some kind
> > > of consolidation and streamlining seems like it might be helpful.
> > > Maybe just a comment that says "note that all index scans might just
> > > be no-ops because..." -- stuff like that.
> >
> > What do you mean by "the last item", and which reporting? I think
> > lazy_vacuum_table_and_indexes() itself doesn't report anything, and
> > vacrelstats->num_index_scan counts the total number of index scans.
> >
> > >
> > > * Any idea about how hard it will be to teach is_wraparound VACUUMs to
> > > skip index cleanup automatically, based on some practical/sensible
> > > criteria?
> >
> > One simple idea would be to have an anti-wraparound autovacuum
> > worker disable index cleanup (i.e., set index_cleanup to
> > VACOPT_TERNARY_DISABLED). But a downside (though not a common case)
> > is that, since an anti-wraparound vacuum is not necessarily an
> > aggressive vacuum, it could skip index cleanup even though it
> > cannot move relfrozenxid forward.
> >
> > >
> > > It would be nice to have a basic PoC for that, even if it remains a
> > > PoC for the foreseeable future (i.e. even if it cannot be committed to
> > > Postgres 14). This feature should definitely be something that your
> > > patch series *enables*. I'd feel good about having covered that
> > > question as part of this basic design work if there was a PoC. That
> > > alone should make it 100% clear that it's easy to do (or no harder
> > > than it should be -- it should ideally be compatible with your basic
> > > design). The exact criteria that we use for deciding whether or not to
> > > skip index cleanup (which probably should not just be "this VACUUM is
> > > is_wraparound, good enough" in the final version) may need to be
> > > debated at length on pgsql-hackers. Even still, it is "just a detail"
> > > in the code. Whereas being *able* to do that with your design (now or
> > > in the future) seems essential now.
> >
> > Agreed. I'll write a PoC patch for that.
> >
> > >
> > > > * Store the answers to amvacuumstrategy into either the local memory
> > > > or DSM (in parallel vacuum case) so that ambulkdelete can refer the
> > > > answer to amvacuumstrategy.
> > > > * Fix regression failures.
> > > > * Update the documentation and comments.
> > > >
> > > > Note that 0003 patch is still PoC quality, lacking the btree meta page
> > > > version upgrade.
> > >
> > > This patch is not the hard part, of course -- there really isn't that
> > > much needed here compared to vacuumlazy.c. So this patch seems like
> > > the simplest 1 out of the 3 (at least to me).
> > >
> > > Some feedback on the third patch:
> > >
> > > * The new btm_last_deletion_nblocks metapage field should use P_NONE
> > > (which is 0) to indicate never having been vacuumed -- not
> > > InvalidBlockNumber (which is 0xFFFFFFFF).
> > >
> > > This is more idiomatic in nbtree, which is nice, but it has a very
> > > significant practical advantage: it ensures that every heapkeyspace
> > > nbtree index (i.e. those on recent nbtree versions) can be treated as
> > > if it has the new btm_last_deletion_nblocks field all along, even when
> > > it actually built on Postgres 12 or 13. This trick will let you avoid
> > > dealing with the headache of bumping BTREE_VERSION, which is a huge
> > > advantage.
> > >
> > > Note that this is the same trick I used to avoid bumping BTREE_VERSION
> > > when the btm_allequalimage field needed to be added (for the nbtree
> > > deduplication feature added to Postgres 13).
> > >
> >
> > That's a nice way with a great advantage. I'll use P_NONE.
> >
> > > * Forgot to do this in the third patch (think I made this same mistake
> > > once myself):
> > >
> > > diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
> > > index 8bb180bbbe..88dfea9af3 100644
> > > --- a/contrib/pageinspect/btreefuncs.c
> > > +++ b/contrib/pageinspect/btreefuncs.c
> > > @@ -653,7 +653,7 @@ bt_metap(PG_FUNCTION_ARGS)
> > >      BTMetaPageData *metad;
> > >      TupleDesc   tupleDesc;
> > >      int         j;
> > > -    char       *values[9];
> > > +    char       *values[10];
> > >      Buffer      buffer;
> > >      Page        page;
> > >      HeapTuple   tuple;
> > > @@ -734,6 +734,11 @@ bt_metap(PG_FUNCTION_ARGS)
> >
> > Check.
> >
> > I'm updating and testing the patch. I'll submit the updated version
> > patches tomorrow.
> >
>
> Sorry for the delay.
>
> I've attached the updated version patch that incorporated the comments
> I got so far.
>
> I merged the previous 0001 and 0002 patches. The 0003 patch is now
> another PoC patch that disables index cleanup automatically when the
> autovacuum is both anti-wraparound and aggressive.

Since I found some bugs in the v3 patch, I've attached the updated version patches.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

