Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New IndexAM API controlling index vacuum strategies
Msg-id CAH2-WznCxtWL4B995y2KJWj-+jrjahH4n6gD2R74SyQJo6Y63w@mail.gmail.com
In response to Re: New IndexAM API controlling index vacuum strategies  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: New IndexAM API controlling index vacuum strategies  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Mon, Apr 5, 2021 at 2:44 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> This assertion is false, and should be a guarding if-statement. HOT
> redirect pointers are not updated if the tuple they're pointing to is
> vacuumed (i.e. when it was never committed) so this nextoffnum might
> in a correctly working system point past maxoff.

I will need to go through this in detail soon.

> > Line pointer truncation doesn't happen during pruning, as it did in
> > Matthias' original patch. In this revised version, line pointer
> > truncation occurs during the second phase of VACUUM. There are several
> > reasons to prefer this approach. It seems both safer and more useful
> > that way (compared to the approach of doing line pointer truncation
> > during pruning). It also makes intuitive sense to do it this way, at
> > least to me -- the second pass over the heap is supposed to be for
> > "freeing" LP_DEAD line pointers.
>
> Good catch for running a line pointer truncating pass at the second
> pass over the heap in VACUUM, but I believe that it is also very
> useful for pruning. Line pointer bloat due to excessive HOT chains
> cannot be undone until the 2nd run of VACUUM happens with this patch,
> which is a missed chance for all non-vacuum pruning.

Maybe - I have my doubts about it having much value outside of the
more extreme cases. But let's assume that I'm wrong about that, for
the sake of argument.

The current plan is to no longer require a super-exclusive lock inside
lazy_vacuum_heap_page(), which means that we can no longer safely call
PageRepairFragmentation() at that point. This will mean that
PageRepairFragmentation() is 100% owned by pruning. And so the
question of whether or not line pointer truncation should also happen
in PageRepairFragmentation() to cover pruning is (or will be) entirely
separate from the question of how lazy_vacuum_heap_page() does its
cleanup. Nothing stops you from independently pursuing that as a
project for Postgres 15.
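
To make the second-pass approach concrete, here is a simplified sketch
of the shape lazy_vacuum_heap_page() takes with the patch applied. The
function and variable names are illustrative only, and WAL logging and
visibility map handling are omitted:

/*
 * Simplified sketch only -- not the actual patch code.  Once the
 * page's LP_DEAD items have been marked LP_UNUSED, shrink the line
 * pointer array while we still hold an exclusive lock on the buffer.
 */
static void
vacuum_one_page_sketch(Buffer buffer, OffsetNumber *deadoffsets, int ndead)
{
    Page        page = BufferGetPage(buffer);

    START_CRIT_SECTION();

    for (int i = 0; i < ndead; i++)
    {
        ItemId      itemid = PageGetItemId(page, deadoffsets[i]);

        ItemIdSetUnused(itemid);
    }

    /* Reclaim trailing LP_UNUSED slots; never truncates to zero items */
    PageTruncateLinePointerArray(page);

    MarkBufferDirty(buffer);

    END_CRIT_SECTION();
}

The point is that this happens only during the second heap pass, where
we already know exactly which items have had their index entries
removed.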

> What difference is there between opportunistically pruned HOT line
> pointers, and VACUUMed line pointers?

The fact that they are physically identical to each other isn't
everything. The "life cycle" of an affected page is crucially
important.

I find that there is a lot of value in thinking about how things look
at the page level moment to moment, and even over hours and days.
Usually with a sample workload and table in mind. I already mentioned
the new_order table from TPC-C, which is characterized by continual
churn from more-or-less even amounts of range deletes and bulk inserts
over time. That seems to be the kind of workload where you see big
problems with line pointer bloat. Because there is constant churn of
unrelated logical rows (it's not a bunch of UPDATEs).

It's possible for very small effects to aggregate into large and
significant effects -- I know this from my experience with indexing.
Plus the FSM is probably not very smart about fragmentation, which
makes it even more complicated. And so it's easy to be wrong if you
predict that some seemingly insignificant extra intervention couldn't
possibly help. For that reason, I don't want to predict that you're
wrong now. It's just a question of time, and of priorities.

> Truncating during pruning has
> the benefit of keeping the LP array short where possible, and seeing
> that truncating the LP array allows for more applied
> PD_HAS_FREE_LINES-optimization, I fail to see why you wouldn't want to
> truncate the LP array whenever clearing up space.

Truncating the line pointer array is not an intrinsic good. I hesitate
to do it during pruning in the absence of clear evidence that it's
independently useful. Pruning is a very performance-sensitive
operation. Much more so than VACUUM's second heap pass.

> Other than those questions, some comments on the other patches:
>
> 0002:
> > +    appendStringInfo(&buf, _("There were %lld dead item identifiers.\n"),
> > +                     (long long) vacrel->lpdead_item_pages);
>
> I presume this should use vacrel->lpdead_items?.

It should have been, but as it happens I have decided to not do this
at all in 0002-*. Better to not report on LP_UNUSED *or* LP_DEAD items
at this point of VACUUM VERBOSE output.

> 0003:
> > + * ...  Aborted transactions
> > + * have tuples that we can treat as DEAD without caring about where there
> > + * tuple header XIDs ...
>
> This should be '... where their tuple header XIDs...'

Will fix.

> > +retry:
> > +
> > ...
> > +        res = HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf);
> > +
> > +        if (unlikely(res == HEAPTUPLE_DEAD))
> > +            goto retry;
>
> In this unlikely case, you reset the tuples_deleted value that was
> received earlier from heap_page_prune. This results in inaccurate
> statistics, as repeated calls to heap_page_prune on the same page will
> not count tuples that were deleted in a previous call.

I don't think that it matters. The "tupgone=true" case has no test
coverage (see coverage.postgresql.org), and it would be hard to ensure
that the "res == HEAPTUPLE_DEAD" that replaces it gets coverage, for
the same reasons. Keeping the rules as simple as possible seems like a
good goal. What's more, it's absurdly unlikely that this will happen
even once. The race is very tight. Postgres will do opportunistic
pruning at almost any point, often from a SELECT, so the chances of
anybody noticing an inaccuracy from this issue in particular are
remote in the extreme.

Actually, a big problem with the tuples_deleted value surfaced by both
log_autovacuum and by VACUUM VERBOSE is that it can be wildly
different from the number of LP_DEAD items. This is commonly the case
with tables that get lots of non-HOT updates, with opportunistic
pruning kicking in a lot, with LP_DEAD items constantly accumulating.
By the time VACUUM comes around, it reports an absurdly low
tuples_deleted because it's using this what-I-pruned-just-now
definition. The opposite extreme is also possible, since there might
be far fewer LP_DEAD items when VACUUM does a lot of pruning of HOT
chains specifically.
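
To put made-up numbers on that: imagine a table getting steady non-HOT
UPDATEs. Opportunistic pruning might kill 10,000 tuples between
VACUUMs, leaving 10,000 LP_DEAD stubs behind. By the time VACUUM runs,
only the few hundred tuples that died since the last opportunistic
prune are left for it to prune itself, so tuples_deleted comes out in
the hundreds, even though the same VACUUM goes on to remove well over
10,000 LP_DEAD items.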

> 0004:
> > +     * truncate to.  Note that we avoid truncating the line pointer to 0 items
> > +     * in all cases.
> > +     */
>
> Is there a specific reason that I'm not getting as to why this is necessary?

I didn't say it was strictly necessary. There is special-case handling
of PageIsEmpty() at various points, though, including within VACUUM.
It seemed worth avoiding hitting that. Perhaps I should change it to
not work that way.
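
Here is a rough sketch (made-up function name, not the patch's actual
code) of what I mean: the truncation routine stops at the last used
item, and never shrinks the array below a single entry, so it can
never turn a page into one that PageIsEmpty() reports as empty:

/*
 * Rough sketch, not the actual implementation: find the last used
 * line pointer scanning back from the end, then shrink pd_lower to
 * just past it, always keeping at least one item.
 */
static void
truncate_lp_array_sketch(Page page)
{
    PageHeader  phdr = (PageHeader) page;
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    OffsetNumber lastused = FirstOffsetNumber;  /* floor: keep one item */

    Assert(maxoff >= FirstOffsetNumber);

    for (OffsetNumber off = maxoff; off > FirstOffsetNumber; off--)
    {
        ItemId      itemid = PageGetItemId(page, off);

        if (ItemIdIsUsed(itemid))
        {
            lastused = off;
            break;
        }
    }

    phdr->pd_lower = SizeOfPageHeaderData + lastused * sizeof(ItemIdData);
}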

> 0005:
> > +        The default is 1.8 billion transactions. Although users can set this value
> > +        anywhere from zero to 2.1 billion, <command>VACUUM</command> will silently
> > +        adjust the effective value more than 105% of
> > +        <xref linkend="guc-autovacuum-freeze-max-age"/>, so that only
> > +        anti-wraparound autovacuums and aggressive scans have a chance to skip
> > +        index cleanup.
>
> This documentation doesn't quite make it clear what its relationship
> is with autovacuum_freeze_max_age. How about the following: "...
> >VACUUM< will use the higher of this value and 105% of
> >guc-autovacuum-freeze-max-age<, so that only ...". It's only slightly
> easier to read, but at least it conveys that values lower than 105% of
> autovacuum_freeze_max_age are not considered. The same can be said for
> the multixact guc documentation.

This does need work too.
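
For whatever it's worth, the rule the docs are trying to describe is
just a clamp, roughly like this (hypothetical names, not the actual
GUC or variable names from the patch):

/*
 * Hypothetical sketch of the clamping rule: the effective value is
 * never allowed to go below 105% of autovacuum_freeze_max_age, so
 * only antiwraparound/aggressive VACUUMs can ever reach it.
 */
static int
effective_skip_index_age(int requested_age, int freeze_max_age)
{
    int         floor_age = (int) (freeze_max_age * 1.05);

    return (requested_age > floor_age) ? requested_age : floor_age;
}

Maybe the documentation should just say that directly, along the lines
you suggest.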

I'm going to push 0002- and 0003- tomorrow morning Pacific time. I'll
publish a new set of patches tomorrow, once I've finished that up. The
last 2 patches will require a lot of focus to get over the line for
Postgres 14.

--
Peter Geoghegan


