Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAH2-WzmJC3KNYE+tCpHogy5UbgjJWDGw3hQD1OvoQD2n6y+Yfg@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: New IndexAM API controlling index vacuum strategies  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Mar 17, 2021 at 7:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Since I was thinking that always skipping index vacuuming on
> anti-wraparound autovacuum is legitimate, skipping index vacuuming
> only when we're really close to the point of going into read-only mode
> seems a bit conservative, but maybe a good start. I've attached a PoC
> patch to disable index vacuuming if the table's relfrozenxid is too
> older than autovacuum_freeze_max_age (older than 1.5x of
> autovacuum_freeze_max_age).

Most anti-wraparound VACUUMs are really not emergencies, though. So
treating them as special simply because they're anti-wraparound
vacuums doesn't seem like the right thing to do. I think that we
should dynamically decide to do this when (antiwraparound) VACUUM has
already been running for some time. We need to delay the decision
until it is almost certainly true that we really have an emergency.

Can you take what you have here, and make the decision dynamic? Delay
it until we're done with the first heap scan? This will require
rebasing on top of the patch I posted. And then adding a third patch,
a little like the second patch -- but not too much like it.

In the second/SKIP_VACUUM_PAGES_RATIO patch I posted today, the
function two_pass_strategy() (my new name for the main entry point for
calling lazy_vacuum_all_indexes() and lazy_vacuum_heap()) is only
willing to perform the "skip index vacuuming" optimization when the
call to two_pass_strategy() is the first call and the last call for
that entire VACUUM (plus we test the number of heap blocks with
LP_DEAD items using SKIP_VACUUM_PAGES_RATIO, of course). It works this
way purely because I don't think that we should be aggressive when
we've already run out of maintenance_work_mem. That's a bad time to
apply a performance optimization.

But what you're talking about now isn't a performance optimization
(the mechanism is similar or the same, but the underlying reasons are
totally different) -- it's a safety/availability thing. I don't think
that you need to be concerned about running out of
maintenance_work_mem in two_pass_strategy() when applying logic that
is concerned about keeping the database online by avoiding XID
wraparound. You just need to have high confidence that it is a true
emergency. I think that we can be ~99% sure that we're in a real
emergency by using dynamic information about how old relfrozenxid is
*now*, and by rechecking a few times during VACUUM. Probably by
rechecking every time we call two_pass_strategy().

I now believe that there is no fundamental correctness issue with
teaching two_pass_strategy() to skip index vacuuming when we're low on
memory -- it is 100% a matter of costs and benefits. The core
skip-index-vacuuming mechanism is very flexible. If we can be sure
that it's a real emergency, I think that we can justify behaving very
aggressively (letting indexes get bloated is after all very
aggressive). We just need to be 99%+ sure that continuing with
vacuuming will be worse that ending vacuuming. Which seems possible by
making the decision dynamic (and revisiting it at least a few times
during a very long VACUUM, in case things change).

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Getting better results from valgrind leak tracking
Next
From: Amit Kapila
Date:
Subject: Re: Parallel INSERT (INTO ... SELECT ...)