From: Peter Geoghegan
Subject: Re: New IndexAM API controlling index vacuum strategies
Msg-id: CAH2-WznrGu+Xz9QqZBjrsoHNnYu62YT2GobEnO9k0JeGozZWjA@mail.gmail.com
In response to: Re: New IndexAM API controlling index vacuum strategies (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Wed, Apr 14, 2021 at 6:53 PM Andres Freund <andres@anarazel.de> wrote:
> > To a large degree the failsafe is something that is written in the
> > hope that it will never be needed. This is unlike most other things,
> > and has its own unique risks.
>
> Among them that the code is not covered by tests and is unlikely to be
> meaningfully exercised within the beta timeframe, given how long it
> takes to hit (hard to actually hit in under half a day even with
> extreme workloads, weeks for more realistic ones). Which means that
> this code has to be extra vigorously reviewed, not the opposite.

There is test coverage for the optimization to bypass index vacuuming
with very few dead tuples. Plus we can expect it to kick in very
frequently. That's not as good as testing this other mechanism
directly, which I agree ought to happen too. But the difference
between those two cases is pretty much entirely around when and how
they kick in. I have normalized the idea that index vacuuming is
optional in principle, so in an important sense it is tested all the
time.
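
To make that concrete, the bypass case is roughly this (a hand-wavy
sketch, not the actual regression test; the table name is made up, and
the exact threshold is an implementation detail, on the order of a
couple of percent of heap pages with dead items):

CREATE TABLE bypass_demo (id int PRIMARY KEY, filler text);
INSERT INTO bypass_demo
    SELECT g, repeat('x', 100) FROM generate_series(1, 100000) g;
-- Delete only a handful of rows, so that dead item identifiers end up
-- on only a tiny fraction of the table's heap pages
DELETE FROM bypass_demo WHERE id < 10;
-- With so few pages affected, VACUUM is expected to decide that scanning
-- the indexes isn't worth it; the VERBOSE output should say that index
-- vacuuming was bypassed
VACUUM (VERBOSE) bypass_demo;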

> Or at least
> tests for it should be added (pg_resetwal + autovacuum_naptime=1s or
> such should make that doable, or even just running a small test with
> lower thresholds).

You know what else doesn't have test coverage? Any kind of aggressive
VACUUM. There is a problem with our culture around testing. I would
like to address that in the scope of this project, but you know how it
is. Can I take it that I'll have your support with adding those tests?
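
To spell out what I have in mind (a rough sketch only, with a made-up
table name; as I recall the effective cutoff is silently clamped to
slightly above autovacuum_freeze_max_age, so a real TAP test would
probably also need to lower that in postgresql.conf, or burn XIDs the
way you describe):

-- Lower the failsafe cutoff so a test doesn't need to burn 1.6 billion XIDs
SET vacuum_failsafe_age = 100000;
CREATE TABLE failsafe_demo AS
    SELECT g AS id FROM generate_series(1, 1000) g;
-- ...advance the XID space well past the lowered cutoff here (pg_resetwal
-- in a TAP test, as you suggest, or just burning XIDs)...
-- Once age(relfrozenxid) exceeds the effective cutoff, VACUUM VERBOSE
-- should report that the wraparound failsafe was triggered: index
-- vacuuming bypassed, cost-based delay disabled
VACUUM (VERBOSE) failsafe_demo;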

> This line of argumentation scares me. Unexplained arguments about
> running in conditions that we otherwise don't run in, only in
> exceptional circumstances. This code has a history of super subtle
> interactions, with quite a few data-loss-causing bugs due to us not
> foreseeing some combination of circumstances.

I'll say it again: I was wrong to not make that clearer prior to
committing the fixup. I regret that error, which probably had a lot to
do with being fatigued.

> I think there are good arguments for having logic for an "emergency
> vacuum" mode (and also some good ones against). I'm not convinced that
> the current set of things that are [not] skipped in failsafe mode is the
> "obviously right set of things"™ but am convinced that there wasn't
> enough consensus building on what that set of things is. This all also
> would be different if it were the start of the development window,
> rather than the end.

I all but begged you to review the patches. Same with Robert. While
the earlier patches (where almost all of the complexity is) did get
review from both you and Robert (which I was grateful to receive), for
whatever reason neither of you looked at the later patches in detail.
(Robert said that the failsafe ought to cover single-pass/no-indexes
VACUUM at one point, which did influence the design of the failsafe,
but for the most part his input on the later stuff was minimal and
expressed in general terms.)

Of course, nothing stops us from improving the mechanism in the
future. Though I maintain that the fundamental approach of finishing
as quickly as possible is basically sound (short of fixing the problem
directly, for example by obviating the need for freezing).

> In my experience the big problem with vacuums in a wraparound situation
> isn't actually things like truncation or even the index scans (although
> they certainly can cause bad problems), but that VACUUM modifies
> (prune/vacuum and WAL log or just setting hint bits) a crapton of pages
> that don't actually need to be modified just to be able to get out of
> the wraparound situation.

Things like UUID indexes are very popular, and are likely to have an
outsized impact on dirtying pages (which I agree is the real problem).
Plus some people just have a ridiculous amount of indexes (e.g., the
Discourse table that they pointed out as a particularly good target
for deduplication had a total of 13 indexes). There is an excellent
chance that stuff like that is involved in installations that actually
have huge problems. The visibility map works pretty well these days --
but not for indexes.
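
For what it's worth, that pattern is easy enough to spot in the field
with a quick catalog query, something like this (illustration only):

-- Tables with the most indexes, every one of which has to be scanned by
-- a traditional two-pass VACUUM
SELECT c.relname AS table_name, count(*) AS index_count
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
WHERE c.relkind = 'r'
GROUP BY c.relname
ORDER BY index_count DESC
LIMIT 10;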

> And that the overhead of writing out all those
> dirty pages + WAL logging causes the VACUUM to take unacceptably
> long. E.g. because your storage is cloud storage with a few ms of
> latency, and the ringbuffer + wal_buffer sizes cause so many synchronous
> writes that you end up with < 10MB/s of data being processed.

This is a false dichotomy. There probably is an argument for making
the failsafe not do pruning that isn't strictly necessary (or
something like that) in a future release. I don't see what particular
significance that has for the failsafe mechanism now. The sooner we
can advance relfrozenxid when it's dangerously far in the past, the
better. It's true that the mechanism doesn't exploit every possible
opportunity to do so. But it mostly manages to do that.
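
The thing that actually matters here, how far in the past relfrozenxid
is, is also simple enough to keep an eye on (illustration only):

-- Tables whose relfrozenxid is furthest in the past
SELECT c.relname, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;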

> I think there's also a clear danger in having "cliffs" where the
> behaviour changes abruptly once a certain threshold is reached. It's
> not unlikely for systems to fall over entirely when
>
> a) autovacuum cost limiting is disabled. E.g. reaching your disk
>    iops/throughput quota and barely being able to log into postgres
>    anymore to kill the stuck connection causing the wraparound issue.

Let me get this straight: You're concerned that hurrying up vacuuming
when we have 500 million XIDs left to burn will overwhelm the system,
which would presumably have finished in time otherwise? Even though it
would have to do way more work in absolute terms in the absence of the
failsafe? And even though the 1.6 billion XID age that we got to
before the failsafe kicked in was clearly not enough? You'd want to
"play it safe", and stick with the original plan at that point?

> b) No index cleanup happens anymore. E.g. a workload with a lot of
>    bitmap index scans (which do not support killtuples) could end up
>    a lot worse off because index pointers to dead tuples aren't being
>    cleaned up. In cases where an old transaction or leftover replication
>    slot is causing the problem (together a significant percentage of
>    wraparound situations) this situation will persist across repeated
>    (explicit or automatic) vacuums for a table, because relfrozenxid
>    won't actually be advanced. And this in turn might actually end up
>    slowing resolution of the wraparound issue more than doing all the
>    index scans.

If it's intrinsically impossible to advance relfrozenxid, then surely
all bets are off. But even in this scenario it's very unlikely that we
wouldn't at least do index vacuuming for those index tuples that are
dead and safe to delete according to the OldestXmin cutoff. You still
have 1.6 billion XIDs before the failsafe first kicks in, regardless
of the issue of the OldestXmin/FreezeLimit being excessively far in
the past.
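
And in the scenario you describe, the first-order problem is findable
with standard tooling anyway, along these lines (illustration only):

-- Old transactions holding back OldestXmin
SELECT pid, backend_xmin, xact_start, state
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

-- Leftover replication slots doing the same
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC NULLS LAST;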

You're also not acknowledging the benefit of avoiding uselessly
scanning the indexes again and again, which is mostly what would be
happening in this scenario. Maybe VACUUM shouldn't spin like this at
all, but that's not a new problem.

> Because this is a hard cliff rather than something phasing in, it's not
> really possible for a user to see this slowly getting worse and
> address the issue. Especially for a) this could be addressed by not
> turning off cost limiting all at once, but instead having it decrease
> the closer you get to some limit.

There is a lot to be said for keeping the behavior as simple as
possible. You said so yourself. In any case I think that the perfect
should not be the enemy of the good (or the better, at least).

--
Peter Geoghegan


