Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id CAH2-WzkUy717a_mS10QNY=ryV3K37Y4a-2Xt+KWKZ=1jbVvU7A@mail.gmail.com
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
List pgsql-hackers
On Mon, Feb 7, 2022 at 10:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But ... if I'm not mistaken, in the kind of case that Greg is
> describing, relfrozenxid will be advanced exactly as often as it is
> today.

But what happens today in a scenario like Greg's is pathological,
despite being fairly common in large databases. It doesn't seem
informative to extrapolate too much from current experience, for that
reason.

> That's because, if VACUUM is only ever getting triggered by XID
> age advancement and not by bloat, there's no opportunity for your
> patch set to advance relfrozenxid any sooner than we're doing now.

We must distinguish between:

1. "VACUUM is fundamentally never going to need to run unless it is
forced to, just to advance relfrozenxid" -- this applies to tables
like the stock and customers tables from the benchmark.

and:

2. "VACUUM must sometimes run to mark newly appended heap pages
all-visible, and maybe to also remove dead tuples, but not that often
-- and yet we currently only get expensive and inconveniently timed
anti-wraparound VACUUMs, no matter what" -- this applies to all the
other big tables in the benchmark, in particular to the orders and
order lines tables, but also to simpler cases like pgbench_history.

As I've said a few times now, the patch doesn't change anything for 1.
But Greg's problem tables very much sound like they're from category
2. And what we see with the master branch for such tables is that they
always get anti-wraparound VACUUMs past a certain size (the exact
threshold depends on things like the XID rate and VACUUM settings; the
insert-driven autovacuum scheduling stuff matters too). The patch, by
contrast, never reached that point during my testing -- it didn't even
come close.

It is true that in theory, as the size of one of these "category 2"
tables tends to infinity, the patch ends up behaving the same as
master anyway. But I'm pretty sure that that usually doesn't matter at
all, or matters less than you'd think. As I emphasized when presenting
the recent v7 TPC-C benchmark, neither of the two "TPC-C big problem
tables" (which are particularly interesting/tricky examples of tables
from category 2) comes close to getting an anti-wraparound VACUUM
(plus, as I said in the same email, it wouldn't matter if they did).

> So I think that people in this kind of situation will potentially be
> helped or hurt by other things the patch set does, but the eager
> relfrozenxid stuff won't make any difference for them.

To be clear, I think it would make a difference if everything were in
place, including the basic relfrozenxid advancement thing, plus the
new freezing stuff (though you wouldn't need the experimental FSM
thing to get this benefit).

Here is a thought experiment that may make the general idea a bit clearer:

Imagine I reran the same benchmark as before, with the same settings,
and the expectation that everything would be the same as the first
time around for the patch series. But to make things more interesting,
this time I add an adversarial element: a gizmo that burns XIDs
steadily, without doing any useful work. This gizmo doubles the rate
of XID consumption for the database as a whole, perhaps by calling
"SELECT txid_current()" in a loop, followed by a timed sleep (with a
delay chosen with the goal of doubling XID consumption). I imagine
that this would also burn CPU cycles, but probably not enough to make
more than a noise-level impact -- so we're severely stressing the
implementation by adding this gizmo, but the stress is precisely
targeted at XID consumption and related implementation details. It's a
pretty clean experiment. What happens now?
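For concreteness, the gizmo could look something like this minimal
Python sketch. None of this is part of the actual benchmark; the
baseline XID rate and the DB-API connection (`conn`) are assumptions
made purely for illustration:

```python
import time

def gizmo_delay(baseline_xids_per_sec: float) -> float:
    # Each txid_current() call allocates exactly one XID. To double the
    # database's overall XID consumption, the gizmo must itself burn
    # XIDs at the baseline rate -- one call every 1/baseline seconds.
    return 1.0 / baseline_xids_per_sec

def burn_xids(conn, baseline_xids_per_sec: float) -> None:
    # conn is assumed to be any DB-API connection to the benchmark
    # database; the loop burns one XID per iteration, doing no real work
    delay = gizmo_delay(baseline_xids_per_sec)
    cur = conn.cursor()
    while True:
        cur.execute("SELECT txid_current()")  # allocates one XID
        time.sleep(delay)
```

The sleep is what makes the stress "precisely targeted": the gizmo
consumes XIDs at a controlled rate while staying negligible in terms
of CPU and I/O.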

I believe (though haven't checked for myself) that nothing important
would change. We'd still see the same VACUUM operations occur at
approximately the same times (relative to the start of the benchmark)
that we saw with the original benchmark, and each VACUUM operation
would do approximately the same amount of physical work on each
occasion. Of course, the autovacuum log output would show that the
OldestXmin for each individual VACUUM operation had larger values than
the first time around for this newly initdb'd TPC-C database (purely
as a consequence of the XID-burning gizmo), but it would *also* show
*concomitant* increases for our newly set relfrozenxid. The system
should therefore hardly behave differently at all compared to the
original benchmark run, despite this adversarial gizmo.
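The "concomitant increases" point can be sketched with some
back-of-the-envelope arithmetic. Assuming (hypothetically) that each
VACUUM advances relfrozenxid all the way up to its OldestXmin, the age
of relfrozenxid observed just before the next VACUUM depends only on
the XIDs consumed in between -- the gizmo inflates both OldestXmin and
relfrozenxid together, so the age stays bounded. The rates and
intervals below are made-up illustrative numbers:

```python
def relfrozenxid_age_before_next_vacuum(xids_per_sec: float,
                                        vacuum_interval_secs: float) -> float:
    # With eager advancement, VACUUM sets relfrozenxid to (roughly) its
    # OldestXmin, so the age seen just before the *next* VACUUM is just
    # the number of XIDs consumed in the interval between the two.
    return xids_per_sec * vacuum_interval_secs

# Hypothetical: 1,000 XIDs/sec, a naturally timed VACUUM every 6 hours
baseline = relfrozenxid_age_before_next_vacuum(1_000, 6 * 3600)  # 21.6M
# With the XID-doubling gizmo, the age doubles too...
doubled = relfrozenxid_age_before_next_vacuum(2_000, 6 * 3600)   # 43.2M
# ...but both are far below the default autovacuum_freeze_max_age of
# 200 million, so no anti-wraparound VACUUM is ever forced.
```

In other words, as long as naturally timed VACUUMs keep advancing
relfrozenxid, doubling the XID rate doubles the observed ages without
ever approaching the forcing threshold.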

It's fair to wonder: okay, but what if it were 4x, 8x, 16x? What then?
That does get a bit more complicated, and we should get into why that
is. But for now I'll just say that I think that even that kind of
extreme would make much less difference than you might think -- since
relfrozenxid advancement has been qualitatively improved by the patch
series. It is especially likely that nothing would change if you were
willing to increase autovacuum_freeze_max_age to get a bit more
breathing room -- room to allow the autovacuums to run at their
"natural" times. You wouldn't necessarily have to go too far -- the
extra breathing room from increasing autovacuum_freeze_max_age buys
more wall-clock time *between* any two successive "naturally timed
autovacuums". Again, a virtuous cycle.
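The breathing-room relationship is linear, which is easy to see with a
small sketch (again, the XID rates here are hypothetical numbers
chosen only to illustrate the scaling):

```python
def breathing_room_secs(freeze_max_age: int, xids_per_sec: float) -> float:
    # Wall-clock time for age(relfrozenxid) to reach the anti-wraparound
    # forcing threshold, starting from a freshly advanced relfrozenxid.
    return freeze_max_age / xids_per_sec

# Default threshold (200M) at a hypothetical 1,000 XIDs/sec baseline:
base = breathing_room_secs(200_000_000, 1_000)    # 200,000s (~2.3 days)
# A 16x XID-burning gizmo cuts that to roughly 3.5 hours...
worst = breathing_room_secs(200_000_000, 16_000)  # 12,500s
# ...but raising autovacuum_freeze_max_age scales it back up linearly:
restored = breathing_room_secs(800_000_000, 16_000)  # 50,000s
```

So even under an extreme XID-rate multiplier, a modest increase to
autovacuum_freeze_max_age proportionally restores the window in which
naturally timed autovacuums can do the relfrozenxid advancement on
their own.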

Does that make sense? It's pretty subtle, admittedly, and you no doubt
have (very reasonable) concerns about the extremes, even if you accept
all that. I just want to get the general idea across here, as a
starting point for further discussion.

-- 
Peter Geoghegan


