Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date:
Msg-id: CAH2-Wz=O+YW_3xh1S+8ZMEo9Aoj+J=y4fRevKVxkGKPxsfc1Fg@mail.gmail.com
In response to: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Mar 1, 2022 at 1:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I think that this is not really a description of an algorithm -- and I
> think that it is far from clear that the third "in-between" category
> does not need to exist.

But I already described the algorithm. It is very simple
mechanistically -- though that in itself means very little. As I have
said multiple times now, the hard part is assessing what the
implications are. And the even harder part is making a judgement about
whether or not those implications are what we generally want.

> I think findings like this are very unconvincing.

TPC-C may be unrealistic in certain ways, but it is nevertheless
vastly more realistic than pgbench. pgbench is really more of a stress
test than a benchmark.

The main reasons why TPC-C is interesting here are *very* simple, and
would likely be equally true with TPC-E (just for example) -- even
though TPC-E is a very different kind of OLTP benchmark overall.
TPC-C (like TPC-E) features a diversity of transaction types,
some of which are more complicated than others -- which is strictly
more realistic than having only one highly synthetic OLTP transaction
type. The transaction types don't all modify the same tables in the
same way. This leads to natural diversity among tables and among
transactions, including:

* The typical or average number of distinct XIDs per heap page varies
significantly from table to table. There are far fewer distinct XIDs
per "order line" table heap page than per "order" table heap page, for
the obvious reason: each New Order transaction inserts one "order" row
but roughly ten "order line" rows, all under the same XID. (See the
quick check following this list.)

* Roughly speaking, there are various different ways that free space
management ought to work in a system like Postgres. For example, it is
necessary to make a "fragmentation vs. space utilization" trade-off
with the new orders table.

* There are joins in some of the transactions!
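
To make the first two points concrete, here is the kind of quick check
anybody can run against a TPC-C style schema (table names here are
just illustrative, not from any particular kit): group live tuples by
heap block number via ctid, and count the distinct inserting XIDs per
page. The FSM side can be eyeballed with the pg_freespacemap contrib
extension.

-- Distinct inserting XIDs per heap page:
SELECT (ctid::text::point)[0]::bigint AS blkno,
       count(DISTINCT xmin::text)     AS distinct_xids
FROM order_line
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

-- Free space recorded for each heap page (contrib/pg_freespacemap):
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
SELECT * FROM pg_freespace('new_order') LIMIT 10;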

Maybe TPC-C is a crude approximation of reality, but it nevertheless
exercises relevant parts of the system to a significant degree. What
else would you expect me to use, for a project like this? To a
significant degree the relfrozenxid tracking stuff is interesting
because tables tend to have natural differences like the ones I have
highlighted on this thread. How could that not be the case? Why
wouldn't we want to take advantage of that?
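
The kind of natural variation I mean is easy to observe directly,
since relfrozenxid is tracked per table in pg_class:

-- Per-table XID age, showing how unevenly tables "age":
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;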

There might be some danger in over-optimizing for this particular
benchmark, but right now that is so far from being the main problem
that the idea seems strange to me. pgbench doesn't need the FSM, at
all. In fact pgbench doesn't even really need VACUUM (except for
antiwraparound), once heap fillfactor is lowered to 95 or so. pgbench
simply isn't relevant, *at all*, except perhaps as a way of measuring
regressions in certain synthetic cases that don't benefit.
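
(To be clear about what I mean by lowering heap fillfactor: something
like the following, or equivalently initializing with "pgbench -i -F
95". With a little free space reserved on every heap page, pgbench's
uniform UPDATEs -- which never touch an indexed column -- can be
satisfied as HOT updates on the same page, leaving VACUUM with almost
nothing to do.)

ALTER TABLE pgbench_accounts SET (fillfactor = 95);
VACUUM FULL pgbench_accounts;  -- rewrite so existing pages honor the new setting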

> TPC-C (or any
> benchmark really) is so simple as to be a terrible proxy for what
> vacuuming is going to look like on real-world systems.

Doesn't that amount to "no amount of any kind of testing or
benchmarking will convince me of anything, ever"?

There is more than one type of real-world system. I think that TPC-C
is representative of some real world systems in some regards. But even
that's not the important point for me. I find TPC-C generally
interesting for one reason: I can clearly see that Postgres does
things in ways that just don't make much sense, and that aren't
particularly fundamental to how VACUUM works.

My only long term goal is to teach Postgres to *avoid* various
pathological cases exhibited by TPC-C (e.g., the B-Tree "split after
new tuple" mechanism from commit f21668f328 *avoids* a pathological
case from TPC-C). We don't necessarily have to agree on how important
each individual case is "in the real world" (which is impossible to
know anyway). We only have to agree that what we see is a pathological
case (because some reasonable expectation is dramatically violated),
and then work out a fix.
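
(For anybody that doesn't remember the "split after new tuple" case,
the pattern is roughly the one sketched below, with illustrative DDL
following the TPC-C layout. Order IDs ascend within each (warehouse,
district) group, so inserts hit many "local" rightmost positions in
the primary key index. Before commit f21668f328, each 50/50 page split
left behind a half-empty left page that no future key could ever fill,
giving roughly 50% space utilization.)

CREATE TABLE orders (
    o_w_id int NOT NULL,
    o_d_id int NOT NULL,
    o_id   int NOT NULL,
    PRIMARY KEY (o_w_id, o_d_id, o_id)
);
-- Each district always inserts its next o_id:
INSERT INTO orders VALUES
    (1, 1, 1), (1, 2, 1),
    (1, 1, 2), (1, 2, 2),
    (1, 1, 3), (1, 2, 3);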

I don't want to teach Postgres to be clever -- I want to teach it to
avoid being stupid in cases where it exhibits behavior that really
cannot be described any other way. You seem to talk about some of this
work as if it were just as likely to have a detrimental effect
elsewhere, for some equally plausible workload, with a downside
roughly as bad as the advertised upside. I consider
that very unlikely, though. Sure, regressions are quite possible, and
a real concern -- but regressions *like that* are unlikely. Avoiding
doing what is clearly the wrong thing just seems to work out that way,
in general.

-- 
Peter Geoghegan


