Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id CAH2-WzmUXPHpS4VPqVz7VLUkxSEy=F0bJ=2B-yBp7r1J75oHrg@mail.gmail.com
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Feb 4, 2022 at 2:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
> While I agree that there's some case to be made for leaving settled
> pages well enough alone, your criterion for settled seems pretty much
> accidental.

I fully admit that I came up with the FSM heuristic with TPC-C in
mind. But you have to start somewhere.

Fortunately, the main benefits of this patch series (avoiding the
freeze cliff during anti-wraparound VACUUMs, often avoiding
anti-wraparound VACUUMs altogether) don't depend on the experimental
FSM patch at all. I chose to post that patch now because it seemed to
help with my more general point about qualitatively different pages,
and freezing at the page level.
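
To make the idea of qualitatively different pages a bit more concrete,
here is a toy C sketch. It is purely illustrative, not taken from the
patch or from PostgreSQL itself (PageState and free_space_to_record
are made-up names): once a page has settled (say, become all-visible),
stop advertising its free space, so new rows land on pages that are
still churning rather than disturbing pages that have stopped changing:

    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Simplified stand-in for per-page state; not a real Postgres struct. */
    typedef struct PageState
    {
        bool    all_visible;    /* every tuple on the page visible to all? */
        bool    all_frozen;     /* every tuple already frozen? */
        size_t  free_bytes;     /* unused space left on the page */
    } PageState;

    /*
     * Decide how much of a page's free space to advertise.  Settled pages
     * report none, so future inserts keep going to "open" pages.
     */
    static size_t
    free_space_to_record(const PageState *page)
    {
        if (page->all_visible || page->all_frozen)
            return 0;
        return page->free_bytes;
    }

    int
    main(void)
    {
        PageState settled  = { .all_visible = true,  .all_frozen = false, .free_bytes = 512 };
        PageState churning = { .all_visible = false, .all_frozen = false, .free_bytes = 512 };

        printf("settled page advertises %zu bytes of free space\n",
               free_space_to_record(&settled));
        printf("churning page advertises %zu bytes of free space\n",
               free_space_to_record(&churning));
        return 0;
    }

In this toy version the only criterion is the all-visible/all-frozen
status; what the real criterion for "settled" ought to be is a separate
question, and is exactly what's being debated here.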

> Imagine a system where there are two applications running,
> A and B. Application A runs all the time and all the transactions
> which it performs are short. Therefore, when a certain page is not
> modified by transaction A for a short period of time, the page will
> become all-visible and will be considered settled. Application B runs
> once a month and performs various transactions all of which are long,
> perhaps on a completely separate set of tables. While application B is
> running, pages take longer to settle not only for application B but
> also for application A. It doesn't make sense to say that the
> application is in control of the behavior when, in reality, it may be
> some completely separate application that is controlling the behavior.

Application B will already block pruning by VACUUM operations against
application A's tables, and so effectively blocks recording of the
resultant free space in the FSM in your scenario. So application A and
application B should already be considered one and the same
application. That's just how VACUUM works.
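
To spell out why (a sketch with made-up names, not actual PostgreSQL
code): dead tuples everywhere are judged against a single removal
horizon, roughly the oldest transaction still running anywhere, so one
long transaction from application B holds back pruning of application
A's dead tuples too, no matter which tables each application touches:

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    static bool
    tuple_removable(TransactionId dead_xmax, TransactionId oldest_running_xid)
    {
        /* Real XID comparisons handle wraparound; ignored in this sketch. */
        return dead_xmax < oldest_running_xid;
    }

    int
    main(void)
    {
        /* Application A: short transactions; a dead tuple deleted by XID 1000. */
        TransactionId a_dead_xmax = 1000;

        /* Application B: a long transaction that began as XID 900 and is
         * still running.  It caps the removal horizon for every table. */
        TransactionId oldest_running = 900;

        printf("while B runs, A's dead tuple removable? %s\n",
               tuple_removable(a_dead_xmax, oldest_running) ? "yes" : "no");

        /* Once B finishes, the horizon advances and pruning catches up. */
        oldest_running = 2000;
        printf("after B ends, A's dead tuple removable? %s\n",
               tuple_removable(a_dead_xmax, oldest_running) ? "yes" : "no");
        return 0;
    }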

VACUUM isn't a passive observer of the system -- it's another
participant. It both influences and is influenced by almost everything
else in the system.

> I can see that this could have significant advantages under some
> circumstances. But I think it could easily be far worse under other
> circumstances. I mean, you can have workloads where you do some amount
> of read-write work on a table and then go read only and sequential
> scan it an infinite number of times. An algorithm that causes the
> table to be smaller at the point where we switch to read-only
> operations, even by a modest amount, wins infinitely over anything
> else. But even if you have no change in the access pattern, is it a
> good idea to allow the table to be, say, 5% larger if it means that
> correlated data is colocated? In general, probably yes. If that means
> that the table fails to fit in shared_buffers instead of fitting, no.
> If that means that the table fails to fit in the OS cache instead of
> fitting, definitely no.

5% larger seems like a lot more than would be typical, based on what
I've seen. I don't think that the regression in this scenario can be
characterized as "infinitely worse", or anything like it. On a long
enough timeline, the potential upside of something like this is nearly
unlimited -- it could avoid a huge amount of write amplification. But
the potential downside seems to be small and fixed -- which is the
point (bounding the downside). The mere possibility of getting that
big benefit (avoiding the costs from heap fragmentation) is itself a
benefit, even when it turns out not to pay off in your particular
case. It can be seen as insurance.

> And to me, that kind of effect is why it's hard to gain much
> confidence in regards to stuff like this via laboratory testing. I
> mean, I'm glad you're doing such tests. But in a laboratory test, you
> tend not to have things like a sudden and complete change in the
> workload, or a random other application sometimes sharing the machine,
> or only being on the edge of running out of memory. I think in general
> people tend to avoid such things in benchmarking scenarios, but even
> if you include stuff like this, it's hard to know what to include that
> would be representative of real life, because just about anything
> *could* happen in real life.

Then what could you have confidence in?

-- 
Peter Geoghegan
