Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date:
Msg-id: CA+TgmoZQtdd7Ob0HmwtURF=4TVNGWS0POTG5oz6rvi+EobyWuw@mail.gmail.com
In response to: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, Feb 4, 2022 at 3:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Application B will already block pruning by VACUUM operations against
> application A's table, and so effectively blocks recording of the
> resultant free space in the FSM in your scenario. And so application A
> and application B should be considered the same application already.
> That's just how VACUUM works.
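
[The blocking described above can be observed directly: any open snapshot holds back VACUUM's removal horizon database-wide, so dead tuples in an unrelated table stay unremovable and their space is not recorded in the FSM. A minimal psql session sketch; the table names are illustrative, not from the thread:

```sql
-- Session 1 ("application B"): hold an old snapshot open.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM app_b_table;   -- establishes the snapshot

-- Session 2 ("application A"): churn rows in a different table,
-- then vacuum it.
DELETE FROM app_a_table WHERE id < 1000;
VACUUM (VERBOSE) app_a_table;
-- The VERBOSE output reports row versions that "cannot be removed
-- yet": pruning is held back by session 1's snapshot, so the freed
-- space does not reach the FSM until that transaction ends.
```
]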

Sure ... but that also sucks. If we consider application A and
application B to be the same application, then we're basing our
decision about what to do on information that is inaccurate.

> 5% larger seems like a lot more than would be typical, based on what
> I've seen. I don't think that the regression in this scenario can be
> characterized as "infinitely worse", or anything like it. On a long
> enough timeline, the potential upside of something like this is nearly
> unlimited -- it could avoid a huge amount of write amplification. But
> the potential downside seems to be small and fixed -- which is the
> point (bounding the downside). The mere possibility of getting that
> big benefit (avoiding the costs from heap fragmentation) is itself a
> benefit, even when it turns out not to pay off in your particular
> case. It can be seen as insurance.

I don't see it that way. There are cases where avoiding writes is
better, and cases where trying to cram everything into the fewest
possible pages is better. With the right test case you can make either
strategy look superior. What I think your test case has going for it
is that it is similar to something that a lot of people, really a ton
of people, actually do with PostgreSQL. However, it's not going to be
an accurate model of what everybody does, and therein lies some
element of danger.

> Then what could you have confidence in?

Real-world experience. That's hard to get if we never commit any
patches, but it's a good argument for (a) having them tested by
multiple different hackers who invent test cases independently and (b)
some configurability where we can reasonably include it, so that
anyone who does experience problems has an escape.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


