Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-WzmUXPHpS4VPqVz7VLUkxSEy=F0bJ=2B-yBp7r1J75oHrg@mail.gmail.com |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
List | pgsql-hackers |
On Fri, Feb 4, 2022 at 2:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
> While I agree that there's some case to be made for leaving settled
> pages well enough alone, your criterion for settled seems pretty much
> accidental.

I fully admit that I came up with the FSM heuristic with TPC-C in
mind. But you have to start somewhere.

Fortunately, the main benefits of this patch series (avoiding the
freeze cliff during anti-wraparound VACUUMs, often avoiding
anti-wraparound VACUUMs altogether) don't depend on the experimental
FSM patch at all. I chose to post that now because it seemed to help
with my more general point about qualitatively different pages, and
freezing at the page level.

> Imagine a system where there are two applications running,
> A and B. Application A runs all the time and all the transactions
> which it performs are short. Therefore, when a certain page is not
> modified by transaction A for a short period of time, the page will
> become all-visible and will be considered settled. Application B runs
> once a month and performs various transactions all of which are long,
> perhaps on a completely separate set of tables. While application B is
> running, pages take longer to settle not only for application B but
> also for application A. It doesn't make sense to say that the
> application is in control of the behavior when, in reality, it may be
> some completely separate application that is controlling the behavior.

Application B will already block pruning by VACUUM operations against
application A's table, and so effectively blocks recording of the
resultant free space in the FSM in your scenario. And so application A
and application B should be considered the same application already.
That's just how VACUUM works.

VACUUM isn't a passive observer of the system -- it's another
participant. It both influences and is influenced by almost everything
else in the system.
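
The point about application B blocking pruning can be illustrated with
a toy model. This is only a sketch under stated assumptions, not
PostgreSQL code: the names DeadTuple, oldest_xmin and prunable are
hypothetical, transaction IDs are plain integers with no wraparound,
and a single removal horizon is assumed for all ordinary tables in the
database.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DeadTuple:
    """A deleted or updated tuple version (hypothetical toy type)."""
    table: str
    deleter_xid: int  # xid of the transaction that deleted/updated it


def oldest_xmin(snapshot_xmins: Iterable[int], next_xid: int) -> int:
    """Toy removal horizon: the oldest xmin held by any running snapshot.

    With no running snapshots, everything older than next_xid is removable.
    """
    return min(snapshot_xmins, default=next_xid)


def prunable(dead: List[DeadTuple], horizon: int) -> List[DeadTuple]:
    """Dead tuples whose deleter is older than the horizon can be pruned."""
    return [t for t in dead if t.deleter_xid < horizon]


# Application A's table has three dead tuples from short transactions.
dead_in_a = [DeadTuple("a_table", xid) for xid in (105, 110, 120)]

# No long-running transaction: the horizon is the next xid to be assigned.
quiet = prunable(dead_in_a, oldest_xmin([], next_xid=130))

# Application B holds a snapshot with xmin 100, on entirely different tables.
busy = prunable(dead_in_a, oldest_xmin([100], next_xid=130))

print(len(quiet), len(busy))
```

In this model, application B's old snapshot means that nothing in
application A's table can be pruned, so none of that space can be
recorded as free in the FSM until application B finishes.
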
> I can see that this could have significant advantages under some
> circumstances. But I think it could easily be far worse under other
> circumstances. I mean, you can have workloads where you do some amount
> of read-write work on a table and then go read only and sequential
> scan it an infinite number of times. An algorithm that causes the
> table to be smaller at the point where we switch to read-only
> operations, even by a modest amount, wins infinitely over anything
> else. But even if you have no change in the access pattern, is it a
> good idea to allow the table to be, say, 5% larger if it means that
> correlated data is colocated? In general, probably yes. If that means
> that the table fails to fit in shared_buffers instead of fitting, no.
> If that means that the table fails to fit in the OS cache instead of
> fitting, definitely no.

5% larger seems like a lot more than would be typical, based on what
I've seen. I don't think that the regression in this scenario can be
characterized as "infinitely worse", or anything like it. On a long
enough timeline, the potential upside of something like this is nearly
unlimited -- it could avoid a huge amount of write amplification. But
the potential downside seems to be small and fixed -- which is the
point (bounding the downside). The mere possibility of getting that
big benefit (avoiding the costs from heap fragmentation) is itself a
benefit, even when it turns out not to pay off in your particular
case. It can be seen as insurance.

> And to me, that kind of effect is why it's hard to gain much
> confidence in regards to stuff like this via laboratory testing. I
> mean, I'm glad you're doing such tests. But in a laboratory test, you
> tend not to have things like a sudden and complete change in the
> workload, or a random other application sometimes sharing the machine,
> or only being on the edge of running out of memory. I think in general
> people tend to avoid such things in benchmarking scenarios, but even
> if include stuff like this, it's hard to know what to include that
> would be representative of real life, because just about anything
> *could* happen in real life.

Then what could you have confidence in?

--
Peter Geoghegan