Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Eager page freeze criteria clarification |
Date | |
Msg-id | CAH2-WznLpXJg-aoUjo9ewWgJVqWTzYSTFUT_BBRKHt0iSjxvrA@mail.gmail.com |
In response to | Re: Eager page freeze criteria clarification (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 29, 2023 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Even if you're willing to assume that vacuum_freeze_min_age isn't just
> > an arbitrary threshold, this still seems wrong. vacuum_freeze_min_age
> > is applied by VACUUM, at the point that it scans pages. If VACUUM were
> > infinitely fast, and new VACUUMs were launched constantly, then
> > vacuum_freeze_min_age (and this bucketing scheme) might make more
> > sense. But, you know, they're not. So whether or not VACUUM (with
> > Andres' algorithm) deems a page that it has frozen to have been
> > opportunistically frozen or not is greatly influenced by factors that
> > couldn't possibly be relevant.
>
> I'm not totally sure that I'm understanding what you're concerned
> about here, but I *think* that the issue you're worried about here is:
> if we have various rules that can cause freezing, let's say X Y and Z,
> and we adjust the aggressiveness of rule X based on the performance of
> rule Y, that would be stupid and might suck.

Your summary is pretty close. There are a couple of specific nuances
to it, though:

1. Anything that uses XID age or even LSN age necessarily depends on
when VACUUM shows up, which itself depends on many other random
things.

With small to medium sized tables that don't really grow, it's perhaps
reasonable to expect this to not matter. But with tables like the
TPC-C order/order lines table, or even pgbench_history, the next
VACUUM operation will reliably be significantly longer and more
expensive than the last one, forever (ignoring the influence of
aggressive mode, and assuming typical autovacuum settings). So VACUUMs
get bigger and less frequent as the table grows.

As the table continues to grow, at some point we reach a stage where
many XIDs encountered by VACUUM will be significantly older than
vacuum_freeze_min_age, while others will be significantly younger.
And so whether we apply the vacuum_freeze_min_age rule (or some other
age based rule) is increasingly a matter of random happenstance (i.e.
is more and more due to when VACUUM happens to show up), and has less
and less to do with what the workload signals we should do. This is a
moving target, but (if I'm not mistaken) under the scheme described by
Andres we're not even trying to compensate for that.

Separately, I have a practical concern:

2. It'll be very hard to independently track the effectiveness of
rules X, Y, and Z as a practical matter, because the application of
each rule quite naturally influences the application of every other
rule over time. They simply aren't independent things in any practical
sense.

Even if this wasn't an issue, I can't think of a reasonable cost
model. Is it good or bad if "opportunistic freezing" results in
unfreezing 50% of the time? AFAICT that's an *extremely* complicated
question. You cannot just interpolate from the 0% case (definitely
good) and the 100% case (definitely bad) and expect to get a sensible
answer. You can't split the difference -- even if we allow ourselves
to ignore tricky value judgement type questions.

> Assuming that the previous sentence is a correct framing, let's take X
> to be "freezing based on the page LSN age" and Y to be "freezing based
> on vacuum_freeze_min_age". I think the problem scenario here would be
> if it turns out that, under some set of circumstances, Y freezes more
> aggressively than X. For example, suppose the user runs VACUUM FREEZE,
> effectively setting vacuum_freeze_min_age=0 for that operation. If the
> table is being modified at all, it's likely to suffer a bunch of
> unfreezing right afterward, which could cause us to decide to make
> future vacuums freeze less aggressively. That's not necessarily what
> we want, because evidently the user, at least at that moment in time,
> thought that previous freezing hadn't been aggressive enough.
> They might be surprised to find that flash-freezing the table inhibited
> future automatic freezing.

I didn't think of that one myself, but it's a great example.

> Or suppose that they just have a very high XID consumption rate
> compared to the rate of modifications to this particular table, such
> that criteria related to vacuum_freeze_min_age tend to be satisfied a
> lot, and thus vacuums tend to freeze a lot no matter what the page LSN
> age is. This scenario actually doesn't seem like a problem, though. In
> this case the freezing criterion based on page LSN age is already not
> getting used, so it doesn't really matter whether we tune it up or
> down or whatever.

It would have to be a smaller table, which I'm relatively unconcerned
about.

> > Okay then. I guess it's more accurate to say that we'll have a strong
> > bias in the direction of freezing when an FPI won't result, though not
> > an infinitely strong bias. We'll at least have something that can be
> > thought of as an improved version of the FPI thing for 17, I think --
> > which is definitely significant progress.
>
> I do kind of wonder whether we're going to care about the FPI thing in
> the end. I don't mind if we do. But I wonder if it will prove
> necessary, or even desirable. Andres's algorithm requires a greater
> LSN age to trigger freezing when an FPI is required than when one
> isn't. But Melanie's test results seem to me to show that using a
> small LSN distance freezes too much on pgbench_accounts-type workloads
> and using a large one freezes too little on insert-only workloads. So
> I'm currently feeling a lot of skepticism about how useful it is to
> vary the LSN-distance threshold as a way of controlling the behavior.

I'm skeptical of varying the LSN distance, but I'm not skeptical of
the idea of caring about FPIs in general.

I wonder how much truly useful work VACUUM performed for
pgbench_accounts during Melanie's performance evaluation -- leaving
freezing aside.
For the "too much freezing for pgbench_accounts" case, where
master performed better than the patch, would it have been possible to
do even better than that by simply turning off autovacuum? Or at least
increasing the scale factor that triggers autovacuuming? (The answer
will depend to some extent on heap fill factor.)

--
Peter Geoghegan
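
As a rough illustration of point 1 above (VACUUMs against an append-only table getting bigger and less frequent, so that XID age at scan time increasingly reflects when VACUUM happened to show up), here is a minimal Python sketch. It is not taken from the thread and is not how PostgreSQL implements anything. It assumes a hypothetical insert-only table where each inserted row consumes exactly one XID, nothing else consumes XIDs, VACUUM runs instantly, and autovacuum fires as soon as inserts since the last VACUUM reach `insert_threshold + insert_scale_factor * rows`. The function name and its parameters are made up for the example. The default values are intended to mirror the stock settings of `autovacuum_vacuum_insert_threshold`, `autovacuum_vacuum_insert_scale_factor`, and `vacuum_freeze_min_age`.

```python
def simulate_insert_only_vacuums(
    initial_rows=1_000_000,
    insert_threshold=1_000,      # assumed: autovacuum_vacuum_insert_threshold
    insert_scale_factor=0.2,     # assumed: autovacuum_vacuum_insert_scale_factor
    freeze_min_age=50_000_000,   # assumed: vacuum_freeze_min_age
    rounds=40,
):
    """Toy model of successive insert-triggered autovacuums.

    Returns one tuple per VACUUM:
    (table rows after the round, rows inserted since the previous VACUUM,
     fraction of those new rows whose XID age exceeds freeze_min_age).
    """
    rows = initial_rows
    results = []
    for _ in range(rounds):
        # Inserts needed before the next insert-driven autovacuum fires.
        new_rows = insert_threshold + int(insert_scale_factor * rows)
        rows += new_rows
        # With one XID per row, ages of the new rows span 0..new_rows, so
        # only the oldest (new_rows - freeze_min_age) rows pass the cutoff.
        older = max(0, new_rows - freeze_min_age)
        results.append((rows, new_rows, older / new_rows))
    return results


if __name__ == "__main__":
    for rows, new_rows, frac in simulate_insert_only_vacuums():
        print(f"rows={rows:>14,} scanned_new={new_rows:>13,} "
              f"older_than_freeze_min_age={frac:.2f}")
```

Under these assumptions each VACUUM covers about 20% more new rows than the previous one. The early rounds see no XIDs older than the cutoff, while the later rounds see a mix of much older and much younger XIDs within a single VACUUM. The split between the two is set by the trigger arithmetic rather than by anything the workload is signalling.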