Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Eager page freeze criteria clarification
Msg-id: CAH2-WzmbjyqCyYMXjTTRv-L79Kh3GDXdjwag8cyh37xgbp7Gmg@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Eager page freeze criteria clarification
List: pgsql-hackers
On Fri, Sep 29, 2023 at 7:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 28, 2023 at 12:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > But isn't the main problem *not* freezing when we could and
> > should have? (Of course the cost of freezing is very relevant, but
> > it's still secondary.)
>
> Perhaps this is all in how you look at it, but I don't see it this
> way. It's easy to see how to solve the "not freezing" problem: just
> freeze everything as often as possible. If that were cheap enough that
> we could just do it, then we'd just do it and be done here. The
> problem is that, at least in my opinion, that seems too expensive in
> some cases. I'm starting to believe that those cases are narrower than
> I once thought, but I think they do exist. So now, I'm thinking that
> maybe the main problem is identifying when you've got such a case, so
> that you know when you need to be less aggressive.

Assuming your concern is more or less limited to those cases where the
same page could be frozen an unbounded number of times (once or almost
once per VACUUM), then I think we fully agree. We ought to converge on
the right behavior over time, but it's particularly important that we
never converge on the wrong behavior instead.

> > Won't the algorithm that you've sketched always think that
> > "unfreezing" pages doesn't affect recently frozen pages with such a
> > workload? Isn't the definition of "recently frozen" that emerges from
> > this algorithm not in any way related to the order delivery time, or
> > anything like that? You know, rather like vacuum_freeze_min_age.
>
> FWIW, I agree that vacuum_freeze_min_age sucks. I have been reluctant
> to endorse changes in this area mostly because I fear replacing one
> bad idea with another, not because I think that what we have now is
> particularly good. It's better to be wrong in the same way in every
> release than to have every release be equally wrong but in a different
> way.
>
> Also, I think the question of what "recently frozen" means is a good
> one, but I'm not convinced that it ought to relate to the order
> delivery time.

I don't think it should, either. The TPC-C scenario is partly
interesting because it isn't actually obvious what the most desirable
behavior is, even assuming that you had perfect information, and were
not subject to practical considerations about the complexity of your
algorithm.

There doesn't seem to be perfect clarity on what the goal should
actually be in such scenarios -- it's not like the problem is just that
we can't agree on the best way to accomplish those goals with this
specific workload. If performance/efficiency and performance stability
are directly in tension (as they sometimes are), how much do you want
to prioritize one or the other? It's not an easy question to answer.
It's a value judgement as much as anything else.

> If we insert into a table and 12-14 hours go by before
> it's updated, it doesn't seem particularly bad to me if we froze that
> data meanwhile (regardless of what metric drove that freezing). Same
> thing if it's 2-4 hours. What seems bad to me is if we're constantly
> updating the table and vacuum comes sweeping through and freezing
> everything to no purpose over and over again and then it gets
> un-frozen a few seconds or minutes later.

Right -- we agree here. I even think that it makes sense to freeze
pages knowing for sure that the pages will be unfrozen on that sort of
timeline (at least with a large and ever growing table like this). It
may technically be less efficient, but not when you consider how
everything else is affected by the disruptive impact of freezing a
great deal of stuff all at once. (Of course it's also true that we
don't really know what will happen, which is all the more reason to
freeze eagerly.)

> Now maybe that's the wrong idea. After all, as a percentage, the
> overhead is the same either way, regardless of whether we're talking
> about WAL volume or CPU cycles. But somehow it feels worse to make the
> same mistakes every few minutes or potentially even tens of seconds
> than it does to make the same mistakes every few hours. The absolute
> cost is a lot higher.

I agree. Another problem with the algorithm that Andres sketched is
that it supposes that vacuum_freeze_min_age means something relevant --
that's how we decide whether or not freezing should count as
"opportunistic". But there really isn't that much difference between
(say) an XID age of 25 million and 50 million. At least not with a
table like the TPC-C tables, where VACUUMs are naturally big operations
that take place relatively infrequently.

Assume a default vacuum_freeze_min_age of 50 million. How can it make
sense to deem freezing a page "opportunistic" when its oldest XID has
only attained an age of 25 million, if the subsequent unfreezing
happens when that same XID would have attained an age of 75 million,
had we not frozen it? And if you agree that it doesn't make sense, how
can we compensate for this effect, as a practical matter?

Even if you're willing to assume that vacuum_freeze_min_age isn't just
an arbitrary threshold, this still seems wrong. vacuum_freeze_min_age
is applied by VACUUM, at the point that it scans pages. If VACUUM were
infinitely fast, and new VACUUMs were launched constantly, then
vacuum_freeze_min_age (and this bucketing scheme) might make more
sense. But, you know, they're not. So whether or not VACUUM (with
Andres' algorithm) deems a page that it has frozen to have been
opportunistically frozen or not is greatly influenced by factors that
couldn't possibly be relevant.
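To make that timing dependence concrete, here is a throwaway standalone
illustration. It is not PostgreSQL code, and it is not Andres' actual
pseudocode -- the function name and the classification rule are just
stand-ins for the idea described above, with the default
vacuum_freeze_min_age of 50 million plugged in:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* stand-in for the vacuum_freeze_min_age default of 50 million */
#define FREEZE_MIN_AGE  UINT64_C(50000000)

/*
 * Hypothetical classification: freezing counts as "opportunistic" when
 * the page's oldest XID had not yet attained vacuum_freeze_min_age at
 * the moment VACUUM scanned the page.
 */
static bool
frozen_opportunistically(uint64_t xid_age_at_scan)
{
    return xid_age_at_scan < FREEZE_MIN_AGE;
}

int
main(void)
{
    /*
     * The same page, the same oldest XID, and the same eventual
     * unfreezing (at what would have been an XID age of ~75 million).
     * The only difference is when VACUUM happened to reach the page.
     */
    uint64_t scanned_early = UINT64_C(25000000);    /* age 25 million */
    uint64_t scanned_late = UINT64_C(60000000);     /* age 60 million */

    printf("scanned at age 25 million: opportunistic? %s\n",
           frozen_opportunistically(scanned_early) ? "yes" : "no");
    printf("scanned at age 60 million: opportunistic? %s\n",
           frozen_opportunistically(scanned_late) ? "yes" : "no");

    /* prints "yes" then "no": identical outcomes, different buckets */
    return 0;
}

Both runs describe the same freeze-then-unfreeze lifecycle, yet one
freeze lands in the "opportunistic" bucket and the other doesn't,
purely because of VACUUM's timing. (Real XID comparisons are circular
and go through TransactionIdPrecedes(), of course; the toy uses plain
integers because only the timing dependence matters here.)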
> > On a positive note, I like that what you've laid out freezes eagerly
> > when an FPI won't result -- this much we can all agree on. I guess
> > that that part is becoming uncontroversial.
>
> I don't think that we're going to be able to get away with freezing
> rows in a small, frequently-updated table just because no FPI will
> result. I think Melanie's results show that the cost is not
> negligible. But Andres's pseudocode algorithm, although it is more
> aggressive in that case, doesn't necessarily seem bad to me, because
> it still has some ability to hold off freezing in such cases if our
> statistics show that it isn't working out.

Okay then. I guess it's more accurate to say that we'll have a strong
bias in the direction of freezing when an FPI won't result, though not
an infinitely strong bias. We'll at least have something that can be
thought of as an improved version of the FPI thing for 17, I think --
which is definitely significant progress.

--
Peter Geoghegan