Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Eager page freeze criteria clarification |
Date | |
Msg-id | CAH2-Wzk9yFf2RkadQKJ4O_094_UxGz63y06kqCdptLN2aBDKNA@mail.gmail.com |
In response to | Re: Eager page freeze criteria clarification (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Wed, Sep 27, 2023 at 10:46 AM Andres Freund <andres@anarazel.de> wrote:
> I don't disagree that we should do something in that direction - I just don't
> see the raw number of unfrozen pages being useful in that regard. If you have
> a database where no pages live long, we don't need to freeze
> opportunistically, yet the fraction of unfrozen pages will be huge.

We don't know how to reliably predict the future. We can do our best to ameliorate problems with such workloads, using a slew of different strategies (including making the behaviors configurable, holding off on freezing a second or a third time, etc). Such workloads are not very common, and won't necessarily suffer too much. I think that they're pretty much limited to queue-like tables.

Any course of action will likely have some downsides. Melanie/you/whoever will need to make a trade-off, knowing that somebody somewhere isn't going to be completely happy about it. I just don't think that anybody will ever be able to come up with an algorithm that can generalize well enough for things to not work out that way. I really care about the problems in this area being addressed more comprehensively, so I'm certainly not going to be the one that just refuses to accept a trade-off (within reason, of course).
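As a minimal sketch of the "holding off on freezing a second or a third time" idea (this is not PostgreSQL code; the struct, counters, and threshold are all hypothetical), tracking how often an opportunistic freeze of a page was later undone could gate further freezing of that page:

```c
/*
 * Toy model (not PostgreSQL code) of one possible mitigation: stop
 * opportunistically freezing a page after earlier freezes of that page
 * were "wasted" (the page was modified again shortly afterwards).
 * All names and the threshold here are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct PageFreezeStats
{
    int times_frozen;        /* opportunistic freezes so far */
    int times_unfrozen_soon; /* freezes undone by a later modification */
} PageFreezeStats;

/* Hypothetical threshold: give up after two wasted freezes. */
#define MAX_WASTED_FREEZES 2

static bool
should_freeze_opportunistically(const PageFreezeStats *stats)
{
    return stats->times_unfrozen_soon < MAX_WASTED_FREEZES;
}

int
main(void)
{
    /* A queue-like page that keeps getting dirtied after each freeze */
    PageFreezeStats queue_page = {.times_frozen = 3, .times_unfrozen_soon = 3};
    /* An append-only page that stayed frozen after its one freeze */
    PageFreezeStats history_page = {.times_frozen = 1, .times_unfrozen_soon = 0};

    printf("queue-like page: freeze again? %s\n",
           should_freeze_opportunistically(&queue_page) ? "yes" : "no");
    printf("append-only page: freeze again? %s\n",
           should_freeze_opportunistically(&history_page) ? "yes" : "no");
    return 0;
}
```

A queue-like page whose freezes keep getting undone stops being a candidate, while a page that stayed frozen keeps qualifying.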
> > > If we want to take global freeze debt into account, which I think is a good
> > > idea, we'll need a smarter way to represent the debt than just the number of
> > > unfrozen pages. I think we would need to track the age of unfrozen pages in
> > > some way. If there are a lot of unfrozen pages with a recent xid, then it's
> > > fine, but if they are older and getting older, it's a problem and we need to
> > > be more aggressive.
> >
> > Tables like pgbench_history will have lots of unfrozen pages with a
> > recent XID that get scanned during every VACUUM. We should be freezing
> > such pages at the earliest opportunity.
>
> I think we ought to be able to freeze tables with as simple a workload as
> pgbench_history has aggressively without taking a global freeze debt into
> account.

Agreed.

> We definitely *also* should take the number of unfrozen pages into account. I
> just don't think determining freeze debt primarily using the number of unfrozen
> pages will be useful. The presence of unfrozen pages that are likely to be
> updated again soon is not a problem and makes the simple metric pretty much
> useless.

I never said "primarily".

> > To be clear, that doesn't mean that XID age shouldn't play an
> > important role in helping VACUUM to differentiate between pages that
> > should not be frozen and pages that should be frozen.
>
> I think we need to take it into account to determine a useful freeze debt on a
> table level (and potentially system wide too).

If we reach the stage where XID age starts to matter, we're no longer talking about performance stability IMV. We're talking about avoiding wraparound, which is mostly a different problem.

Recall that I started this whole line of discussion about debt in reaction to a point of Robert's about lazy freezing. My original point was that (as a general rule) we can afford to be lazy when we're not risking too much -- when the eventual cost of catching up (having gotten it wrong) isn't too painful (i.e. doesn't lead to a big balloon payment down the road). Laziness is the thing that requires justification. Putting off vital maintenance work for as long as possible only makes sense under fairly limited circumstances.

A complicating factor here is the influence of the free space map. The way that it works (the total lack of hysteresis to discourage reusing space from old pages) probably makes the problems with freezing harder to solve. Maybe that should be put in scope.

> Assuming we could compute it cheaply enough, if we had an approximate median
> oldest-64bit-xid-on-page and the number of unfrozen pages, we could
> differentiate between tables that have lots of recent unfrozen pages (the
> median will be low) and tables with lots of unfrozen pages that are unlikely to
> be updated again (the median will be high and growing). Something like the
> median 64bit xid would be interesting because it'd not get "invalidated" if
> relfrozenxid is increased.

I'm glad that you're mostly of the view that we should be freezing a lot more aggressively overall, but I think that you're still too focussed on avoiding small problems. I understand why novel new problems are generally more of a concern than established old problems, but there needs to be a sense of proportion. Performance stability is incredibly important, and isn't zero cost.

--
Peter Geoghegan
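To make the quoted metric concrete, here is a toy sketch (not PostgreSQL code; the function names, sampling, and thresholds are all hypothetical, and a real implementation would need a cheap way to gather per-page XIDs): an approximate median of the oldest unfrozen 64-bit XID per page, combined with the unfrozen page count, separates a busy table whose unfrozen pages all carry recent XIDs from one accumulating old, never-revisited unfrozen pages.

```c
/*
 * Toy illustration (not PostgreSQL code) of the metric sketched in the
 * quoted passage: an approximate median of the oldest unfrozen 64-bit XID
 * on each page, combined with the number of unfrozen pages.  A low median
 * age suggests the unfrozen pages are recent and likely to be modified
 * again; a high and growing median age suggests accumulating freeze debt.
 * All names and thresholds are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int
cmp_uint64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *) a;
    uint64_t y = *(const uint64_t *) b;

    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

/* Median of a sample of per-page oldest unfrozen 64-bit XIDs. */
static uint64_t
median_oldest_xid(uint64_t *sample, size_t n)
{
    qsort(sample, n, sizeof(uint64_t), cmp_uint64);
    return sample[n / 2];
}

/*
 * Hypothetical policy: freeze eagerly once the median unfrozen page is
 * "old" in XID terms and enough unfrozen pages have piled up.
 */
static bool
freeze_eagerly(uint64_t next_xid, uint64_t median_xid,
               uint64_t unfrozen_pages, uint64_t total_pages)
{
    uint64_t median_age = next_xid - median_xid;

    return median_age > 10 * 1000 * 1000 &&   /* pages sit unfrozen for a long time */
           unfrozen_pages > total_pages / 10; /* and there are a lot of them */
}

int
main(void)
{
    uint64_t next_xid = 50 * 1000 * 1000;

    /* Table A: unfrozen pages all carry recent XIDs (busy, queue-like). */
    uint64_t a[] = {49900000, 49950000, 49990000, 49999000, 49999900};
    /* Table B: unfrozen pages carry old XIDs (cold data never revisited). */
    uint64_t b[] = {1000000, 2000000, 30000000, 3500000, 4000000};

    printf("table A: eager freeze? %s\n",
           freeze_eagerly(next_xid, median_oldest_xid(a, 5), 5000, 10000) ? "yes" : "no");
    printf("table B: eager freeze? %s\n",
           freeze_eagerly(next_xid, median_oldest_xid(b, 5), 5000, 10000) ? "yes" : "no");
    return 0;
}
```

Because the sample holds 64-bit XIDs, the median stays meaningful after relfrozenxid advances, which is the property the quoted passage highlights.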