Subject: Re: Eager page freeze criteria clarification
From: Peter Geoghegan
Msg-id: CAH2-Wzme6k4f3U7dk7DU0OR9H3gsSRtDF-BfQijSF9tXWPup5w@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, Sep 25, 2023 at 11:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > The reason I was thinking of using the "lsn at the end of the last vacuum", is
> > that it seems to be more adaptive to the frequency of vacuuming.
>
> Yes, but I think it's *too* adaptive. The frequency of vacuuming can
> plausibly be multiple times per minute or not even annually. That's
> too big a range of variation.

+1. The risk of VACUUM chasing its own tail seems very real. We want VACUUM to be adaptive to the workload, not adaptive to itself.

> Yeah, I don't know if that's exactly the right idea, but I think it's
> in the direction that I was thinking about. I'd even be happy with
> 100% of the time-between-recent checkpoints, maybe even 200% of
> time-between-recent checkpoints. But I think there probably should be
> some threshold beyond which we say "look, this doesn't look like it
> gets touched that much, let's just freeze it so we don't have to come
> back to it again later."

The sole justification for any strategy that freezes lazily is that it can avoid useless freezing when freezing turns out to be unnecessary -- that's it. So I find it more natural to think of freezing as the default action, and *not freezing* as the thing that requires justification. Thinking about it "backwards" like that just seems simpler to me. There is only one possible reason to not freeze, but several reasons to freeze.

> I think part of the calculus here should probably be that when the
> freeze threshold is long, the potential gains from making it even
> longer are not that much. If I change the freeze threshold on a table
> from 1 minute to 1 hour, I can potentially save uselessly freezing
> that page 59 times per hour, every hour, forever, if the page always
> gets modified right after I touch it. If I change the freeze threshold
> on a table from 1 hour to 1 day, I can only save 23 unnecessary
> freezes per day.

I totally agree with you on this point. It seems related to my point about "freezing being the conceptual default action" in VACUUM.

Generally speaking, over-freezing is a problem when we reach the same wrong conclusion (freeze the page) about the same relatively few pages over and over -- senselessly repeating those mistakes really adds up when you're vacuuming the same table very frequently. On the other hand, under-freezing is typically a problem when we reach the same wrong conclusion (don't freeze the page) about lots of pages only once in a very long while. I strongly suspect that there is very little gray area between the two, across the full spectrum of application characteristics.

Most individual pages have very little chance of being modified in the short to medium term. In a perfect world, with a perfect algorithm, we'd almost certainly be freezing most pages at the earliest opportunity. It is nevertheless also true that a freezing policy that is only somewhat more aggressive than this ideal oracle algorithm will freeze far too aggressively (by at least some measures). There isn't much of a paradox to resolve here: it's all down to the cadence of vacuuming, and of rows subject to constant churn.

As you point out, the "same policy" can produce dramatically different outcomes when you actually consider what the consequences of the policy are over time, when applied by VACUUM under a variety of different workload conditions. So any freezing policy must be designed with due consideration for those sorts of things.
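To make the general shape of a checkpoint-based threshold concrete, here is a rough sketch of the kind of test I imagine (illustrative only -- the function, the parameter names, and the 2x multiplier are all mine, not something from an actual patch):

/*
 * Illustrative sketch only: treat a page as "cold", and so freeze it
 * eagerly, once the WAL distance since its last modification exceeds
 * some multiple of the WAL distance covered by a typical recent
 * checkpoint interval.
 */
#include "postgres.h"
#include "access/xlogdefs.h"

static bool
should_freeze_eagerly(XLogRecPtr page_lsn,      /* page's current LSN */
                      XLogRecPtr insert_lsn,    /* current WAL insert position */
                      uint64 checkpoint_wal_distance)   /* avg WAL bytes written per
                                                         * recent checkpoint interval */
{
    uint64      page_age = insert_lsn - page_lsn;

    /* Untouched for ~2 checkpoints' worth of WAL?  Just freeze it now. */
    return page_age >= 2 * checkpoint_wal_distance;
}

Something WAL-distance-based along those lines at least has the virtue of being driven by the workload as a whole, rather than by how often VACUUM happens to run against any one table.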
If VACUUM doesn't freeze the page now, then when will it freeze it? For most individual pages, that time will come (again, pages that benefit from lazy vacuuming are the exception rather than the rule). Right now, VACUUM almost behaves as if it thought "that's not my problem, it's a problem for future me!".

Trying to differentiate between pages that we must not over-freeze and pages that we must not under-freeze seems important. Generational garbage collection (as used by managed VM runtimes) does something that seems a little like this. It's based on the empirical observation that "most objects die young". What the precise definition of "young" really is varies significantly, but that turns out to be less of a problem than you might think -- it can be derived through feedback cycles. If you look at memory lifetimes on a logarithmic scale, very different sorts of applications tend to look like they have remarkably similar memory allocation characteristics.

> Percentage-wise, the overhead of being wrong is the
> same in both cases: I can have as many extra freeze operations as I
> have page modifications, if I pick the worst possible times to freeze
> in every case. But in absolute terms, the savings in the second
> scenario are a lot less.

Very true.

I'm surprised that there hasn't been any discussion of the absolute amount of system-wide freeze debt on this thread. If 90% of the pages in the entire database are frozen, it'll generally be okay if we make the wrong call by freezing lazily when we shouldn't. This is doubly true within small to medium sized tables, where the cost of catching up on freezing cannot ever be too bad (concentrations of unfrozen pages in one big table are what really hurt users).

--
Peter Geoghegan
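PS. To sketch what I mean about deriving the definition of "young" (or here, "cold") through feedback cycles -- again purely illustrative, with made-up names and thresholds, and assuming the same includes as the sketch further up:

typedef struct FreezeFeedback
{
    uint64      pages_frozen;           /* pages frozen by recent VACUUMs */
    uint64      frozen_then_modified;   /* of those, modified again soon after */
    double      eager_multiplier;       /* freeze threshold, as a multiple of
                                         * the recent checkpoint WAL distance */
} FreezeFeedback;

static void
adjust_eager_multiplier(FreezeFeedback *fb)
{
    double      wasted_ratio;

    if (fb->pages_frozen == 0)
        return;                 /* no feedback to act on yet */

    wasted_ratio = (double) fb->frozen_then_modified / fb->pages_frozen;

    if (wasted_ratio > 0.05)
        fb->eager_multiplier *= 2.0;    /* too many wasted freezes: back off */
    else
    {
        fb->eager_multiplier /= 2.0;    /* eager freezing is cheap: lean in */
        if (fb->eager_multiplier < 1.0)
            fb->eager_multiplier = 1.0;
    }
}

The particular numbers aren't the point; the point is that the threshold ends up being driven by how often eager freezing actually turns out to have been wasted.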