Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Eager page freeze criteria clarification |
Date | |
Msg-id | 20230927174633.hrnoia3vz5s7a5uv@alap3.anarazel.de |
In response to | Re: Eager page freeze criteria clarification (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: Eager page freeze criteria clarification |
List | pgsql-hackers |
Hi,

On 2023-09-27 10:25:00 -0700, Peter Geoghegan wrote:
> On Wed, Sep 27, 2023 at 10:01 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2023-09-26 09:07:13 -0700, Peter Geoghegan wrote:
> > I don't think doing this on a system wide basis with a metric like #unfrozen
> > pages is a good idea. It's quite common to have short lived data in some
> > tables while also having long-lived data in other tables. Making opportunistic
> > freezing more aggressive in that situation will just hurt, without a benefit
> > (potentially even slowing down the freezing of older data!). And even within a
> > single table, making freezing more aggressive because there's a decent sized
> > part of the table that is updated regularly and thus not frozen, doesn't make
> > sense.
>
> I never said that #unfrozen pages should be the sole criterion, for
> anything. Just that it would influence the overall strategy, making
> the system veer towards more aggressive freezing. It would complement
> a more sophisticated algorithm that decides whether or not to freeze a
> page based on its individual characteristics.
>
> For example, maybe the page-level algorithm would have a random
> component. That could potentially be where the global (or at least
> table level) view gets to influence things -- the random aspect is
> weighed using the global view of debt. That kind of thing seems like
> an interesting avenue of investigation.

I don't disagree that we should do something in that direction - I just don't
see the raw number of unfrozen pages being useful in that regard. If you have
a database where no pages live long, we don't need to freeze
opportunistically, yet the fraction of unfrozen pages will be huge.

> > If we want to take global freeze debt into account, which I think is a good
> > idea, we'll need a smarter way to represent the debt than just the number of
> > unfrozen pages. I think we would need to track the age of unfrozen pages in
> > some way.
> > If there are a lot of unfrozen pages with a recent xid, then it's
> > fine, but if they are older and getting older, it's a problem and we need to
> > be more aggressive.
>
> Tables like pgbench_history will have lots of unfrozen pages with a
> recent XID that get scanned during every VACUUM. We should be freezing
> such pages at the earliest opportunity.

I think we ought to be able to freeze tables with as simple a workload as
pgbench_history has aggressively without taking a global freeze debt into
account.

> > The problem I see is how to track the age of unfrozen data -
> > it'd be easy enough to track the mean(oldest-64bit-xid-on-page), but then we
> > again have the issue of rare outliers moving the mean too much...
>
> I think that XID age is mostly not very important compared to the
> absolute amount of unfrozen pages, and the cost profile of freezing
> now versus later. (XID age *is* important in emergencies, but that's
> mostly not what we're discussing right now.)

We definitely *also* should take the number of unfrozen pages into account. I
just don't think determining freeze debt primarily using the number of unfrozen
pages will be useful. The presence of unfrozen pages that are likely to be
updated again soon is not a problem and makes the simple metric pretty much
useless.

> To be clear, that doesn't mean that XID age shouldn't play an
> important role in helping VACUUM to differentiate between pages that
> should not be frozen and pages that should be frozen.

I think we need to take it into account to determine a useful freeze debt on a
table level (and potentially system wide too). Assuming we could compute it
cheaply enough, if we had an approximate median oldest-64bit-xid-on-page and
the number of unfrozen pages, we could differentiate between tables that have
lots of recent unfrozen pages (the median will be low) and tables with lots of
unfrozen pages that are unlikely to be updated again (the median will be high
and growing).
Something like the median 64bit xid would be interesting because it'd not get
"invalidated" if relfrozenxid is increased.

Greetings,

Andres Freund
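
To make the mean-versus-median point concrete, here is a minimal sketch in
Python. It is not PostgreSQL code and nothing like it exists in the tree: the
names TableSample and freeze_debt_summary are hypothetical, and the per-page
list of oldest 64-bit XIDs is assumed to be available somehow (the email
leaves open how to compute it cheaply). The sketch stores raw XIDs and only
converts them to an age when summarizing, on the assumption that "low" and
"high" in the message refer to the age of the median XID.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TableSample:
    """Hypothetical input: oldest 64-bit XID on each unfrozen page of a table."""
    name: str
    oldest_xid_per_unfrozen_page: list[int]


def freeze_debt_summary(sample: TableSample, next_xid: int) -> dict:
    """Summarize unfrozen pages by count and by the age of their oldest XID.

    Age is computed as next_xid - xid at read time, so the stored XIDs stay
    meaningful regardless of where relfrozenxid currently is.
    """
    ages = [next_xid - xid for xid in sample.oldest_xid_per_unfrozen_page]
    return {
        "table": sample.name,
        "unfrozen_pages": len(ages),
        "mean_age": mean(ages) if ages else 0,
        "median_age": median(ages) if ages else 0,
    }


if __name__ == "__main__":
    next_xid = 10_000_000
    # Frequently updated table: 999 recently modified pages plus one old outlier.
    hot = TableSample("hot", [next_xid - 1_000] * 999 + [1_000_000])
    # Table whose unfrozen pages are all old and no longer being modified.
    cold = TableSample("cold", [5_000_000] * 1_000)
    for sample in (hot, cold):
        print(freeze_debt_summary(sample, next_xid))
```

Both tables report the same number of unfrozen pages, so the raw count cannot
tell them apart. In the "hot" table the single outlier pulls the mean age to
roughly ten times the median, which is the outlier problem described for
mean(oldest-64bit-xid-on-page); the median age stays low there and is high
for the "cold" table.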