Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: New strategies for freezing, advancing relfrozenxid early |
Msg-id | CAH2-WzkvBdD13T+O1NFv97Pt-_kL3NxiAA9JZ5-nAFZS2xuZSA@mail.gmail.com |
In response to | Re: New strategies for freezing, advancing relfrozenxid early (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: New strategies for freezing, advancing relfrozenxid early |
List | pgsql-hackers |
On Mon, Dec 12, 2022 at 3:47 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Freezing is driven by a need to keep the age of the oldest
> transaction ID in a table to less than ~2B; and also the need to
> truncate the clog (and reduce lookups of really old xids). It's fine to
> give a brief explanation about why we can't track very old xids, but
> it's more of an internal detail and not the main point.

I agree that that's the conventional definition. What I am proposing is that we revise that definition a little. We should start the discussion of freezing in the user-level docs by pointing out that freezing also plays a role at the level of individual pages. An all-frozen page is self-contained, now and forever (or at least until it gets dirtied again). Even on a standby we will reliably avoid having to do clog lookups for a page that happens to have all of its tuples frozen.

I don't want to push back too much here. I just don't think that it makes terribly much sense for the docs to start the conversation about freezing by talking about the worst consequences of not freezing for an extended period of time. That's relevant, and it will probably end up as the aspect of freezing that we spend the most time on, but it still doesn't seem like a useful starting point to me. To me this seems related to the fallacy that relfrozenxid age is any kind of indicator of how far behind we are on freezing.

I think that there is value in talking about freezing as a maintenance task for physical heap pages, and only then talking about relfrozenxid and the circular XID space. The 64-bit XID patch doesn't get rid of freezing at all, because freezing is still needed to break the dependency of tuples stored in heap pages on pg_xact and other SLRUs -- which suggests that you can talk about freezing and advancing relfrozenxid as different (though still closely related) concepts.

> * I'm still having a hard time with vacuum_freeze_strategy_threshold.
> Part of it is the name, which doesn't seem to convey the meaning.

I chose the name long ago, and never gave it terribly much thought. I'm happy to go with whatever name you prefer.

> But the heuristic also seems off to me. What if you have lots of partitions
> in an append-only range-partitioned table? That would tend to use the
> lazy freezing strategy (because each partition is small), but that's
> not what you want. I understand heuristics aren't perfect, but it feels
> like we could do something better.

It is at least vastly superior to vacuum_freeze_min_age in cases like this. Not that that's hard -- vacuum_freeze_min_age just never triggers freezing in any autovacuum given a table like pgbench_history (except during aggressive mode), due to how it interacts with the visibility map. So we're practically guaranteed to do literally all freezing for an append-only table in an aggressive mode VACUUM.

Worst of all, that happens on a timeline that has nothing to do with the physical characteristics of the table itself (like the number of unfrozen heap pages). In fact, it doesn't even have anything to do with how many distinct XIDs modified that particular table -- XID age works at the system level.

By working at the heap rel level (which means the partition level if it's a partitioned table), and by being based on physical units (table size), vacuum_freeze_strategy_threshold at least manages to limit the accumulation of unfrozen heap pages in each individual relation. This is the fundamental unit at which VACUUM operates. So even if you get very unlucky and accumulate many unfrozen heap pages that happen to be distributed across many different tables, you can at least vacuum each table independently, and in parallel. The really big problems all seem to involve a concentration of unfrozen pages in one particular table (usually the events table, the largest table in the system by a couple of orders of magnitude).
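To make the visibility map interaction concrete, here is a toy Python model (not PostgreSQL's code) of an append-only table like pgbench_history that gets one non-aggressive VACUUM after each batch of inserts. Everything in it is a simplifying assumption: one inserting XID per heap page, no XID wraparound, no dead tuples, and a hypothetical strategy_threshold_pages cutoff standing in for vacuum_freeze_strategy_threshold (counted in pages here purely for illustration).

```python
from dataclasses import dataclass


@dataclass
class Page:
    xmin: int                  # simplification: one inserting XID per heap page
    all_visible: bool = False  # visibility map: all-visible bit
    all_frozen: bool = False   # visibility map: all-frozen bit


def vacuum(pages, next_xid, freeze_min_age, aggressive, eager_freeze):
    """One simplified VACUUM pass over the table; returns pages frozen."""
    frozen = 0
    for page in pages:
        if page.all_frozen:
            continue  # all-frozen pages are skipped by every VACUUM
        if page.all_visible and not aggressive:
            continue  # a non-aggressive VACUUM also skips all-visible pages
        # Age test standing in for vacuum_freeze_min_age (no wraparound here).
        old_enough = next_xid - page.xmin > freeze_min_age
        if eager_freeze or old_enough:
            page.all_frozen = True
            frozen += 1
        page.all_visible = True  # append-only: no dead tuples left behind
    return frozen


def simulate(rounds, pages_per_round, xids_per_round,
             freeze_min_age, strategy_threshold_pages):
    """Append-only table that gets one non-aggressive VACUUM per round."""
    pages, next_xid, total_frozen = [], 3, 0
    for _ in range(rounds):
        pages.extend(Page(xmin=next_xid) for _ in range(pages_per_round))
        next_xid += xids_per_round
        # Hypothetical per-table, size-based strategy choice.
        eager = len(pages) >= strategy_threshold_pages
        total_frozen += vacuum(pages, next_xid, freeze_min_age,
                               aggressive=False, eager_freeze=eager)
    return pages, total_frozen


# lazy_frozen == 0 and eager_frozen == 600 with these parameters.
lazy_pages, lazy_frozen = simulate(10, 100, 1_000_000, 50_000_000, 10**9)
eager_pages, eager_frozen = simulate(10, 100, 1_000_000, 50_000_000, 500)
```

With an unreachable threshold, every page is scanned exactly once while its XID is still young, gets marked all-visible, and is skipped from then on, so nothing is frozen. A later aggressive pass such as `vacuum(lazy_pages, 60_000_010, 50_000_000, aggressive=True, eager_freeze=False)` then freezes all 1,000 pages at once. With a 500-page threshold, the 400 pages appended before the table crossed it stay all-visible but unfrozen, which is the backlog left for a VACUUM that scans all-visible pages.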
That said, I agree that the system-level picture of debt (the system-level view of the number of unfrozen heap pages) is relevant, and that it isn't directly considered by the patch. I think that that can be treated as work for a future release. In fact, I think that there is a great deal that we could teach autovacuum.c about the system-level view of things -- this is only one.

> Also, another purpose of this seems
> to be to achieve v15 behavior (if v16 behavior causes a problem for
> some workload), which seems like a good idea, but perhaps we should
> have a more direct setting for that?

Why, though? I think that it happens to make sense to do both with one setting. Not because it's better to have 1 setting than 2 (though it is) -- just because it makes sense here, given these specifics.

> * The comment above lazy_scan_strategy() is phrased in terms of the
> "traditional approach". It would be more clear if you described the
> current strategies and how they're chosen. The pre-16 behavior was as
> lazy as possible, so that's easy enough to describe without referring
> to history.

Agreed. Will fix.

> * "eager skipping behavior" seems like a weird phrasing because it's
> not immediately clear if that means "skip more pages" (eager to skip
> pages and lazy to process them) or "skip fewer pages" (lazy to skip the
> pages and eager to process the pages).

I agree that that's a problem. I'll try to come up with terminology that doesn't have this problem ahead of the next version.

> * The skipping behavior is for all-visible pages is binary: skip them
> all, or skip none. That makes sense in the context of relfrozenxid
> advancement. But how does that avoid IO spikes? It would seem perfectly
> reasonable to me, if relfrozenxid advancement is not a pressing
> problem, to process some fraction of the all-visible pages (or perhaps
> process enough of them to freeze some fraction).

That's something that v9 will do, unlike earlier versions. So I agree.
In particular, we'll now start freezing eagerly before we switch over to preferring to advance relfrozenxid for the same table. As I said in my summary of v9 the other day, we "stagger" the point at which these two behaviors are first applied, with the goal of smoothing the transition. We try to disguise the fact that there are still two different sets of behavior. We try to get the best of both worlds (eager and lazy behaviors), without the user ever really noticing.

Don't forget that eager behavior with the visibility map (scanning all-visible pages rather than skipping them) is expected to directly lead to freezing more pages (not a guarantee, but quite likely). So while the skipping strategy and the freezing strategy are mechanically independent, they're independent in name only. They are not independent things in any practical sense. (The underlying reason why that is true is of course the same reason why vacuum_freeze_min_age only really works as designed in aggressive mode VACUUMs.)

> each VACUUM makes a payment on the deferred costs of freezing. I think
> this has already been discussed but it keeps reappearing in my mind, so
> maybe we can settle this with a comment (and/or docs)?

That said, I believe that we should always advance relfrozenxid in tables that are already moderately sized -- a table that is already big enough to be some small multiple of vacuum_freeze_strategy_threshold should always take an eager approach to advancing relfrozenxid. That is, I don't think that it makes sense to pay down the cost of freezing incrementally given a moderately large table. Large tables and small tables are qualitatively different things, at least from a VACUUM point of view. To some degree we can afford to be wrong about small tables, because that won't cause us any serious pain. This isn't really true with larger tables -- a VACUUM of a large table is "too big to fail". Our working assumption for tables that are still growing now, in the ongoing VACUUM, is that they will continue to grow.
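The staggering described above can be sketched as a per-table decision with two size cutoffs. This is a hypothetical illustration, not the patch's lazy_scan_strategy(): ScanStrategy, choose_scan_strategy, freeze_threshold_pages and relfrozenxid_multiple are invented names, the threshold is counted in pages, and the default multiple of 4 merely stands in for "some small multiple of vacuum_freeze_strategy_threshold". The second flag is called scan_all_visible rather than "eager skipping" to sidestep the ambiguity raised in the quoted review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanStrategy:
    eager_freezing: bool    # freeze scanned pages even if their XIDs are young
    scan_all_visible: bool  # don't skip all-visible pages, so relfrozenxid can advance


def choose_scan_strategy(rel_pages, freeze_threshold_pages, relfrozenxid_multiple=4):
    """Hypothetical staggered, per-table strategy choice.

    Eager freezing switches on once the table reaches freeze_threshold_pages.
    Eagerly advancing relfrozenxid (scanning all-visible pages) switches on
    only once the table is relfrozenxid_multiple times that size, so the two
    behaviors kick in at different points instead of all at once.
    """
    if relfrozenxid_multiple < 1:
        raise ValueError("relfrozenxid_multiple must be >= 1")
    eager_freezing = rel_pages >= freeze_threshold_pages
    scan_all_visible = rel_pages >= relfrozenxid_multiple * freeze_threshold_pages
    return ScanStrategy(eager_freezing, scan_all_visible)
```

Because relfrozenxid_multiple is at least 1, any table that scans all-visible pages to advance relfrozenxid is also freezing eagerly, while a table between the two cutoffs freezes the pages it scans but still skips all-visible ones. With a 1,000-page threshold, for example, a 3,999-page table gets eager freezing only, and a 4,000-page table gets both.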
There is often one very large table, and by the time the next VACUUM comes around, the table may have accumulated more unfrozen pages than the entire rest of the database combined (I mean all of the rest of the database, frozen and unfrozen pages alike). This may even be common: https://brandur.org/fragments/events

> * I'm wondering whether vacuum_freeze_min_age makes sense anymore. It
> doesn't take effect unless the page is not skipped, which is confusing
> from a usability standpoint, and we have better heuristics to decide if
> the whole page should be frozen or not anyway (i.e. if an FPI was
> already taken then freezing is cheaper).

I think that vacuum_freeze_min_age still has a role to play. The only things that can trigger freezing during a VACUUM that opts to use the lazy strategy are the FPI-from-pruning trigger mechanism (new to v9) and vacuum_freeze_min_age/FreezeLimit. So you cannot really have a lazy strategy without vacuum_freeze_min_age.

The original vacuum_freeze_min_age design did make sense, at least before the visibility map existed, because sometimes being lazy about freezing is the best strategy, especially with small, frequently updated tables like most of the pgbench tables. There is nothing inherently wrong with deciding to freeze (or even to wait for a cleanup lock) on the basis of a given XID's age. My problem isn't with that behavior in general. It's with the fact that we use it even when it's clearly inappropriate -- wildly inappropriate.

We have plenty of information that strongly hints at whether or not laziness is a good idea. It's a good idea whenever laziness has a decent chance of avoiding completely unnecessary work altogether, provided we can afford to be wrong about that without having to pay too high a cost later on, when we have to course correct. What this mostly boils down to is this: lazy freezing is generally a good idea in small tables only.

--
Peter Geoghegan
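The two lazy-strategy freeze triggers named in the final paragraphs above (the FPI-from-pruning trigger and vacuum_freeze_min_age/FreezeLimit) can be sketched per page as follows, together with the circular comparison that decides whether an XID is "old" in the wrapping 32-bit XID space. This is a simplified model under stated assumptions: xid_precedes mirrors the idea behind PostgreSQL's TransactionIdPrecedes for normal XIDs only, FreezeLimit is computed here from the next XID (the reference point is an assumption of this sketch), and lazy_should_freeze_page and its parameters are hypothetical.

```python
XID_MASK = 0xFFFFFFFF   # XIDs are 32-bit and wrap around
FIRST_NORMAL_XID = 3    # XIDs 0-2 are reserved in PostgreSQL


def xid_precedes(a, b):
    """Circular comparison of two normal XIDs: is a logically older than b?

    Same idea as TransactionIdPrecedes: a precedes b when the 32-bit
    difference a - b is negative when read as a signed integer.
    """
    return ((a - b) & XID_MASK) >= 0x80000000


def freeze_limit(next_xid, freeze_min_age):
    """Assumed FreezeLimit: XIDs that precede it are old enough to freeze."""
    limit = (next_xid - freeze_min_age) & XID_MASK
    if limit < FIRST_NORMAL_XID:
        limit = FIRST_NORMAL_XID  # never return a reserved XID
    return limit


def lazy_should_freeze_page(oldest_xid_on_page, next_xid, freeze_min_age,
                            pruning_emitted_fpi):
    """Hypothetical page-level freeze decision under the lazy strategy."""
    if pruning_emitted_fpi:
        return True  # the full-page image is already paid for, so freezing is cheap
    return xid_precedes(oldest_xid_on_page,
                        freeze_limit(next_xid, freeze_min_age))
```

For instance, with next_xid = 100,000,000 and freeze_min_age = 50,000,000, a page whose oldest XID is 90,000,000 is left alone unless pruning already emitted an FPI for it, while a page whose oldest XID is 1,000 is frozen either way.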