Re: What is "wraparound failure", really? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: What is "wraparound failure", really? |
Date | |
Msg-id | CAH2-Wz=seunV_jxRbA-eF1bvW5C2p=Ai+3O6juccOLZ7hvLKow@mail.gmail.com |
In response to | Re: What is "wraparound failure", really? (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Wed, Jun 30, 2021 at 6:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> The problem is that the setting is measuring something that is a pretty poor proxy for the thing we actually care about. It's measuring the XID age at which we're going to start forcing vacuums on tables that don't otherwise need to be vacuumed, but the thing we care about is the XID age at which those vacuums are going to *finish*. Now maybe you think that's a minor difference, and if your tables are small, it is, but if they're really big, it's not. If you have only tables that are say 1GB in size and your system is otherwise well-configured, you could probably crank autovacuum_freeze_max_age up all the way to the max without a problem. But if you have 1TB tables, you are going to need a lot more headroom.

I 100% agree with all of that. However, I can't help but notice that your argument seems to work best as an argument against how freezing works in general. The scheduling is way too complex because we're fundamentally trying to model something that is way too complex and nonlinear by its very nature. It's true that we can do a better job by continually updating our understanding of the state of the system dynamically, during each VACUUM. But maybe we should get rid of freezing instead. Is it really so hard to do that, in the grand scheme of things?

We have tuple freezing because we need it to solve a problem with the "physical database" (not the "logical database"). Namely the problem of having 32-bit XIDs in tuple headers when 64-bit XIDs are theoretically what we need. I'm not actually in favor of 64-bit XIDs in tuple headers (or anything like it), but I am in favor of at least solving the problem with a true "physical database" level solution. The definition of freezing unnecessarily couples how we handle the XID issue with GC by VACUUM, which makes everything much more fragile. A frozen tuple must necessarily be visible to any possible MVCC snapshot.
That's really fragile, in many different ways. It's also unnecessary. Why should XID wraparound be a problem for the entire system? Why not just make it a problem for any very old MVCC snapshots that are *actually* about to be affected? Some kind of "snapshot too old" approach seems quite possible.

I think that we can do a lot better than freezing within the confines of the current heapam design (or the design prior to the introduction of freezing ~20 years ago). Once aborted XIDs are removed eagerly, a strict "logical vs physical" separation of concerns can be imposed.

I'm sorry to go on about this again and again, but it really does seem related to what you're saying. The current freezing design is hard to model because it's inherently fragile.

> I think what we really need here is some kind of deadline-based scheduler. As Peter says, the problem is that we might run out of XIDs. The system should be constantly thinking about that and taking appropriate emergency actions to make sure it doesn't happen. Right now it's really pretty chill about the possibility of looming disaster. Imagine that you hire a babysitter and tell them to get the kids out of the house if there's a fire. While you're out, a volcano erupts down the block. A giant cloud of ash forms and there's lava everywhere, even touching the house, which begins to smolder, but the babysitter just sits there and watches TV. As soon as the first flames appear, the babysitter stops watching TV, gets the kids, and tries to leave the premises. That's our autovacuum scheduler! It has no inclination or ability to see the future; it makes decisions entirely based on the present state of things. In a lot of cases that's OK, but sometimes it leads to a completely ridiculous outcome.

Yeah, it's still pretty absurd, even with the failsafe.
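Robert's start-versus-finish point can be made concrete with a toy model. The Python sketch below is purely illustrative: `Table`, `start_based`, `projected_age_at_finish`, `deadline_based`, `XID_HARD_LIMIT`, the safety factor and every rate are hypothetical, and none of it describes how autovacuum or the failsafe actually decide anything. It compares a fixed age threshold (in the spirit of autovacuum_freeze_max_age cranked to its 2 billion maximum) with a rule that asks whether a VACUUM started now would finish before an assumed hard limit.

```python
from dataclasses import dataclass

# Toy model: every name and number here is a hypothetical assumption,
# not a PostgreSQL setting, catalog column, or measurement.
XID_HARD_LIMIT = 2**31 - 10_000_000  # assumed age at which new XIDs must stop


@dataclass
class Table:
    name: str
    xid_age: int     # age of the table's oldest unfrozen XID
    size_bytes: int  # bytes a VACUUM of this table has to scan


def start_based(table: Table, freeze_max_age: int) -> bool:
    """Fixed threshold: force a VACUUM once the age reaches freeze_max_age."""
    return table.xid_age >= freeze_max_age


def projected_age_at_finish(table: Table, xids_per_sec: float,
                            scan_bytes_per_sec: float,
                            safety: float = 1.0) -> float:
    """Age the table would reach by the time a VACUUM started now finishes,
    with the estimated duration padded by `safety`."""
    seconds = safety * table.size_bytes / scan_bytes_per_sec
    return table.xid_age + xids_per_sec * seconds


def deadline_based(table: Table, xids_per_sec: float,
                   scan_bytes_per_sec: float, safety: float = 2.0) -> bool:
    """Start now if a (padded) VACUUM would otherwise finish too late."""
    return projected_age_at_finish(table, xids_per_sec, scan_bytes_per_sec,
                                   safety) >= XID_HARD_LIMIT


if __name__ == "__main__":
    GB = 10**9
    for t in (Table("1GB table", 1_900_000_000, 1 * GB),
              Table("1TB table", 1_900_000_000, 1000 * GB)):
        print(t.name,
              start_based(t, freeze_max_age=2_000_000_000),
              deadline_based(t, xids_per_sec=10_000,
                             scan_bytes_per_sec=50 * 10**6))
    # 1GB table False False
    # 1TB table False True
```

With identical XID ages, the fixed threshold treats both tables the same. The deadline-based rule flags only the 1TB table, because at the assumed rates its scan takes about 20,000 seconds, during which roughly 200 million more XIDs are consumed (400 million after the safety factor).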

To extend your analogy, in the real world the babysitter can afford to make very conservative assumptions about whether or not the house is about to catch fire. In practice the chances of that happening on any given day are certainly very low -- it'll probably never come close to happening even once. And there is an inherent asymmetry, since of course the cost of a false positive is that the Friends reunion episode is unnecessarily cut short, which is totally inconsequential compared to the cost of a false negative. If there weren't such a big asymmetry, then what we'd probably do is not even think about what the babysitter does -- we just wouldn't care at all.

Anyway, I'll try to come up with a way of rewording this section of the docs that mostly preserves its existing structure, but makes it possible to talk about the failsafe. The current structure of this section of the docs is needlessly ambiguous, but I think that that can be fixed without changing too much.

FWIW I have heard things that suggest that some users believe that modern PostgreSQL can actually allow "the past to look like the future" in some cases -- probably because of the wording here. This area of the system certainly is scary, but it's not quite that scary.

-- 
Peter Geoghegan
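For readers wondering why 32-bit XIDs in tuple headers force the issue at all, and what "the past looking like the future" would mean, here is a minimal Python sketch of circular XID comparison, modeled loosely on the logic of PostgreSQL's `TransactionIdPrecedes()` (normal XIDs compare by the sign of their 32-bit difference; reserved XIDs compare as plain integers). The helper names are hypothetical. It shows that an unfrozen XID more than 2^31 XIDs behind the current one would compare as being in the future, which is exactly what freezing (and, as a last resort, refusing to assign new XIDs) exists to prevent; a frozen XID sidesteps the problem by comparing as older than every normal XID.

```python
# Sketch of circular 32-bit XID comparison, modeled loosely on PostgreSQL's
# TransactionIdPrecedes(). Helper names are hypothetical; not the real code.
XID_MODULUS = 2**32
FROZEN_XID = 2          # reserved XID that precedes every normal XID
FIRST_NORMAL_XID = 3    # XIDs 0, 1 and 2 are reserved


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - XID_MODULUS if value >= 2**31 else value


def xid_precedes(xid1: int, xid2: int) -> bool:
    """True if xid1 is considered older than xid2."""
    if xid1 < FIRST_NORMAL_XID or xid2 < FIRST_NORMAL_XID:
        # Reserved XIDs (including the frozen XID) compare as plain integers.
        return xid1 < xid2
    # Normal XIDs: whichever is "behind" on the 2^32 circle is older, which
    # only gives sane answers while the two are less than 2^31 XIDs apart.
    return to_int32(xid1 - xid2) < 0


if __name__ == "__main__":
    inserter = 100  # XID that wrote some old, never-frozen tuple
    print(xid_precedes(inserter, inserter + 2**31 - 1))                  # True
    print(xid_precedes(inserter, (inserter + 2**31 + 1) % XID_MODULUS))  # False
    print(xid_precedes(FROZEN_XID, 4_000_000_000))                       # True
```

The second line is the "past looks like the future" case. PostgreSQL does not actually let an unfrozen XID get that old; the sketch only shows why the age of the oldest unfrozen XID has to be bounded somehow.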