Thread: What is "wraparound failure", really?
The wraparound failsafe mechanism added by commit 1e55e7d1 had minimal documentation -- just a basic description of how the GUCs work. I think it merits some discussion under "25.1. Routine Vacuuming", more specifically under "25.1.5. Preventing Transaction ID Wraparound Failures". One reason this didn't happen in the original commit was that I just didn't know where to start with it. The docs in question have said this since 2006's commit 48188e16 first added autovacuum_freeze_max_age: "The sole disadvantage of increasing autovacuum_freeze_max_age (and vacuum_freeze_table_age along with it) is that the pg_xact and pg_commit_ts subdirectories of the database cluster will take more space..." This sentence seems completely unreasonable to me. It ignores the huge disadvantage of increasing autovacuum_freeze_max_age: the *risk* that the system will stop being able to allocate new XIDs because GetNewTransactionId() errors out with "database is not accepting commands to avoid wraparound data loss...". Sure, it's possible to take a lot of risk here without it ever blowing up in your face, and if it doesn't blow up, the downside really is zero. But that is hardly a sensible way to talk about this important risk, or about any risk at all. At first I thought that the sentence was not just misguided -- it seemed downright bizarre, directly at odds with the title "Preventing Transaction ID Wraparound Failures". I thought the whole point of this section was how to avoid a wraparound failure (as I understand the term), and yet we seem to deliberately ignore the single most important practical aspect of making sure that doesn't happen. But I now suspect that the basic definitions have been mixed up in a subtle but important way. What the documentation calls a "wraparound failure" seems to be rather different from what I thought it meant.
As I said, I thought it meant the condition of being unable to get new transaction IDs (at least until the DBA runs VACUUM in single-user mode). But the documentation in question seems to actually define it as "the condition of an old MVCC snapshot failing to see a version from the distant past, because somehow an XID wraparound suddenly makes it look as if it's in the distant future rather than in the past". That is a subtly different thing, so the "sole disadvantage" sentence is not bizarre after all. It does still seem impractical and confusing, though. I strongly suspect that my interpretation of what "wraparound failure" means is actually the common one. Of course the system must never give totally wrong answers to queries -- users should be able to take that much for granted. What users care about here is sensibly managing XIDs as a resource: preventing "XID exhaustion" while being conservative, but not ridiculously conservative. Could the documentation be completely misleading users here? I have two questions: 1. Do I have this right? Is there really confusion about what a "wraparound failure" means, or is the confusion mine alone? 2. How do I go about integrating discussion of the failsafe here? Does anybody have thoughts on that? -- Peter Geoghegan
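The "past looks like the future" condition described above comes from comparing 32-bit XIDs circularly. The following is a minimal Python model of that comparison, assuming the modulo-2^32 signed-difference rule PostgreSQL applies to normal XIDs (cf. TransactionIdPrecedes in the C sources). The function name `xid_precedes` is hypothetical, and the special permanent XIDs are ignored.

```python
XID_SPACE = 2**32   # XIDs are 32-bit unsigned counters that wrap around
HALF_SPACE = 2**31  # comparison horizon: half of the XID space


def xid_precedes(id1: int, id2: int) -> bool:
    """Return True if id1 is logically older than id2.

    Hypothetical model: take (id1 - id2) modulo 2**32 and read it as a
    signed 32-bit value; a negative value means id1 is in the past.
    """
    diff = (id1 - id2) % XID_SPACE
    return diff >= HALF_SPACE  # equivalent to "signed 32-bit difference < 0"


old_xid = 1000
# Just under 2**31 XIDs later, the old XID still correctly looks older.
print(xid_precedes(old_xid, old_xid + HALF_SPACE - 1))  # True
# Just over 2**31 XIDs later, the same XID appears to be in the future.
print(xid_precedes(old_xid, old_xid + HALF_SPACE + 1))  # False
print(xid_precedes(old_xid + HALF_SPACE + 1, old_xid))  # True
```

This is why an unfrozen XID must never be allowed to fall more than about 2^31 XIDs behind the next XID to be assigned.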
On 6/27/21 4:36 PM, Peter Geoghegan wrote: > I have two questions: > > 1. Do I have this right? Is there really confusion about what a > "wraparound failure" means, or is the confusion mine alone? > > 2. How do I go about integrating discussion of the failsafe here? > Anybody have thoughts on that? > AIUI, actual wraparound (i.e. an XID crossing the event horizon so that it appears to be in the future) is no longer possible. But it once was a very real danger, and maybe the docs haven't quite caught up. In practical terms, there is an awful lot of headroom between the default for autovacuum_freeze_max_age and any danger of major anti-wraparound measures. Say you increase it to 1bn from the default 200m. That still leaves you ~1bn transactions of headroom.
cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
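Andrew's headroom figure can be checked with a little arithmetic. This sketch assumes that the wraparound horizon lies about 2^31 XIDs past the oldest unfrozen XID and that new XIDs are refused three million XIDs before it (the figure Masahiko cites later in the thread); it ignores version-specific details of the exact limits.

```python
WRAP_HORIZON = 2**31     # approx. XIDs between the oldest unfrozen XID and wraparound
STOP_MARGIN = 3_000_000  # assumed: new XIDs are refused this close to wraparound


def headroom_after_trigger(freeze_max_age: int) -> int:
    """XIDs that can still be consumed after an anti-wraparound autovacuum
    is forced (at age == freeze_max_age) and before new XIDs are refused."""
    return WRAP_HORIZON - STOP_MARGIN - freeze_max_age


for setting in (200_000_000, 1_000_000_000):
    print(f"{setting:>13,}: {headroom_after_trigger(setting):,} XIDs of headroom")
```

With the default of 200 million this leaves about 1.94 billion XIDs; at 1 billion it leaves about 1.14 billion, consistent with the "~1bn" above.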
On Mon, Jun 28, 2021 at 8:36 AM Peter Geoghegan <pg@bowt.ie> wrote: > "The sole disadvantage of increasing autovacuum_freeze_max_age (and > vacuum_freeze_table_age along with it) is that the pg_xact and > pg_commit_ts subdirectories of the database cluster will take more > space..." Just by the way, if we're updating this sentence, it continues "because it must store..." but it should surely be "because they must store...".
On Sun, Jun 27, 2021 at 4:23 PM Andrew Dunstan <andrew@dunslane.net> wrote: > AIUI, actual wraparound (i.e. an xid crossing the event horizon so it > appears to be in the future) is no longer possible. But it once was a > very real danger. Maybe the docs haven't quite caught up. The sentence in question was added a few years after freezing was first invented, which was arguably the last time that the design fundamentally changed. I think we all agree that it's fundamentally not okay to give wrong answers to queries -- it doesn't even need to be stated in the docs IMV. So why does this section of the docs spend so much time talking about something that fundamentally cannot happen? Why not have it focus instead on the bad outcome that is a real risk, namely the system refusing to allow new XIDs (as a means of avoiding wrong answers when all else fails)? It's hard to talk about the new failsafe in this section of the docs now, since it's unclear whether the section exists to advise the user on ways of avoiding the "can't allocate XIDs" failure mode. It could be interpreted that way, or it could just be explaining and/or justifying the existence of the failure mode. That seems like a real problem. > In practical terms, there is an awful lot of head room between the > default for autovacuum_freeze_max_age and any danger of major > anti-wraparound measures. Say you increase it to 1bn from the default > 200m. That still leaves you ~1bn transactions of headroom. I agree that in practice that's often fine. But my point is that there is another very good reason not to increase autovacuum_freeze_max_age, contrary to what the docs say (actually a far better reason than truncating clog). Namely, increasing it will generally increase the risk of VACUUM not finishing in time. If that happens, the user gets the "can't allocate XIDs" failure mode (which is what I have called wraparound failure up until now), which is one of the worst things that can happen.
This makes the inability to truncate clog look like a totally trivial issue in comparison. Reasonable people can disagree about when and how increasing autovacuum_freeze_max_age becomes truly reckless. However, I don't think that anybody would be willing to argue that setting it to the maximum of 2 billion could ever make sense in production, to take the obvious extreme case. The benefits that you get from such a high setting, over and above what you get with a moderately high setting (perhaps 1-1.5 billion), are really quite small, while the risk shoots up fast past a certain point. Whatever the nuances of increasing autovacuum_freeze_max_age are, stating that the sole disadvantage is that you cannot truncate clog and other SLRUs is clearly wrong. -- Peter Geoghegan
On Mon, Jun 28, 2021 at 5:36 AM Peter Geoghegan <pg@bowt.ie> wrote: > I have two questions: > > 1. Do I have this right? Is there really confusion about what a > "wraparound failure" means, or is the confusion mine alone? > > 2. How do I go about integrating discussion of the failsafe here? > Anybody have thoughts on that? Looking through the doc again, it seems to me that there is no explicit explanation of the worst situation. It might be true in principle that “XID wraparound failure” means catastrophic data loss due to XID wraparound. But that doesn't actually happen, because we refuse to allocate new XIDs once we are within three million XIDs of wraparound. In other words, entering read-only mode is the worst situation PostgreSQL can reach in terms of XID consumption.
There is some description of refusing to start any new transactions at the end of section 25.1.5, but it seems neither sufficient nor accurate. It describes the read-only mode only as a safeguard, not as the situation we want to avoid. Describing that latter aspect explicitly would lend weight both to the description of the failsafe mode (especially why it skips some operations in order to advance relfrozenxid faster) and to another disadvantage of increasing autovacuum_freeze_max_age. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
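Masahiko's point is that refusing new XIDs is the last rung of an escalation ladder, and that the failsafe exists to keep it from being reached. A simplified, hypothetical classifier of the oldest unfrozen XID's age makes the ordering explicit. It assumes the PostgreSQL 14 defaults autovacuum_freeze_max_age = 200 million and vacuum_failsafe_age = 1.6 billion, plus the three-million-XID stop margin mentioned above. In reality these checks live in different places (autovacuum scheduling, inside VACUUM, and at XID assignment) and involve details omitted here.

```python
from enum import Enum

WRAP_HORIZON = 2**31                     # approx. XID distance to wraparound
STOP_MARGIN = 3_000_000                  # new XIDs refused this close to wraparound
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # assumed PostgreSQL 14 default
VACUUM_FAILSAFE_AGE = 1_600_000_000      # assumed PostgreSQL 14 default


class XidState(Enum):
    NORMAL = "normal"
    ANTI_WRAPAROUND_VACUUM = "anti-wraparound autovacuum forced"
    FAILSAFE = "VACUUM failsafe engaged"
    REFUSING_XIDS = "new XIDs refused"


def classify(age: int) -> XidState:
    """Map the age of the oldest unfrozen XID to the most severe stage reached.

    Hypothetical simplification: one table, one database, no MultiXacts.
    """
    if age >= WRAP_HORIZON - STOP_MARGIN:
        return XidState.REFUSING_XIDS
    if age > VACUUM_FAILSAFE_AGE:
        return XidState.FAILSAFE
    if age > AUTOVACUUM_FREEZE_MAX_AGE:
        return XidState.ANTI_WRAPAROUND_VACUUM
    return XidState.NORMAL


for age in (150_000_000, 500_000_000, 1_700_000_000, 2**31 - 1_000_000):
    print(f"{age:>13,}: {classify(age).value}")
```

Seen this way, the failsafe is the stage whose whole job is to stop the next one from happening, which is why it skips work that does not help advance relfrozenxid.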
On 6/28/21 2:39 AM, Peter Geoghegan wrote: > Reasonable people can disagree about when and how increasing > autovacuum_freeze_max_age becomes truly reckless. However, I don't > think that anybody would be willing to argue that setting it to the > maximum of 2 billion could ever make sense in production, to go with > the obvious extreme case. [...] > > Regardless of what the nuances of increasing autovacuum_freeze_max_age > are, stating that the sole disadvantage is that you cannot truncate > clog and other SLRUs is clearly wrong. > Sure, I'm not suggesting the docs can't be improved. This is one of those things that in my experience most people don't get. Indeed, I didn't really get it either until I had to explain it with some clarity to a very confused customer.
And I find it's best explained by showing what bad results it avoids. Freezing is one of those almost useless things you just have to do. It doesn't help that it's tangled up with VACUUM, so when you explain that it's not about reclaiming dead space, heads start to explode. But if you're really worried about people setting autovacuum_freeze_max_age too high, then maybe we should be talking about capping it at a lower level rather than adjusting the docs that most users don't read. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Mon, Jun 28, 2021 at 08:51:50AM -0400, Andrew Dunstan wrote: > On 6/28/21 2:39 AM, Peter Geoghegan wrote: > > I agree that in practice that's often fine. But my point is that there > > is another very good reason to not increase autovacuum_freeze_max_age, > > contrary to what the docs say (actually there is a far better reason > > than truncating clog). Namely, increasing it will generally increase > > the risk of VACUUM not finishing in time. Yep, that doc section's priorities are out of date. > But if you're really worried about people setting > autovacuum_freeze_max_age too high, then maybe we should be talking > about capping it at a lower level rather than adjusting the docs that > most users don't read. If a GUC minimum or maximum feels like a mainstream choice, it's probably too strict. Hence, I think the current maximum is fine. At 93% of the XID space, it's not risk-averse, but it's not absurd.
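The "93% of the XID space" figure refers to the usable half of the 32-bit space. Assuming the 2 billion maximum for autovacuum_freeze_max_age stated upthread and the same approximate limits used elsewhere in this thread (a horizon of about 2^31 XIDs and a three-million-XID stop margin), a short calculation shows how little margin is left at that setting:

```python
WRAP_HORIZON = 2**31                 # approx. usable XID distance before wraparound
STOP_MARGIN = 3_000_000              # assumed: new XIDs refused this close to wraparound
MAX_FREEZE_MAX_AGE = 2_000_000_000   # maximum autovacuum_freeze_max_age

fraction = MAX_FREEZE_MAX_AGE / WRAP_HORIZON
margin = WRAP_HORIZON - STOP_MARGIN - MAX_FREEZE_MAX_AGE

print(f"{fraction:.1%} of the usable XID space")
print(f"{margin:,} XIDs left for anti-wraparound vacuums to finish")
```

That works out to 93.1% and about 144 million XIDs: at the maximum setting, every table that reaches the trigger age must be vacuumed within that window, which is the "risk shoots up fast" point made upthread.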
On Mon, Jun 28, 2021 at 8:52 AM Andrew Dunstan <andrew@dunslane.net> wrote: > But if you're really worried about people setting > autovacuum_freeze_max_age too high, then maybe we should be talking > about capping it at a lower level rather than adjusting the docs that > most users don't read. The problem is that the setting is measuring something that is a pretty poor proxy for the thing we actually care about. It's measuring the XID age at which we're going to start forcing vacuums on tables that don't otherwise need to be vacuumed, but the thing we care about is the XID age at which those vacuums are going to *finish*. Now maybe you think that's a minor difference, and if your tables are small, it is, but if they're really big, it's not. If all of your tables are, say, 1GB in size and your system is otherwise well-configured, you could probably crank autovacuum_freeze_max_age up all the way to the max without a problem. But if you have 1TB tables, you are going to need a lot more headroom. The exact amount of headroom you need depends especially on the size of your largest tables, but also on how well-distributed the relfrozenxid values are, on the total size of all your tables, on your I/O subsystem, on your XID consumption rate, on your vacuum delay settings, and on whether you want to make any allowance for the rare but possible scenario where vacuum dies with an ERROR. This means that in practice nobody knows whether a particular setting of autovacuum_freeze_max_age on a particular system is safe or not, except in the most obvious cases. Capping it at a lower level would prevent some people from doing things that are perfectly safe and still not prevent other people from doing things that are horribly dangerous. I think what we really need here is some kind of deadline-based scheduler. As Peter says, the problem is that we might run out of XIDs.
The system should be constantly thinking about that and taking appropriate emergency actions to make sure it doesn't happen. Right now it's really pretty chill about the possibility of looming disaster. Imagine that you hire a babysitter and tell them to get the kids out of the house if there's a fire. While you're out, a volcano erupts down the block. A giant cloud of ash forms and there's lava everywhere, even touching the house, which begins to smolder, but the babysitter just sits there and watches TV. As soon as the first flames appear, the babysitter stops watching TV, gets the kids, and tries to leave the premises. That's our autovacuum scheduler! It has no inclination or ability to see the future; it makes decisions entirely based on the present state of things. In a lot of cases that's OK, but sometimes it leads to a completely ridiculous outcome. -- Robert Haas EDB: http://www.enterprisedb.com
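The distinction between when a vacuum starts and when it finishes can be made concrete with a rough, hypothetical model. VACUUM duration is table size divided by effective throughput, and the XIDs consumed meanwhile are that duration times the workload's XID rate. A deadline-style check then asks whether the vacuum would end before new XIDs are refused, instead of only whether the trigger age has been reached. All names and figures below are illustrative assumptions, and the model ignores most of the factors listed above (relfrozenxid distribution, competing tables, vacuums dying with an ERROR).

```python
from dataclasses import dataclass

WRAP_HORIZON = 2**31     # approx. XID distance from oldest unfrozen XID to wraparound
STOP_MARGIN = 3_000_000  # assumed: new XIDs refused this close to wraparound


@dataclass
class Workload:
    """Hypothetical inputs; real values vary widely between systems."""
    table_mb: float         # size of the table needing an anti-wraparound VACUUM
    vacuum_mb_per_s: float  # effective VACUUM throughput, after cost-based delay
    xids_per_s: float       # rate at which the workload consumes XIDs


def finish_age(start_age: int, w: Workload, safety: float = 1.0) -> float:
    """Estimated age of the table's oldest unfrozen XID when VACUUM ends.

    `safety` inflates the estimated duration to allow for misestimates.
    """
    duration_s = safety * w.table_mb / w.vacuum_mb_per_s
    return start_age + duration_s * w.xids_per_s


def finishes_in_time(start_age: int, w: Workload, safety: float = 1.0) -> bool:
    """Deadline check: would a VACUUM started at start_age end before XIDs are refused?"""
    return finish_age(start_age, w, safety) < WRAP_HORIZON - STOP_MARGIN


small = Workload(table_mb=1024, vacuum_mb_per_s=10, xids_per_s=10_000)         # ~1GB
large = Workload(table_mb=1024 * 1024, vacuum_mb_per_s=10, xids_per_s=10_000)  # ~1TB

for start in (1_000_000_000, 1_500_000_000):
    print(f"start age {start:,}: 1GB ok={finishes_in_time(start, small)}, "
          f"1TB ok={finishes_in_time(start, large)}")
```

With these made-up figures, the 1GB table finishes comfortably from either start age, while the 1TB table only just makes it from 1 billion and misses the deadline from 1.5 billion (or from 1 billion once a safety factor of 2 is applied). The trigger age alone cannot tell you which case you are in.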
On Wed, Jun 30, 2021 at 6:46 AM Robert Haas <robertmhaas@gmail.com> wrote: > The problem is that the setting is measuring something that is a > pretty poor proxy for the thing we actually care about. It's measuring > the XID age at which we're going to start forcing vacuums on tables > that don't otherwise need to be vacuumed, but the thing we care about > is the XID age at which those vacuums are going to *finish*. [...] I 100% agree with all of that. However, I can't help but notice that your argument works best as an argument against how freezing works in general. The scheduling is way too complex because we're fundamentally trying to model something that is way too complex and nonlinear by its very nature. It's true that we can do a better job by continually updating our model of the system's state during each VACUUM. But maybe we should get rid of freezing instead. Is it really so hard to do that, in the grand scheme of things? We have tuple freezing because we need it to solve a problem with the "physical database" (not the "logical database"), namely the problem of having 32-bit XIDs in tuple headers when 64-bit XIDs are theoretically what we need. I'm not actually in favor of 64-bit XIDs in tuple headers (or anything like it), but I am in favor of at least solving the problem with a true "physical database" level solution. The definition of freezing unnecessarily couples how we handle the XID issue with GC by VACUUM, which makes everything much more fragile. A frozen tuple must necessarily be visible to any possible MVCC snapshot.
That's really fragile, in many different ways. It's also unnecessary. Why should XID wraparound be a problem for the entire system? Why not just make it a problem for any very old MVCC snapshots that are *actually* about to be affected? Some kind of "snapshot too old" approach seems quite possible. I think that we can do a lot better than freezing within the confines of the current heapam design (or the design prior to the introduction of freezing ~20 years ago). Once aborted XIDs are removed eagerly, a strict "logical vs physical" separation of concerns can be imposed. I'm sorry to go on about this again and again, but it really does seem related to what you're saying. The current freezing design is hard to model because it's inherently fragile. > I think what we really need here is some kind of deadline-based > scheduler. As Peter says, the problem is that we might run out of > XIDs. The system should be constantly thinking about that and taking > appropriate emergency actions to make sure it doesn't happen. Right > now it's really pretty chill about the possibility of looming > disaster. Imagine that you hire a babysitter and tell them to get the > kids out of the house if there's a fire. While you're out, a volcano > erupts down the block. A giant cloud of ash forms and there's lava > everywhere, even touching the house, which begins to smolder, but the > babysitter just sits there and watches TV. As soon as the first flames > appear, the babysitter stops watching TV, gets the kids, and tries to > leave the premises. That's our autovacuum scheduler! It has no > inclination or ability to see the future; it makes decisions entirely > based on the present state of things. In a lot of cases that's OK, but > sometimes it leads to a completely ridiculous outcome. Yeah, it's still pretty absurd, even with the failsafe. 
To extend your analogy, in the real world the babysitter can afford to make very conservative assumptions about whether or not the house is about to catch fire. In practice the chances of that happening on any given day are certainly very low -- it'll probably never come close to happening even once. And there is an inherent asymmetry: the cost of a false positive is that the Friends reunion episode is unnecessarily cut short, which is totally inconsequential compared to the cost of a false negative. If there weren't such a big asymmetry, we probably wouldn't even think about what the babysitter does -- we just wouldn't care at all. Anyway, I'll try to come up with a way of rewording this section of the docs that mostly preserves its existing structure but makes it possible to talk about the failsafe. The current structure of this section of the docs is needlessly ambiguous, but I think that can be fixed without changing too much. FWIW, I have heard things that suggest some users believe that modern PostgreSQL can actually allow "the past to look like the future" in some cases -- probably because of the wording here. This area of the system certainly is scary, but it's not quite that scary. -- Peter Geoghegan
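The earlier observation that a frozen tuple must be visible to every possible MVCC snapshot can be illustrated with a deliberately simplified visibility check. A frozen xmin skips XID comparison entirely, while an unfrozen xmin depends on the circular comparison and so becomes invisible once it falls more than 2^31 XIDs behind. The `TupleStub` type, the single-XID snapshot model, and the function names are all hypothetical; real heap visibility rules also consult commit status, in-progress transactions, and xmax.

```python
from dataclasses import dataclass

XID_SPACE = 2**32
HALF_SPACE = 2**31


def xid_precedes(id1: int, id2: int) -> bool:
    """Circular 32-bit comparison: True if id1 is logically older than id2."""
    return (id1 - id2) % XID_SPACE >= HALF_SPACE


@dataclass
class TupleStub:
    xmin: int             # inserting transaction, assumed to have committed
    frozen: bool = False  # whether VACUUM has frozen this tuple


def visible(t: TupleStub, snapshot_xmax: int) -> bool:
    """Simplified check: visible if the inserter precedes the snapshot.

    A frozen tuple is treated as older than any possible snapshot.
    """
    if t.frozen:
        return True
    return xid_precedes(t.xmin, snapshot_xmax)


old = TupleStub(xmin=1000)
later = 1000 + HALF_SPACE + 1  # just past the wraparound horizon
print(visible(old, 5000), visible(old, later))        # True False
print(visible(TupleStub(xmin=1000, frozen=True), later))  # True
```

Correct answers for old, unfrozen tuples therefore depend on VACUUM freezing them before that horizon is crossed, which is ultimately what the refusal to assign new XIDs guarantees.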