Thread: improving wraparound behavior
I spent a significant chunk of today burning through roughly 2^31 XIDs just to see what would happen. My test setup consisted of autovacuum=off plus a trivial prepared transaction plus a lot of this:

+ BeginInternalSubTransaction("txid_burn");
+ (void) GetCurrentTransactionId();
+ ReleaseCurrentSubTransaction();

Observations:

1. As soon as the XID of the prepared transaction gets old enough to trigger autovacuum, autovacuum goes nuts. It vacuums everything in the database over and over again, but that does no good, because the prepared transaction holds back the XID horizon. There are previous reports of this and related problems, such as this one from 2014:

http://postgr.es/m/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

That thread got hung up on the question of prioritization: if there's a lot of stuff that needs to be autovacuumed, which stuff should we do first? But I think that we overlooked a related issue, which is that there's no point in autovacuuming a table in the first place if doing so won't help advance relfrozenxid and/or relminmxid. The autovacuum launcher will happily compute a force limit that is newer than OldestXmin and decide on that basis to route a worker to a particular database; that worker will then compute a force limit that is newer than OldestXmin, examine relations in that database, and decide to vacuum them; and then the vacuum operation itself will decide on a similar basis that it's going to be an aggressive vacuum. But we can't actually remove any tuples that are newer than OldestXmin, so we have no actual hope of accomplishing anything by that aggressive vacuum. I am not sure exactly how to fix this, because the calculation we use to determine the XID that can be used to vacuum a specific table is pretty complex; how can the postmaster know whether it's going to be able to make any progress in *any* table in some database to which it's not even connected? But it's surely crazy to just keep doing something over and over that can't possibly work.

2. Once you get to the point where you start to emit errors when attempting to assign an XID, you can still run plain old VACUUM because it doesn't consume an XID ... except that if it tries to truncate the relation, then it will take AccessExclusiveLock, which has to be logged, which forces an XID assignment, which makes VACUUM fail. So if you burn through XIDs until the system gets to this point, and then you roll back the prepared transaction that caused the problem in the first place, autovacuum sits there trying to vacuum tables in a tight loop and fails over and over again as soon as it hits a table that it thinks needs to be truncated. This seems really lame, and also easily fixed.

Attached is a patch that disables vacuum truncation if xidWarnLimit has been reached. With this patch, in my testing, autovacuum is able to recover the system once the prepared transaction has been rolled back. Without this patch, not only does that not happen, but if you had a database with enough relations that need truncation, you could conceivably cause XID wraparound just from running a database-wide VACUUM, the one tool you have available to avoid XID wraparound. I think that this amounts to a back-patchable bug fix.

(One could argue that truncation should be disabled sooner than this, like when we've exceeded autovacuum_freeze_max_age, or later than this, like when we hit xidStopLimit, but I think xidWarnLimit is probably the best compromise.)
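For illustration, a minimal sketch of the xidWarnLimit check being described (this is not the attached patch itself; the helper name is invented, and the unlocked read of ShmemVariableCache->xidWarnLimit is a simplification):

    #include "postgres.h"
    #include "access/transam.h"

    /*
     * Sketch only: report whether VACUUM may still attempt relation
     * truncation.  Once nextXid has passed xidWarnLimit, truncation would
     * have to WAL-log an AccessExclusiveLock, which forces XID assignment
     * and can make VACUUM fail outright.
     */
    static bool
    truncation_safe_near_wraparound(void)
    {
        TransactionId nextXid = ReadNewTransactionId();

        return TransactionIdPrecedes(nextXid,
                                     ShmemVariableCache->xidWarnLimit);
    }

The lazy-vacuum truncation step would then simply be skipped whenever this returns false.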
3. The message you get when you hit xidStopLimit seems like bad advice to me:

ERROR: database is not accepting commands to avoid wraparound data loss in database "%s"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

Why do we want people to stop the postmaster and vacuum that database in single-user mode? Why not just run VACUUM in multi-user mode, or let autovacuum take care of the problem? Granted, if VACUUM is going to fail in multi-user mode, and if switching to single-user mode is going to make it succeed, then it's a good suggestion. But it seems that it doesn't fail in multi-user mode, unless it tries to truncate something, which is a bug we should fix. Telling people to go to single-user mode, where they can continue to assign XIDs even though they have almost no XIDs left, seems extremely dangerous, actually.

Also, I think that old prepared transactions and stale replication slots should be emphasized more prominently. Maybe something like:

HINT: Commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.
Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
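For concreteness, a sketch of how the proposed wording might look wired into the existing xidStopLimit error path in GetNewTransactionId(); the errcode and the oldest_datname variable follow the current code, while the hint text is just the proposal above:

    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
                    oldest_datname),
             errhint("Commit or roll back old prepared transactions, drop stale replication slots, or kill long-running sessions.\n"
                     "Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.")));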
Hi,

On 2019-05-03 16:26:46 -0400, Robert Haas wrote:
> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail. So if you burn through XIDs until the system gets to this
> point, and then you roll back the prepared transaction that caused the
> problem in the first place, autovacuum sits there trying to vacuum
> tables in a tight loop and fails over and over again as soon as it hits
> a table that it thinks needs to be truncated. This seems really lame,
> and also easily fixed.
>
> Attached is a patch that disables vacuum truncation if xidWarnLimit
> has been reached. With this patch, in my testing, autovacuum is able
> to recover the system once the prepared transaction has been rolled
> back. Without this patch, not only does that not happen, but if you
> had a database with enough relations that need truncation, you could
> conceivably cause XID wraparound just from running a database-wide
> VACUUM, the one tool you have available to avoid XID wraparound. I
> think that this amounts to a back-patchable bug fix.
>
> (One could argue that truncation should be disabled sooner than this,
> like when we've exceeded autovacuum_freeze_max_age, or later than this,
> like when we hit xidStopLimit, but I think xidWarnLimit is probably
> the best compromise.)

I'd actually say the proper fix would be to instead move the truncation to *after* finishing updating relfrozenxid etc. If we truncate, the additional cost of another in-place pg_class update, to update relpages, is basically insignificant. And the risk of errors, or of being cancelled, during truncation is much higher than before (due to the AEL).

> Also, I think that old prepared transactions and stale replication
> slots should be emphasized more prominently. Maybe something like:
>
> HINT: Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

I think it'd be good to instead compute what the actual problem is. It'd not be particularly hard to show some of these in the errdetail:

1) the xid horizon (xid + age) of the problematic database; potentially, if connected to that database, additionally compute what the oldest xid is (although that's computationally potentially too expensive)
2) the xid horizon (xid + age) due to prepared transactions, and the oldest transaction's name
3) the xid horizon (xid + age) due to replication slots, and the "oldest" slot's name
4) the xid horizon (xid + age) and pid for the connection with the oldest snapshot.

I think that'd make it much easier for users to pinpoint what's going on.

In fact, I think we probably should additionally add a function that can display the above. That'd make it much easier to write monitoring queries.

IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a table, i.e. the oldest xid actually present. It's pretty common for most tables to have effective horizons that are much newer than what GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do so only when scanning all non-frozen pages. But being able to record "more aggressive" horizons would often save unnecessary work. And it ought not to be hard.
I think especially for regular non-freeze, non-wraparound vacuums that'll often result in a much newer relfrozenxid (as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).

Greetings,

Andres Freund
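A minimal sketch of the "actual relfrozenxid" idea just described; the names are invented, not existing PostgreSQL code. The heap scan would call this for every unfrozen xmin/xmax it sees, and if the scan covered all non-frozen pages, the accumulated value could be recorded as the new relfrozenxid instead of the pessimistic GetOldestXmin()-based cutoff:

    #include "postgres.h"
    #include "access/transam.h"

    /* Oldest unfrozen XID observed so far during the current vacuum scan. */
    static TransactionId observed_oldest_xid = InvalidTransactionId;

    static void
    note_unfrozen_xid(TransactionId xid)
    {
        /* Skip frozen, invalid, and bootstrap XIDs. */
        if (!TransactionIdIsNormal(xid))
            return;

        if (!TransactionIdIsValid(observed_oldest_xid) ||
            TransactionIdPrecedes(xid, observed_oldest_xid))
            observed_oldest_xid = xid;
    }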
On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres@anarazel.de> wrote:
> I'd actually say the proper fix would be to instead move the truncation
> to *after* finishing updating relfrozenxid etc. If we truncate, the
> additional cost of another in-place pg_class update, to update relpages,
> is basically insignificant. And the risk of errors, or of being
> cancelled, during truncation is much higher than before (due to the AEL).

That would prevent the ERROR from impeding relfrozenxid advancement, but it does not prevent the error itself, nor the XID consumption. If autovacuum is hitting that ERROR, it will spew errors in the log but succeed in advancing relfrozenxid anyway. I don't think that's as nice as the behavior I proposed, but it's certainly better than the status quo. If you are hitting that error due to a manual VACUUM, under your proposal, you'll stop the manual VACUUM as soon as you hit the first table where this happens, which is not what you want. You'll also keep consuming XIDs, which is not what you want either, especially if you are in single-user mode because the number of remaining XIDs is less than a million.

> > Also, I think that old prepared transactions and stale replication
> > slots should be emphasized more prominently. Maybe something like:
> >
> > HINT: Commit or roll back old prepared transactions, drop stale
> > replication slots, or kill long-running sessions.
> > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>
> I think it'd be good to instead compute what the actual problem is. It'd
> not be particularly hard to show some of these in the errdetail:
>
> 1) the xid horizon (xid + age) of the problematic database; potentially,
> if connected to that database, additionally compute what the oldest
> xid is (although that's computationally potentially too expensive)
> 2) the xid horizon (xid + age) due to prepared transactions, and the
> oldest transaction's name
> 3) the xid horizon (xid + age) due to replication slots, and the "oldest"
> slot's name
> 4) the xid horizon (xid + age) and pid for the connection with the
> oldest snapshot.
>
> I think that'd make it much easier for users to pinpoint what's going on.
>
> In fact, I think we probably should additionally add a function that can
> display the above. That'd make it much easier to write monitoring
> queries.

I think that the error and hint that you get from GetNewTransactionId() has to be something that we can generate very quickly, without doing anything that might hang on a cluster with lots of databases or lots of relations; but if there's useful detail we can display there, that's good. With a view, it's more OK if it takes a long time on a big cluster.

> IMO we also ought to compute the *actual* relfrozenxid/relminmxid for a
> table, i.e. the oldest xid actually present. It's pretty common for most
> tables to have effective horizons that are much newer than what
> GetOldestXmin()/vacuum_set_xid_limits() can return. Obviously we can do
> so only when scanning all non-frozen pages. But being able to record
> "more aggressive" horizons would often save unnecessary work. And it
> ought not to be hard. I think especially for regular non-freeze,
> non-wraparound vacuums that'll often result in a much newer relfrozenxid
> (as we'll otherwise just have GetOldestXmin() - vacuum_freeze_min_age).

Sure, that would make sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I spent a significant chunk of today burning through roughly 2^31 XIDs
> just to see what would happen.
> ...
> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail.

Yeah. I tripped over that earlier this week in connection with the REINDEX business: taking an AEL only forces XID assignment when wal_level is above "minimal", so it's easy to come to the wrong conclusions depending on your test environment. I suspect that previous testing of wraparound behavior (yes, there has been some) didn't see this effect because the default wal_level didn't use to cause it to happen. But anyway, it's there now, and I agree we'd better do something about it.

My brain is too fried from release-note-writing to have any trustworthy opinion right now about whether your patch is the best way.

			regards, tom lane
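For reference, the mechanism being described, condensed from LockAcquireExtended() in lock.c (details can vary a little across branches): the AccessExclusiveLock is WAL-logged for standbys only when wal_level produces standby information, and logging it requires a top-level XID:

    if (lockmode >= AccessExclusiveLock &&
        locktag->locktag_type == LOCKTAG_RELATION &&
        !RecoveryInProgress() &&
        XLogStandbyInfoActive())    /* true iff wal_level >= replica */
    {
        /*
         * This ensures the transaction has a top-level XID, so this is
         * where a truncating VACUUM trips over xidStopLimit.
         */
        LogAccessExclusiveLockPrepare();
        log_lock = true;
    }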
Hi,

On 2019-05-03 18:42:35 -0400, Robert Haas wrote:
> On Fri, May 3, 2019 at 4:47 PM Andres Freund <andres@anarazel.de> wrote:
> > I'd actually say the proper fix would be to instead move the truncation
> > to *after* finishing updating relfrozenxid etc. If we truncate, the
> > additional cost of another in-place pg_class update, to update relpages,
> > is basically insignificant. And the risk of errors, or of being
> > cancelled, during truncation is much higher than before (due to the AEL).
>
> That would prevent the ERROR from impeding relfrozenxid advancement,
> but it does not prevent the error itself, nor the XID consumption. If
> autovacuum is hitting that ERROR, it will spew errors in the log but
> succeed in advancing relfrozenxid anyway. I don't think that's as
> nice as the behavior I proposed, but it's certainly better than the
> status quo. If you are hitting that error due to a manual VACUUM,
> under your proposal, you'll stop the manual VACUUM as soon as you hit
> the first table where this happens, which is not what you want.
> You'll also keep consuming XIDs, which is not what you want either,
> especially if you are in single-user mode because the number of
> remaining XIDs is less than a million.

Part of my opposition to just disabling it when close to a wraparound is that it still allows getting close to wraparound because of truncation issues. IMO preventing getting closer to wraparound is more important than making it more "comfortable" to be in a wraparound situation.

The second problem I see is that even somebody close to a wraparound might have an urgent need to free up some space. So I'm a bit wary of just disabling it.

Wonder if there's a reasonable way that'd allow doing the WAL logging for the truncation without using an xid. One way would be to just get rid of the lock on the primary, as previously discussed. But we could also drive the locking through the WAL records that do the actual truncation; then there'd not be a need for an xid. It's probably not an entirely trivial change, but I don't think it'd be too bad?

> > > Also, I think that old prepared transactions and stale replication
> > > slots should be emphasized more prominently. Maybe something like:
> > >
> > > HINT: Commit or roll back old prepared transactions, drop stale
> > > replication slots, or kill long-running sessions.
> > > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
> >
> > I think it'd be good to instead compute what the actual problem is. It'd
> > not be particularly hard to show some of these in the errdetail:
> >
> > 1) the xid horizon (xid + age) of the problematic database; potentially,
> > if connected to that database, additionally compute what the oldest
> > xid is (although that's computationally potentially too expensive)

s/oldest xid/oldest relfrozenxid/

> > 2) the xid horizon (xid + age) due to prepared transactions, and the
> > oldest transaction's name
> > 3) the xid horizon (xid + age) due to replication slots, and the "oldest"
> > slot's name
> > 4) the xid horizon (xid + age) and pid for the connection with the
> > oldest snapshot.
> >
> > I think that'd make it much easier for users to pinpoint what's going on.
> >
> > In fact, I think we probably should additionally add a function that can
> > display the above. That'd make it much easier to write monitoring
> > queries.
>
> I think that the error and hint that you get from
> GetNewTransactionId() has to be something that we can generate very
> quickly, without doing anything that might hang on a cluster with lots
> of databases or lots of relations; but if there's useful detail we can
> display there, that's good. With a view, it's more OK if it takes a
> long time on a big cluster.

Yea, I agree it has to be reasonably fast. But all of the above, with the exception of the optional "oldest table", should be cheap enough to compute. Sure, a scan through PGXACT isn't cheap, but in comparison to an ereport() and an impending shutdown it's peanuts, in contrast to scanning a pg_class that could be many, many gigabytes.

Greetings,

Andres Freund
On Fri, May 3, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
> Part of my opposition to just disabling it when close to a wraparound
> is that it still allows getting close to wraparound because of
> truncation issues.

Sure ... it would definitely be better if vacuum didn't consume XIDs when it truncates. On the other hand, only a minority of VACUUM operations will truncate, so I don't think there's really a big problem in practice here.

> IMO preventing getting closer to wraparound is more important
> than making it more "comfortable" to be in a wraparound situation.

I think that's a false dichotomy. It's impossible to create a situation where no user ever gets into a wraparound situation, unless we're prepared to do things like automatically drop replication slots and automatically roll back (or commit?) prepared transactions. So, while it is good to prevent a user from getting into a wraparound situation where we can, it is ALSO good to make it easy to recover from those situations as painlessly as possible when they do happen.

> The second problem I see is that even somebody close to a wraparound
> might have an urgent need to free up some space. So I'm a bit wary of
> just disabling it.

I would find that ... really surprising. If you have < 11 million XIDs left before your data gets eaten by a grue, and you file a bug report complaining that vacuum won't truncate your tables until you catch up on vacuuming a bit, I am prepared to offer you no sympathy at all. I mean, I'm not going to say that we couldn't invent more complicated behavior, at least on master, like making the new VACUUM (TRUNCATE) option ternary-valued: default is on when you have more than 11 million XIDs remaining and off otherwise, but you can override either value by saying VACUUM (TRUNCATE { ON | OFF }). But is that really a thing? People have less than 11 million XIDs left and they're like "forget anti-wraparound, I want to truncate away some empty pages"?

> Wonder if there's a reasonable way that'd allow doing the WAL logging
> for the truncation without using an xid. One way would be to just get
> rid of the lock on the primary, as previously discussed. But we could
> also drive the locking through the WAL records that do the actual
> truncation; then there'd not be a need for an xid. It's probably not an
> entirely trivial change, but I don't think it'd be too bad?

Beats me. For me, this is just a bug, not an excuse to redesign vacuum truncation. Before Hot Standby, when you got into severe wraparound trouble, you could vacuum all your tables without consuming any XIDs. Now you can't. That's bad, and I think we should come up with some kind of back-patchable solution to that problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> I am not sure exactly how to fix this,
> because the calculation we use to determine the XID that can be used
> to vacuum a specific table is pretty complex; how can the postmaster
> know whether it's going to be able to make any progress in *any* table
> in some database to which it's not even connected? But it's surely
> crazy to just keep doing something over and over that can't possibly
> work.

I definitely agree that it's foolish to keep doing something that isn't going to work, and it seems like a pretty large part of the issue here is that we don't have enough information to be more intelligent, because we aren't connected to the database that needs the work to be done.

Now, presuming we're talking about 'new feature work' here to try and address this, and not something that we think we can back-patch, I had another thought.

I certainly get that having lots of extra processes around can be a waste of resources... but I don't recall a lot of people complaining about the autovacuum launcher process using up lots of resources unnecessarily.

Perhaps we should consider having the 'global' autovacuum launcher, when it's decided that a database needs work to be done, launch a 'per-DB' launcher which manages launching autovacuum processes for that database? If the launcher is still running, then it's because there's still work to be done on that database, and the 'global' autovacuum launcher can skip it. If the 'per-DB' launcher runs out of things to do, and the database it's working on is no longer in a danger zone, then it exits.

There are certainly some other variations on this idea, and I don't know that it's really better than keeping more information in shared memory or something else, but it seems like part of the issue is that the thing firing off the processes hasn't got enough info to do so intelligently, and maybe we could fix that by having per-DB launchers that are actually connected to a DB.

> 2. Once you get to the point where you start to emit errors when
> attempting to assign an XID, you can still run plain old VACUUM
> because it doesn't consume an XID ... except that if it tries to
> truncate the relation, then it will take AccessExclusiveLock, which
> has to be logged, which forces an XID assignment, which makes VACUUM
> fail. So if you burn through XIDs until the system gets to this
> point, and then you roll back the prepared transaction that caused the
> problem in the first place, autovacuum sits there trying to vacuum
> tables in a tight loop and fails over and over again as soon as it hits
> a table that it thinks needs to be truncated. This seems really lame,
> and also easily fixed.
>
> Attached is a patch that disables vacuum truncation if xidWarnLimit
> has been reached. With this patch, in my testing, autovacuum is able
> to recover the system once the prepared transaction has been rolled
> back. Without this patch, not only does that not happen, but if you
> had a database with enough relations that need truncation, you could
> conceivably cause XID wraparound just from running a database-wide
> VACUUM, the one tool you have available to avoid XID wraparound. I
> think that this amounts to a back-patchable bug fix.

Sounds reasonable to me, but I've not looked at the patch at all.
> 3. The message you get when you hit xidStopLimit seems like bad advice
> to me:
>
> ERROR: database is not accepting commands to avoid wraparound data
> loss in database "%s"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions,
> or drop stale replication slots.
>
> Why do we want people to stop the postmaster and vacuum that database
> in single-user mode? Why not just run VACUUM in multi-user mode, or
> let autovacuum take care of the problem? Granted, if VACUUM is going
> to fail in multi-user mode, and if switching to single-user mode is
> going to make it succeed, then it's a good suggestion. But it seems
> that it doesn't fail in multi-user mode, unless it tries to truncate
> something, which is a bug we should fix. Telling people to go to
> single-user mode, where they can continue to assign XIDs even though
> they have almost no XIDs left, seems extremely dangerous, actually.
>
> Also, I think that old prepared transactions and stale replication
> slots should be emphasized more prominently. Maybe something like:
>
> HINT: Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.

I agree that a better message would definitely be good and that recommending single-user mode isn't a terribly useful thing to do.

I might have misunderstood it, but it sounded like Andres was proposing a new function which would basically tell you what's holding back the xid horizon, and that sounds fantastic and would be great to include in this message, if possible. As in:

HINT: Run the function pg_what_is_holding_xmin_back() to identify what is preventing autovacuum from progressing and address it.

Or some such.

Thanks,

Stephen
Hi,

On 2019-05-03 21:36:24 -0400, Robert Haas wrote:
> On Fri, May 3, 2019 at 8:45 PM Andres Freund <andres@anarazel.de> wrote:
> > Part of my opposition to just disabling it when close to a wraparound
> > is that it still allows getting close to wraparound because of
> > truncation issues.
>
> Sure ... it would definitely be better if vacuum didn't consume XIDs
> when it truncates. On the other hand, only a minority of VACUUM
> operations will truncate, so I don't think there's really a big
> problem in practice here.

I've seen a number of out-of-xid shutdowns precisely because of truncations. Initially because autovacuum commits suicide because somebody else wants a conflicting lock, later because there's so much dead space that people kill (auto)vacuum to get rid of the exclusive locks.

> > IMO preventing getting closer to wraparound is more important
> > than making it more "comfortable" to be in a wraparound situation.
>
> I think that's a false dichotomy. It's impossible to create a
> situation where no user ever gets into a wraparound situation, unless
> we're prepared to do things like automatically drop replication slots
> and automatically roll back (or commit?) prepared transactions. So,
> while it is good to prevent a user from getting into a wraparound
> situation where we can, it is ALSO good to make it easy to recover
> from those situations as painlessly as possible when they do happen.

Sure, but I've seen a number of real-world cases of xid wraparound shutdowns related to truncations, and no real-world problem due to truncations assigning an xid.

> > The second problem I see is that even somebody close to a wraparound
> > might have an urgent need to free up some space. So I'm a bit wary of
> > just disabling it.
>
> I would find that ... really surprising. If you have < 11 million
> XIDs left before your data gets eaten by a grue, and you file a bug
> report complaining that vacuum won't truncate your tables until you
> catch up on vacuuming a bit, I am prepared to offer you no sympathy at
> all.

I've seen wraparound issues triggered by autovacuum generating so much WAL that the system ran out of space, crash-restart, repeat. And being unable to reclaim space could make that even harder to tackle.

> > Wonder if there's a reasonable way that'd allow doing the WAL logging
> > for the truncation without using an xid. One way would be to just get
> > rid of the lock on the primary, as previously discussed. But we could
> > also drive the locking through the WAL records that do the actual
> > truncation; then there'd not be a need for an xid. It's probably not an
> > entirely trivial change, but I don't think it'd be too bad?
>
> Beats me. For me, this is just a bug, not an excuse to redesign
> vacuum truncation. Before Hot Standby, when you got into severe
> wraparound trouble, you could vacuum all your tables without consuming
> any XIDs. Now you can't. That's bad, and I think we should come up
> with some kind of back-patchable solution to that problem.

I agree we need to do at least a minimal version that can be backpatched.

I don't think we necessarily need a new WAL record for what I'm describing above (as XLOG_SMGR_TRUNCATE already carries information about which forks are truncated, we could just have it acquire the exclusive lock), and I don't think we'd need a ton of code for eliding the WAL-logged lock either. Think the issue with backpatching would be that we can't remove the logged lock without creating hazards for standbys running older versions of postgres.
Greetings,

Andres Freund
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> I don't think we necessarily need a new WAL record for what I'm
> describing above (as XLOG_SMGR_TRUNCATE already carries information
> about which forks are truncated, we could just have it acquire the
> exclusive lock), and I don't think we'd need a ton of code for eliding
> the WAL-logged lock either. Think the issue with backpatching would be
> that we can't remove the logged lock without creating hazards for
> standbys running older versions of postgres.

While it's pretty rare, I don't believe this would be the only case of "you need to upgrade your replicas before your primary" due to changes in WAL. Of course, we need to make sure that we actually figure out that the WAL being sent is something that the replica doesn't know how to properly handle because it's from a newer primary; we can't simply do the wrong thing in that case.

Thanks,

Stephen
Hi,

On 2019-05-03 22:11:08 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > I don't think we necessarily need a new WAL record for what I'm
> > describing above (as XLOG_SMGR_TRUNCATE already carries information
> > about which forks are truncated, we could just have it acquire the
> > exclusive lock), and I don't think we'd need a ton of code for eliding
> > the WAL-logged lock either. Think the issue with backpatching would be
> > that we can't remove the logged lock without creating hazards for
> > standbys running older versions of postgres.
>
> While it's pretty rare, I don't believe this would be the only case of
> "you need to upgrade your replicas before your primary" due to changes
> in WAL. Of course, we need to make sure that we actually figure out
> that the WAL being sent is something that the replica doesn't know how
> to properly handle because it's from a newer primary; we can't simply do
> the wrong thing in that case.

Don't think this is severe enough to warrant a step like this. I think for the back-patch case we could live with:

a) move truncation to after the rest of vacuum
b) don't truncate if it'd error out anyway, but log an error

Greetings,

Andres Freund
Hi,

On 2019-05-03 22:03:18 -0400, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > I am not sure exactly how to fix this,
> > because the calculation we use to determine the XID that can be used
> > to vacuum a specific table is pretty complex; how can the postmaster
> > know whether it's going to be able to make any progress in *any* table
> > in some database to which it's not even connected? But it's surely
> > crazy to just keep doing something over and over that can't possibly
> > work.
>
> I definitely agree that it's foolish to keep doing something that isn't
> going to work, and it seems like a pretty large part of the issue here
> is that we don't have enough information to be more intelligent, because
> we aren't connected to the database that needs the work to be done.
>
> Now, presuming we're talking about 'new feature work' here to try and
> address this, and not something that we think we can back-patch, I had
> another thought.
>
> I certainly get that having lots of extra processes around can be a
> waste of resources... but I don't recall a lot of people complaining
> about the autovacuum launcher process using up lots of resources
> unnecessarily.
>
> Perhaps we should consider having the 'global' autovacuum launcher, when
> it's decided that a database needs work to be done, launch a 'per-DB'
> launcher which manages launching autovacuum processes for that database?
> If the launcher is still running, then it's because there's still work to
> be done on that database, and the 'global' autovacuum launcher can skip
> it. If the 'per-DB' launcher runs out of things to do, and the database
> it's working on is no longer in a danger zone, then it exits.
>
> There are certainly some other variations on this idea, and I don't know
> that it's really better than keeping more information in shared
> memory or something else, but it seems like part of the issue is that
> the thing firing off the processes hasn't got enough info to do so
> intelligently, and maybe we could fix that by having per-DB launchers
> that are actually connected to a DB.

This sounds a lot more like a wholesale redesign than some small incremental work. Which I think we should do, but we probably ought to do something more minimal before the resources for something like this are there. Perfect being the enemy of the good, and all that.

> I might have misunderstood it, but it sounded like Andres was
> proposing a new function which would basically tell you what's holding
> back the xid horizon, and that sounds fantastic and would be great to
> include in this message, if possible.
>
> As in:
>
> HINT: Run the function pg_what_is_holding_xmin_back() to identify what
> is preventing autovacuum from progressing and address it.

I was basically thinking of doing *both*: amending the message *and* having a new UDF.
Basically, instead of the current:

    char       *oldest_datname = get_database_name(oldest_datoid);

    /* complain even if that DB has disappeared */
    if (oldest_datname)
        ereport(WARNING,
                (errmsg("database \"%s\" must be vacuumed within %u transactions",
                        oldest_datname,
                        xidWrapLimit - xid),
                 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                         "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    else
        ereport(WARNING,
                (errmsg("database with OID %u must be vacuumed within %u transactions",
                        oldest_datoid,
                        xidWrapLimit - xid),
                 errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                         "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    }

which is dramatically unhelpful, because often vacuuming won't do squat (because of an old [prepared] transaction, a replication connection, or a slot).

I'm thinking that we'd do something roughly like (in actual code) for GetNewTransactionId():

    TransactionId dat_limit = ShmemVariableCache->oldestXid;
    TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
    TransactionId walsender_limit;
    TransactionId prepared_xact_limit;
    TransactionId backend_limit;

    ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);

    if (IsOldest(dat_limit))
        ereport(elevel,
                errmsg("close to xid wraparound, held back by database %s"),
                errdetail("current xid %u, horizon for database %u, shutting down at %u"),
                errhint("..."));
    else if (IsOldest(slot_limit))
        ereport(elevel,
                errmsg("close to xid wraparound, held back by replication slot %s"),
                ...);

where IsOldest wouldn't actually compare plainly numerically, but would actually prefer showing the slot, backend, walsender, or prepared xact, as long as they are pretty close to the dat_limit, as in those cases vacuuming wouldn't actually solve the issue unless the other problems are addressed first (as autovacuum won't compute a cutoff horizon that's newer than any of those).

and for the function I was thinking of an SRF that'd return roughly rows like:

* "database horizon", xid, age(xid), database oid, database name
* "slot horizon", xid, age(xid), NULL, slot name
* "backend horizon", xid, age(xid), backend pid, query string?
* "walsender horizon", xid, age(xid), backend pid, connection string?
* "anti-wraparound-vacuums-start", xid, NULL, NULL
* "current xid", xid, NULL, NULL
* "xid warn limit", xid, NULL, NULL
* "xid shutdown limit", xid, NULL, NULL
* "xid wraparound limit", xid, NULL, NULL

Not sure if an SRF is really the best approach, but it seems like it'd be the easiest way to show a good overview of the problems. I don't know how many times this'd have prevented support escalations, but it'd be many.

Greetings,

Andres Freund
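A sketch of the "prefer the blocker" comparison just described; the slack constant and names are invented. The point is that if a slot, walsender, prepared transaction, or backend horizon sits close enough to the database horizon, vacuuming cannot advance past it, so the error should name that blocker rather than telling the user to vacuum:

    #define HORIZON_SLACK 1000000   /* "pretty close": within 1M XIDs */

    static bool
    horizon_is_effective_blocker(TransactionId candidate,
                                 TransactionId dat_limit)
    {
        /*
         * TransactionIdPrecedes() is wraparound-aware; the addition wraps
         * like other xid arithmetic (special low XIDs ignored in this
         * sketch).
         */
        return TransactionIdIsNormal(candidate) &&
            TransactionIdPrecedes(candidate, dat_limit + HORIZON_SLACK);
    }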
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 22:03:18 -0400, Stephen Frost wrote:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> > > I am not sure exactly how to fix this,
> > > because the calculation we use to determine the XID that can be used
> > > to vacuum a specific table is pretty complex; how can the postmaster
> > > know whether it's going to be able to make any progress in *any* table
> > > in some database to which it's not even connected? But it's surely
> > > crazy to just keep doing something over and over that can't possibly
> > > work.
> >
> > I definitely agree that it's foolish to keep doing something that isn't
> > going to work, and it seems like a pretty large part of the issue here
> > is that we don't have enough information to be more intelligent, because
> > we aren't connected to the database that needs the work to be done.
> >
> > Now, presuming we're talking about 'new feature work' here to try and
> > address this, and not something that we think we can back-patch, I had
> > another thought.
> >
> > I certainly get that having lots of extra processes around can be a
> > waste of resources... but I don't recall a lot of people complaining
> > about the autovacuum launcher process using up lots of resources
> > unnecessarily.
> >
> > Perhaps we should consider having the 'global' autovacuum launcher, when
> > it's decided that a database needs work to be done, launch a 'per-DB'
> > launcher which manages launching autovacuum processes for that database?
> > If the launcher is still running, then it's because there's still work to
> > be done on that database, and the 'global' autovacuum launcher can skip
> > it. If the 'per-DB' launcher runs out of things to do, and the database
> > it's working on is no longer in a danger zone, then it exits.
> >
> > There are certainly some other variations on this idea, and I don't know
> > that it's really better than keeping more information in shared
> > memory or something else, but it seems like part of the issue is that
> > the thing firing off the processes hasn't got enough info to do so
> > intelligently, and maybe we could fix that by having per-DB launchers
> > that are actually connected to a DB.
>
> This sounds a lot more like a wholesale redesign than some small
> incremental work. Which I think we should do, but we probably ought to
> do something more minimal before the resources for something like this
> are there. Perfect being the enemy of the good, and all that.

I suppose it is a pretty big change in the base autovacuum launcher to be something that's run per database instead, and then deal with the coordination between the two... but I can't help but feel like it wouldn't be that much *work*. I'm not against doing something smaller, but was something smaller actually proposed for this specific issue..?

> > I might have misunderstood it, but it sounded like Andres was
> > proposing a new function which would basically tell you what's holding
> > back the xid horizon, and that sounds fantastic and would be great to
> > include in this message, if possible.
> >
> > As in:
> >
> > HINT: Run the function pg_what_is_holding_xmin_back() to identify what
> > is preventing autovacuum from progressing and address it.
>
> I was basically thinking of doing *both*: amending the message *and*
> having a new UDF.
>
> Basically, instead of the current:
>
>     char       *oldest_datname = get_database_name(oldest_datoid);
>
>     /* complain even if that DB has disappeared */
>     if (oldest_datname)
>         ereport(WARNING,
>                 (errmsg("database \"%s\" must be vacuumed within %u transactions",
>                         oldest_datname,
>                         xidWrapLimit - xid),
>                  errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
>                          "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
>     else
>         ereport(WARNING,
>                 (errmsg("database with OID %u must be vacuumed within %u transactions",
>                         oldest_datoid,
>                         xidWrapLimit - xid),
>                  errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
>                          "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
>     }
>
> which is dramatically unhelpful, because often vacuuming won't do squat
> (because of an old [prepared] transaction, a replication connection, or
> a slot).
>
> I'm thinking that we'd do something roughly like (in actual code) for
> GetNewTransactionId():
>
>     TransactionId dat_limit = ShmemVariableCache->oldestXid;
>     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
>     TransactionId walsender_limit;
>     TransactionId prepared_xact_limit;
>     TransactionId backend_limit;
>
>     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
>
>     if (IsOldest(dat_limit))
>         ereport(elevel,
>                 errmsg("close to xid wraparound, held back by database %s"),
>                 errdetail("current xid %u, horizon for database %u, shutting down at %u"),
>                 errhint("..."));
>     else if (IsOldest(slot_limit))
>         ereport(elevel,
>                 errmsg("close to xid wraparound, held back by replication slot %s"),
>                 ...);
>
> where IsOldest wouldn't actually compare plainly numerically, but would
> actually prefer showing the slot, backend, walsender, or prepared xact,
> as long as they are pretty close to the dat_limit, as in those cases
> vacuuming wouldn't actually solve the issue unless the other problems
> are addressed first (as autovacuum won't compute a cutoff horizon that's
> newer than any of those).

Where the errhint() above includes a recommendation to run the SRF described below, I take it?

Also, should this really be an 'else if', or should it be just a set of 'if()'s, thereby giving users more info right up-front? If there's one thing I quite dislike, it's cases where you're playing whack-a-mole over and over because the system says "X is bad!" and when you fix X, it just turns around and says "Y is bad!", even though it had all the info it needed to say "X and Y are bad!" right up-front.

> and for the function I was thinking of an SRF that'd return roughly rows
> like:
>
> * "database horizon", xid, age(xid), database oid, database name
> * "slot horizon", xid, age(xid), NULL, slot name
> * "backend horizon", xid, age(xid), backend pid, query string?
> * "walsender horizon", xid, age(xid), backend pid, connection string?
> * "anti-wraparound-vacuums-start", xid, NULL, NULL
> * "current xid", xid, NULL, NULL
> * "xid warn limit", xid, NULL, NULL
> * "xid shutdown limit", xid, NULL, NULL
> * "xid wraparound limit", xid, NULL, NULL
>
> Not sure if an SRF is really the best approach, but it seems like it'd
> be the easiest way to show a good overview of the problems. I don't know
> how many times this'd have prevented support escalations, but it'd be
> many.

Yeah, I'm not sure if an SRF makes the most sense here...
but I'm also not sure that it's a bad idea either. Once it's written, it'll be a lot easier to critique the specific interface. ;)

Thanks!

Stephen
Hi,

On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> I suppose it is a pretty big change in the base autovacuum launcher to
> be something that's run per database instead, and then deal with the
> coordination between the two... but I can't help but feel like it
> wouldn't be that much *work*. I'm not against doing something smaller,
> but was something smaller actually proposed for this specific issue..?

I think it'd be fairly significant. And that we should redo it from scratch if we go there, because what we have isn't worth using as a basis.

> > I'm thinking that we'd do something roughly like (in actual code) for
> > GetNewTransactionId():
> >
> >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> >     TransactionId walsender_limit;
> >     TransactionId prepared_xact_limit;
> >     TransactionId backend_limit;
> >
> >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> >
> >     if (IsOldest(dat_limit))
> >         ereport(elevel,
> >                 errmsg("close to xid wraparound, held back by database %s"),
> >                 errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> >                 errhint("..."));
> >     else if (IsOldest(slot_limit))
> >         ereport(elevel,
> >                 errmsg("close to xid wraparound, held back by replication slot %s"),
> >                 ...);
> >
> > where IsOldest wouldn't actually compare plainly numerically, but would
> > actually prefer showing the slot, backend, walsender, or prepared xact,
> > as long as they are pretty close to the dat_limit, as in those cases
> > vacuuming wouldn't actually solve the issue unless the other problems
> > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > newer than any of those).
>
> Where the errhint() above includes a recommendation to run the SRF
> described below, I take it?

Not necessarily. I feel conciseness is important too, and this would be the most important thing to tackle.

> Also, should this really be an 'else if', or should it be just a set of
> 'if()'s, thereby giving users more info right up-front?

Possibly? But it'd also make it even harder to read the log / the system to keep up with logging, because we already log *so* much when close to wraparound.

If we didn't order it, it'd be hard for users to figure out which to address first. If we ordered it, people have to look further up in the log to figure out which is the most urgent one (unless we reverse the order, which is odd too).

Greetings,

Andres Freund
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 22:41:11 -0400, Stephen Frost wrote:
> > I suppose it is a pretty big change in the base autovacuum launcher to
> > be something that's run per database instead, and then deal with the
> > coordination between the two... but I can't help but feel like it
> > wouldn't be that much *work*. I'm not against doing something smaller,
> > but was something smaller actually proposed for this specific issue..?
>
> I think it'd be fairly significant. And that we should redo it from
> scratch if we go there, because what we have isn't worth using as a
> basis.

Alright, what I'm hearing here is that we should probably have a dedicated thread for this discussion, if someone has the cycles to spend on it. I'm not against that.

> > > I'm thinking that we'd do something roughly like (in actual code) for
> > > GetNewTransactionId():
> > >
> > >     TransactionId dat_limit = ShmemVariableCache->oldestXid;
> > >     TransactionId slot_limit = Min(replication_slot_xmin, replication_slot_catalog_xmin);
> > >     TransactionId walsender_limit;
> > >     TransactionId prepared_xact_limit;
> > >     TransactionId backend_limit;
> > >
> > >     ComputeOldestXminFromProcarray(&walsender_limit, &prepared_xact_limit, &backend_limit);
> > >
> > >     if (IsOldest(dat_limit))
> > >         ereport(elevel,
> > >                 errmsg("close to xid wraparound, held back by database %s"),
> > >                 errdetail("current xid %u, horizon for database %u, shutting down at %u"),
> > >                 errhint("..."));
> > >     else if (IsOldest(slot_limit))
> > >         ereport(elevel,
> > >                 errmsg("close to xid wraparound, held back by replication slot %s"),
> > >                 ...);
> > >
> > > where IsOldest wouldn't actually compare plainly numerically, but would
> > > actually prefer showing the slot, backend, walsender, or prepared xact,
> > > as long as they are pretty close to the dat_limit, as in those cases
> > > vacuuming wouldn't actually solve the issue unless the other problems
> > > are addressed first (as autovacuum won't compute a cutoff horizon that's
> > > newer than any of those).
> >
> > Where the errhint() above includes a recommendation to run the SRF
> > described below, I take it?
>
> Not necessarily. I feel conciseness is important too, and this would be
> the most important thing to tackle.

I'm imagining a relatively rare scenario, just to be clear, where "pretty close to the dat_limit" would apply to more than just one thing.

> > Also, should this really be an 'else if', or should it be just a set of
> > 'if()'s, thereby giving users more info right up-front?
>
> Possibly? But it'd also make it even harder to read the log / the system
> to keep up with logging, because we already log *so* much when close to
> wraparound.

Yes, we definitely log a *lot*, and probably too much, since other critical messages might get lost in the noise.

> If we didn't order it, it'd be hard for users to figure out which to
> address first. If we ordered it, people have to look further up in the
> log to figure out which is the most urgent one (unless we reverse the
> order, which is odd too).

This makes me think we should both order it and combine it into one message... but that'd then be pretty difficult to deal with, potentially, from a translation standpoint, and just from a "wow, that's a huge log message" standpoint, which is kind of the idea behind the SRF: to give you all that info in a more easily digestible manner.

Not sure I've got any great ideas on how to improve on this.
I do think that if we know that there are multiple different things within a small number of xids of the oldest xmin, then we should notify the user about all of them, either directly in the error messages or by referring them to the SRF, so they have the opportunity to address them all, or at least know about them all. As mentioned though, it's likely to be quite a rare thing to run into, so you'd have to be extra unlucky to even hit this case, and perhaps the extra code complication just isn't worth it.

Thanks,

Stephen
On Fri, May 3, 2019 at 5:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> HINT: Commit or roll back old prepared transactions, drop stale
> replication slots, or kill long-running sessions.
> Ensure that autovacuum is progressing, or run a manual database-wide VACUUM.
>

First of all, +1 for this patch. But after reading this email, I couldn't resist exposing an idea about the stale XID horizon.

[A cup of tea...]

I realized that there is no automatic way to recover from old prepared transactions, stale replication slots, or even long-running sessions when we have a wraparound situation. Isn't this a case for adding a parameter that recovers from a stale XID horizon? I mean, if we reach a critical situation (xidStopLimit), free the resource that is preventing the XID from advancing (and hope autovacuum has some time to prevent a stop-assigning-XIDs situation). The situation is analogous to the OOM killer, which kills processes that are starving the system of resources.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Hi,

On 2019-05-03 23:08:44 -0400, Stephen Frost wrote:
> This makes me think we should both order it and combine it into one
> message... but that'd then be pretty difficult to deal with,
> potentially, from a translation standpoint, and just from a "wow, that's
> a huge log message" standpoint, which is kind of the idea behind the
> SRF: to give you all that info in a more easily digestible manner.
>
> Not sure I've got any great ideas on how to improve on this. I do think
> that if we know that there are multiple different things within a
> small number of xids of the oldest xmin, then we should notify the user
> about all of them, either directly in the error messages or by referring
> them to the SRF, so they have the opportunity to address them all, or
> at least know about them all. As mentioned though, it's likely to be
> quite a rare thing to run into, so you'd have to be extra unlucky to
> even hit this case, and perhaps the extra code complication just isn't
> worth it.

I think just having an actual reason for the problem would be so much better than the current status, that I'd tackle the "oops, just about everything is screwed, here's the reasons in order" case separately (or just plain never).

Greetings,

Andres Freund
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-05-03 23:08:44 -0400, Stephen Frost wrote:
> > As mentioned though, it's likely to be quite a rare thing to run into,
> > so you'd have to be extra unlucky to even hit this case, and perhaps
> > the extra code complication just isn't worth it.
>
> I think just having an actual reason for the problem would be so much
> better than the current status, that I'd tackle the "oops, just about
> everything is screwed, here's the reasons in order" case separately (or
> just plain never).

I certainly agree with that, but if we really think it's so rare that we don't feel it's worth worrying about, then I'd say we should remove the 'else' in the 'else if', as I initially suggested, since the chances of users getting more than one are quite rare and more than two would be... impressive, which negates the concern you raised earlier about there being a bunch of these messages that make it hard for users to reason about what is most critical. Removing the 'else' would make it strictly less code (albeit not much, but it certainly isn't adding to the code) and would be more informative for the user who does end up having two cases happen at (or nearly) the same time.

Thanks,

Stephen