Thread: BUG #13002: VACUUM to prevent wraparound blocks TRUNCATE
The following bug has been logged on the website: Bug reference: 13002 Logged by: Maciek Sakrejda Email address: maciek@heroku.com PostgreSQL version: 9.3.6 Operating system: Ubuntu 14.04 Description: If autovacuum is running a VACUUM to prevent wraparound on a certain table, it looks like it blocks TRUNCATE of that table, which would obviate the need for the VACUUM in the first place (and could happen much more quickly). This seems like a usability wart at best--one has to instead kill the autovacuum (e.g., via pg_cancel_backend) and then TRUNCATE. This is especially inconvenient when the user trying to truncate does not have permission to kill (or even see!) the wraparound VACUUM. Would it be possible to have autovacuum yield to a TRUNCATE in this situation?
On Wed, Apr 8, 2015 at 06:41:55PM +0000, maciek@heroku.com wrote: > The following bug has been logged on the website: > > Bug reference: 13002 > Logged by: Maciek Sakrejda > Email address: maciek@heroku.com > PostgreSQL version: 9.3.6 > Operating system: Ubuntu 14.04 > Description: > > If autovacuum is running a VACUUM to prevent wraparound on a certain table, > it looks like it blocks TRUNCATE of that table, which would obviate the need > for the VACUUM in the first place (and could happen much more quickly). This > seems like a usability wart at best--one has to instead kill the autovacuum > (e.g., via pg_cancel_backend) and then TRUNCATE. This is especially > inconvenient when the user trying to truncate does not have permission to > kill (or even see!) the wraparound VACUUM. Would it be possible to have > autovacuum yield to a TRUNCATE in this situation? My guess is this would be hard to do. VACUUM and TRUNCATE request conflicting locks, so you would have to embed in the locking code that the _outcome_ of one should allow it to override the lock of the other. I can't think of anywhere that we allow that. Frankly, you can't really just ignore the lock --- you would have to cancel the vacuum by sending a signal or something. I know we have cases where vacuum aborts if it can't get a lock or something, but I can't remembrer any case where we externally cancel vacuum like that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
maciek@heroku.com wrote: > If autovacuum is running a VACUUM to prevent wraparound on a certain table, > it looks like it blocks TRUNCATE of that table, which would obviate the need > for the VACUUM in the first place (and could happen much more quickly). This > seems like a usability wart at best--one has to instead kill the autovacuum > (e.g., via pg_cancel_backend) and then TRUNCATE. If you had truncated the table before the issue became an emergency, vacuum would have gave way to truncate. But when autovacuum sees that the problem is too serious to ignore, it doesn't give way to anything; if it did, it would fall prey of the same problem it's trying to avoid. > This is especially inconvenient when the user trying to truncate does > not have permission to kill (or even see!) the wraparound VACUUM. > Would it be possible to have autovacuum yield to a TRUNCATE in this > situation? Sounds like we would have to special-case truncate in the deadlock checker or something like that. Doesn't seem particularly palatable. The way this works is that deadlock checker returns "you're blocked but the culprit is autovacuum" when not in emergency mode; then the other process sends a signal to autovac which commits suicide. To fix this case we would have to add another special code "you're blocked but the culprit is an emergency vacuum", then the other process sees itself as truncate then sends signal anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > maciek@heroku.com wrote: > > > If autovacuum is running a VACUUM to prevent wraparound on a certain > table, > > it looks like it blocks TRUNCATE of that table, which would obviate the > need > > for the VACUUM in the first place (and could happen much more quickly). > This > > seems like a usability wart at best--one has to instead kill the > autovacuum > > (e.g., via pg_cancel_backend) and then TRUNCATE. > > If you had truncated the table before the issue became an emergency, > vacuum would have gave way to truncate. But when autovacuum sees that > the problem is too serious to ignore, it doesn't give way to anything; > if it did, it would fall prey of the same problem it's trying to avoid. > > > This is especially inconvenient when the user trying to truncate does > > not have permission to kill (or even see!) the wraparound VACUUM. > > Would it be possible to have autovacuum yield to a TRUNCATE in this > > situation? > > Sounds like we would have to special-case truncate in the deadlock > checker or something like that. Doesn't seem particularly palatable. > > The way this works is that deadlock checker returns "you're blocked but > the culprit is autovacuum" when not in emergency mode; then the other > process sends a signal to autovac which commits suicide. To fix this > case we would have to add another special code "you're blocked but the > culprit is an emergency vacuum", then the other process sees itself as > truncate then sends signal anyway. > =E2=80=8BRelated question: does a truncated table effectively reset its wra= paround counter back to zero? Put another way: what is the use case we are trying=E2=80=8B =E2=80=8Bto support here? If the table in question is being truncated regularly it doesn't seem like this situation should arise. If it is not a regularly truncated table the possibility of a random truncation being run simultaneously with a wraparound VACUUM seems quite small. If the truncate is being run administratively then whomever is running that should at least be able to see the wraparound vacuum: whether they can get superuser rights and do something about it is another matter. Even then, supposing that TRUNCATE is a viable means to speed up a wraparound vacuum seems a bit of a stretch in the general case. Since pg_cancel_backend is usable in this situation putting code into core to handle this scenario does not seem like a good idea. David J.
David G. Johnston wrote: > âRelated question: does a truncated table effectively reset its wraparound > counter back to zero? Yes, the relfrozenxid is set to RecentXmin (see ExecuteTruncate). Not quite zero age but unless you have really old transactions around, should be close enough. > If the table in question is being truncated regularly it doesn't seem like > this situation should arise. Correct. -- Ãlvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > maciek@heroku.com wrote: > > > If autovacuum is running a VACUUM to prevent wraparound on a certain > table, > > it looks like it blocks TRUNCATE of that table, which would obviate the > need > > for the VACUUM in the first place (and could happen much more quickly). > This > > seems like a usability wart at best--one has to instead kill the > autovacuum > > (e.g., via pg_cancel_backend) and then TRUNCATE. > > If you had truncated the table before the issue became an emergency, > vacuum would have gave way to truncate. It is pretty hard to argue that the passage of 200 million transactions constitutes some kind of emergency. > But when autovacuum sees that > the problem is too serious to ignore, it doesn't give way to anything; > if it did, it would fall prey of the same problem it's trying to avoid. > It doesn't give way to anything, except "autovacuum_vacuum_cost_delay". I think we need to get off of the idea that is somehow an emergency for a database system to have survived 200 million transactions. Or at least, we need to provide a non-emergency way to do necessary routine maintenance. (In this case since a truncation is desired in the first place, you could consider that we do provide a way even if it is only available in hindsight, but in general we do not.) Right now we declare it an emergency that some guy needs to change his oil, shutdown half the traffic in the city during the "emergency", declare that all the emergency vehicles responding to this emergency are only allowed to move at 1/3 the speed limit, and make it illegal (or at least ineffective) for anyone to change their oil calmly and preemptively. > > This is especially inconvenient when the user trying to truncate does > > not have permission to kill (or even see!) the wraparound VACUUM. > > Would it be possible to have autovacuum yield to a TRUNCATE in this > > situation? > > Sounds like we would have to special-case truncate in the deadlock > checker or something like that. Doesn't seem particularly palatable. > Yeah, it would be a messy case to implement, but if there were a general way to know the locker's intentions there are other places this would come in handy as well. Cheers, Jeff
Jeff Janes wrote: > On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > > > maciek@heroku.com wrote: > > > > > If autovacuum is running a VACUUM to prevent wraparound on a certain > > table, > > > it looks like it blocks TRUNCATE of that table, which would obviate the > > need > > > for the VACUUM in the first place (and could happen much more quickly). > > This > > > seems like a usability wart at best--one has to instead kill the > > autovacuum > > > (e.g., via pg_cancel_backend) and then TRUNCATE. > > > > If you had truncated the table before the issue became an emergency, > > vacuum would have gave way to truncate. > > It is pretty hard to argue that the passage of 200 million transactions > constitutes some kind of emergency. Well, admittedly 200 million is not a lot of transactions, but the fact remains that it's the configured max freeze age which is what causes the whole problem to start with. If you raise the limit to 2 billion, the same thing will happen, only it will be ten times less frequent. We already have a lower limit to freeze tuples; if tables are scanned in whole every 150 million transactions, as autovacuum would normally do, you wouldn't see an effective shutdown at 200 million transactions. Those 50 million xacts should be enough breathing room, shouldn't they. The issue only arises if you're constantly causing autovac workers to terminate because of DDL or some other processing that locks the table away from vacuum. > > But when autovacuum sees that > > the problem is too serious to ignore, it doesn't give way to anything; > > if it did, it would fall prey of the same problem it's trying to avoid. > > It doesn't give way to anything, except "autovacuum_vacuum_cost_delay". That in itself might be a mistake. Perhaps it makes sense to have emergency vacuums ignore the cost delay, but for that we need to ensure that emergency vacuums are rare events. > I think we need to get off of the idea that is somehow an emergency for a > database system to have survived 200 million transactions. Or at least, we > need to provide a non-emergency way to do necessary routine maintenance. > (In this case since a truncation is desired in the first place, you could > consider that we do provide a way even if it is only available in > hindsight, but in general we do not.) > > Right now we declare it an emergency that some guy needs to change his oil, > shutdown half the traffic in the city during the "emergency", declare that > all the emergency vehicles responding to this emergency are only allowed to > move at 1/3 the speed limit, and make it illegal (or at least ineffective) > for anyone to change their oil calmly and preemptively. Hilarious. But the analogy needs a bit more elaboration in order to explain the current situation. The guy wants to change oil at 150 million transactions, and this would cause no trouble, but every time he pulls over to do so, there's a traffic cop who prevents him from doing so and tells him to get back on the street and move on. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Jeff Janes <jeff.janes@gmail.com> writes: > Right now we declare it an emergency that some guy needs to change his oil, > shutdown half the traffic in the city during the "emergency", declare that > all the emergency vehicles responding to this emergency are only allowed to > move at 1/3 the speed limit, and make it illegal (or at least ineffective) > for anyone to change their oil calmly and preemptively. We could do without hyperbolic and wholly misleading analogies. (In this one, for example, you're omitting that the guy causing the problem is a couple years overdue to have changed his oil, and will blow a gasket and block the entire highway if he drives further.) Getting back to the case at hand, I don't think that making an exception for TRUNCATE would be a good idea even if there were no technical obstacles. It's perfectly possible for a TRUNCATE to be rolled back; so if the TRUNCATE were to kick an emergency autovac off the table lock and then fail, you would end up with no improvement in the situation at all. regards, tom lane
On Wed, Apr 8, 2015 at 1:09 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Jeff Janes wrote: > > On Wed, Apr 8, 2015 at 12:21 PM, Alvaro Herrera < > alvherre@2ndquadrant.com> > > wrote: > > > > > maciek@heroku.com wrote: > > > > > > > If autovacuum is running a VACUUM to prevent wraparound on a certain > > > table, > > > > it looks like it blocks TRUNCATE of that table, which would obviate > the > > > need > > > > for the VACUUM in the first place (and could happen much more > quickly). > > > This > > > > seems like a usability wart at best--one has to instead kill the > > > autovacuum > > > > (e.g., via pg_cancel_backend) and then TRUNCATE. > > > > > > If you had truncated the table before the issue became an emergency, > > > vacuum would have gave way to truncate. > > > > It is pretty hard to argue that the passage of 200 million transactions > > constitutes some kind of emergency. > > Well, admittedly 200 million is not a lot of transactions, but the fact > remains that it's the configured max freeze age which is what causes the > whole problem to start with. If you raise the limit to 2 billion, the > same thing will happen, only it will be ten times less frequent. Right, and this is my gripe. If it were increased to 2 billion, then it expiring actually would be an emergency. At its default value, it is not an emergency, but we act as if it were, and we provide no way to do otherwise. The choices are between suffering true emergencies at 2 billion transactions, or fake emergencies at 200 million (or, as you point out, 150 million). Of course I can pick some value other than 2 billion and 200 million, but there is no good time to have a fake emergency. Or a real one. > We > already have a lower limit to freeze tuples; if tables are scanned in > whole every 150 million transactions, as autovacuum would normally do, > you wouldn't see an effective shutdown at 200 million transactions. > Those 50 million xacts should be enough breathing room, shouldn't they. > But that, too, is an "emergency". As far as I can tell, there is no way to increase the frozen ID without first declaring an emergency. If an ordinary autovac happens to touch every page, it still doesn't increase frozen ID. Cheers, Jeff