autovacuum prioritization - Mailing list pgsql-hackers
From: Robert Haas
Subject: autovacuum prioritization
Msg-id: CA+TgmoafJPjB3WVqB3FrGWUU4NLRc3VHx8GXzLL-JM++JPwK+Q@mail.gmail.com
List: pgsql-hackers
autovacuum, as it exists today, has some reasonable - though imperfect - heuristics for deciding which tables need to be vacuumed or analyzed. However, it has nothing to help it decide which tables need to be vacuumed or analyzed first. As a result, when it's configured aggressively enough, and has sufficient hardware resources, to do all the things that need to get done, the system holds together reasonably well. When it's not configured sufficiently aggressively, or when hardware resources are lacking, the situation deteriorates. Some deterioration is inevitable in that kind of scenario, because if you can't do all of the things that need to be done, something will have to get postponed, and that can cause problems. However, autovacuum makes it worse than it needs to be by deciding which things to do first more or less at random. When autovacuum chooses to vacuum a table that is a little bit bloated in preference to one that is severely bloated, or one that is bloated in place of one that is about to cause a wraparound shutdown, users are understandably upset, and some of them post on this mailing list about it.

I am not going to try to review in detail the history of previous threads looking for action in this area, but there are quite a few. In my view, previous efforts in this area have been too simplistic. For example, it's been proposed that a table that is perceived to be in any kind of wraparound danger ought to get top priority, but I find that implausible. A mostly-quiescent table that is one XID past autovacuum_freeze_max_age is not likely to be a more urgent problem than a table that is absorbing 20,000 row updates per second. There is a long way between the default value of autovacuum_freeze_max_age and a wraparound shutdown, but if you don't vacuum a hotly-updated table promptly, you will get irreversible bloat and force an application shutdown to run VACUUM FULL.

It's also been proposed that we provide some way to let the user fix priorities, which was rightly criticized as foisting on the user what the system ought to be figuring out for itself. Perhaps a user-controllable prioritization system is a good idea, but only as a way of overriding some built-in system in cases where that goes wrong.

A few people have proposed scoring systems, which I think is closer to the right idea, because our basic goal is to start vacuuming any given table soon enough that we finish vacuuming it before some catastrophe strikes. The more imminent the catastrophe, the more urgent it is to start vacuuming right away. Also, and I think this is very important, the longer vacuum is going to take, the more urgent it is to start vacuuming right away. If table A will cause wraparound in 2 hours and take 2 hours to vacuum, and table B will cause wraparound in 1 hour and take 10 minutes to vacuum, table A is more urgent even though the catastrophe is further out.

So at a high level, I think that what we ought to do is, first, for each table, estimate the time at which we think something bad will occur. There are several bad events: too much bloat, XID wraparound, MXID wraparound. We need to estimate the time at which we think each of those things will occur, and then take the earliest of those estimates. That's the time by which we need to have finished vacuuming the table. Then, make an estimate of how long it will take to complete a vacuum of the table, subtract that from the time at which we need to be done, and that's the time by which we need to start.
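To make that concrete, here's a minimal sketch of the calculation. It's in Python purely for readability, every name in it is hypothetical, and the per-table estimators are assumed to exist rather than being something we have today:

# Hypothetical sketch of the need-to-start calculation described above.
# None of these names exist in PostgreSQL; the estimates are assumed to
# come from machinery we would still have to build.

from dataclasses import dataclass

@dataclass
class TableEstimates:
    name: str
    bloat_deadline: float    # seconds until bloat becomes critical
    xid_deadline: float      # seconds until XID wraparound danger
    mxid_deadline: float     # seconds until MXID wraparound danger
    vacuum_duration: float   # estimated seconds a VACUUM would take

def need_to_start(t: TableEstimates) -> float:
    # The time by which we must be finished is the earliest of the
    # three catastrophe estimates...
    need_to_finish = min(t.bloat_deadline, t.xid_deadline, t.mxid_deadline)
    # ...and the time by which we must start backs off from that by
    # the expected duration of the vacuum itself.
    return need_to_finish - t.vacuum_duration

def prioritize(tables):
    # Earliest need-to-start first; a negative value means we're late.
    return sorted(tables, key=need_to_start)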
The table with the earliest need-to-start time is the highest-priority table. There are a number of problems here. One is that we actually need to be able to estimate all the things that I just described, which will probably require tracking statistics that we don't capture today, such as the rate at which the system is consuming XIDs, and the rate at which a table is bloating, rather than just the current state of things. Unfortunately, proposing that the statistics system should store more per-table information is a well-known way to get your patch forcefully rejected. Maybe if we were storing and updating the statistics data in a better way it wouldn't be an issue - so perhaps the shared-memory stats collector work will resolve this issue somehow. Or maybe it's not an issue anyway, since the big problem with the statistics files is that they have to be constantly rewritten. If we took snapshots of the values in even a relatively dumb way, they'd be kinda big, but they'd also be write-once. Maybe that would keep the expense reasonable.

A second problem is that, if the earliest need-to-start time is in the past, then we definitely are in trouble and had better get to work at once, but if it's in the future, that doesn't necessarily mean we're safe. If there are three tables with a need-to-finish time that is 12 hours in the future and each of them will take 11 hours to vacuum, then every need-to-start time computed according to the algorithm above is in the future, but in fact we're in a lot of trouble. If the estimates are accurate, we need 3 autovacuum workers to be available to start within 1 hour, or we're doomed. The very last thing we want to do is wait another hour before doing anything.

It's not impossible to factor this into the calculation of need-to-start times, assuming we know how many workers we have. For instance, if we've got tables whose need-to-finish times are 30, 50, and 70 minutes in the future, we can see that if each one takes 20 minutes or less to vacuum, then the need-to-start times can just be computed by subtraction. But if the tables with 50 or 70 minute deadlines are going to take more than 20 minutes to vacuum, then we've got to back up the need-to-start times so that we finish each table in time to start on the next one. I haven't looked into what algorithms exist for this kind of scheduling problem, but I feel like a literature search, or pulling out my college textbooks, would probably turn up some useful ideas.
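For the single-worker case, "backing up the need-to-start times" could look something like the following sketch. Again, all of the names are made up, and real scheduling across multiple workers and databases would be more involved:

# Back up need-to-start times so that each table finishes in time for
# the next one to start, assuming a single worker processes tables in
# deadline order. All names here are hypothetical.

def backed_up_start_times(tables):
    """tables: list of (name, need_to_finish, vacuum_duration) tuples,
    sorted by need_to_finish ascending; times are minutes from now.
    Returns the latest feasible start time for each table."""
    starts = [0.0] * len(tables)
    next_start = float("inf")
    # Walk from the latest deadline backwards: each table must finish
    # both by its own deadline and before its successor has to start.
    for i in range(len(tables) - 1, -1, -1):
        _name, deadline, duration = tables[i]
        starts[i] = min(deadline, next_start) - duration
        next_start = starts[i]
    return starts

# With deadlines of 30/50/70 minutes and 20-minute vacuums, this gives
# starts of 10/30/50, the same as plain subtraction. With 25-minute
# vacuums it gives -5/20/45, and the negative value tells us that one
# worker is not going to be enough.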
As soon as you see that you can't decide when you need to start on a particular table without knowing what's going on with all the other tables on the system, a third problem becomes apparent: you can't figure anything out with confidence by looking at a single database, but must rather gather information from all databases and decide to which databases the workers should connect and what they ought to do when they get there. And then, as tasks finish and system activity progresses, you need to continuously update your notion of what needs to be done next and move workers around to accommodate it. This gets quite complicated, but that doesn't mean that it's unimportant. There's a pretty well-known "thundering herd" type effect when every table in the system crosses autovacuum_freeze_max_age around the same time, and suddenly we go from not much vacuuming to a whole ton of vacuuming all at once. A system like this could give us enough information to spread that out in an intelligent way: we could see the storm coming and start a single autovacuum worker working on the problem well in advance, and then ramp up to multiple workers only if it looks like that's not going to be enough to get the job done. I'm not sure I want to deal with all that complexity on day one, but I think it's important to do something about it eventually.

In the meantime, I think a sensible place to start would be to figure out some system that makes sensible estimates of how soon we need to address bloat, XID wraparound, and MXID wraparound for each table, and some system that estimates how long each one will take to vacuum. Then, even if the workers aren't talking to each other, each individual worker can make an effort to deal with the most urgent tasks first.

I think that estimating how long it will take to vacuum a table shouldn't be too bad: examining the visibility map and the index sizes and thinking about the autovacuum cost limits should give us at least some notion of what's likely to happen. Also, I think estimating the time until XID age becomes critical isn't too bad either. First, fix a threshold (perhaps autovacuum_freeze_max_age, maybe with a higher-than-current value, or maybe some other threshold entirely) that represents the target below which we always want to remain. Second, know something about how fast the system is consuming XIDs. Then just divide; a sketch of this arithmetic appears below.

I thought for a while that it would be too hard to understand the XID consumption rate, because it might be very uneven, but I think we can mitigate that problem somewhat by averaging over relatively long time intervals. For instance, we might measure the number of XIDs consumed in each 24-hour period, keep 7 days of data, and then take the average of those values, or maybe better, the highest or second-highest. That's a very small amount of data to store, and in combination with the relfrozenxid for each table, it's all we need. We could also give the user a way to override our estimate of the XID consumption rate, in case they have very brief, very severe spikes. All of this can also be done for MXID age.

It's estimating the time at which table bloat will exceed some threshold that seems most difficult, because that seems to require measuring trends on a per-table basis, as opposed to the XID consumption rate, which is global.
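To illustrate the easy, global half of that, here is a sketch of the XID-age arithmetic under the scheme described above. The daily-counter storage and all of the names are invented for the example:

# Estimate seconds until a table's XID age crosses the chosen threshold,
# assuming we kept one XID-consumption total per 24-hour period for the
# last 7 days. All names here are hypothetical.

def xids_per_second(daily_xid_counts):
    # Use the second-highest day on record as a conservatively high rate.
    worst_first = sorted(daily_xid_counts, reverse=True)
    rate_basis = worst_first[1] if len(worst_first) > 1 else worst_first[0]
    return rate_basis / 86400.0

def seconds_until_xid_critical(table_xid_age, threshold, daily_xid_counts):
    # Distance to the threshold (e.g. autovacuum_freeze_max_age),
    # divided by the observed consumption rate: "then just divide".
    remaining = threshold - table_xid_age
    return remaining / xids_per_second(daily_xid_counts)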
I know that this email is kind of a giant wall of text, so my thanks if you've read this far, and even more if you feel inspired to write back with your own thoughts.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com