Vacuum rate limit in KBps - Mailing list pgsql-hackers

From: Greg Smith
Subject: Vacuum rate limit in KBps
Msg-id: 4F128D2F.4070200@2ndQuadrant.com
List: pgsql-hackers
So far the reaction I've gotten from my recent submission to make autovacuum log its read/write rates in MB/s has been rather positive. I've been surprised at the unprecedented (to me at least) amount of backporting onto big production systems it's gotten. There is a whole lot of pent-up frustration among larger installs over not having good visibility into how changing the cost-based vacuum parameters translates into real-world units. That got me thinking: if MB/s is what everyone wants to monitor, can we provide a UI to set these parameters that way too? The attached patch is a bit rough still, but it does that.

The key was recognizing that the cost delay plus cost limit can be converted into an upper limit on cost units per second, presuming the writes themselves are free. If you then also assume the worst case--that everything will end up dirty--and throw in the block size too, you compute a maximum rate in MB/s. That represents the fastest you can possibly write. If you then turn that equation around, making the maximum write rate the input, for any given cost delay and dirty page cost you can solve for the cost limit--the parameter in fictitious units everyone hates. It works like this, with the computation internals logged every time they run for now:

#vacuum_cost_rate_limit = 4000  # maximum write rate in kilobytes/second

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20

That's the same cost limit that was there before, except now it's derived from that maximum write rate figure. vacuum_cost_limit is gone as a GUC, replaced with this new vacuum_cost_rate_limit. Internally, the cost limit hasn't gone anywhere though. All of the entry points into vacuum and autovacuum derive an internal-only VacuumCostLimit as part of any setup or rebalance operation. But there's no change to the underlying cost management code; the cost limit is budgeted and accounted for in exactly the same way as it always was.

Why is this set in kilobytes/second rather than using something based on a memory unit? That decision was made after noting these values can also be set as relation options. Making relation options aware of memory unit math seemed ambitious relative to its usefulness, and it's not like KB/s is hard to work with in this context. OK, I lied; technically this is set in kibibytes per second right now. I ran out of energy before I got to confirming that's consistent with all the similar GUC settings; I'll put on my pedantic hat later to check that.

One nice thing that falls out of this is that the *vacuum_cost_delay settings essentially turn into a boolean. If the delay is 0, cost limits are off; set it to any other value, and the rate you get is driven almost entirely by vacuum_cost_rate_limit (the disclaimer is mainly because issues like sleep time accuracy are possible). You can see that at work in these examples:

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=4000 KB/s delay=10 dirty cost=20
LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=2000 KB/s delay=20 dirty cost=20

Halve the delay to 10, and the cost limit drops in half too to keep the same I/O rate. Halve the rate limit instead, and the cost limit halves with it. Most sites will never need to change the delay figure from 20ms; they can just focus on tuning the more human-readable rate limit figure instead.
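To make that arithmetic concrete, here's a minimal standalone sketch. This is not code from the patch--the function name and structure are mine--and it assumes the default 8kB block size plus the worst case where every page processed gets dirtied:

#include <stdio.h>

#define BLCKSZ_KB 8     /* assumes the default 8kB block size */

/*
 * Hypothetical helper, not the patch's actual code: derive the internal
 * cost limit from a maximum write rate in kB/s, a cost delay in
 * milliseconds, and the dirty page cost.  Worst case assumed: every page
 * processed is dirtied, so each cost-limit round writes at most
 * (cost_limit / dirty_cost) blocks, and there are (1000 / delay_ms)
 * rounds per second.  Solving
 *     rate = (cost_limit / dirty_cost) * BLCKSZ_KB * (1000 / delay_ms)
 * for cost_limit gives:
 */
static int
cost_limit_from_rate(int rate_kbps, int delay_ms, int dirty_cost)
{
    return (rate_kbps * dirty_cost * delay_ms) / (BLCKSZ_KB * 1000);
}

int
main(void)
{
    /* reproduces the logged default: limit=200 from 4000 kB/s, 20ms, cost 20 */
    printf("cost limit=%d\n", cost_limit_from_rate(4000, 20, 20));
    /* halve the delay and the limit halves to hold the same rate */
    printf("cost limit=%d\n", cost_limit_from_rate(4000, 10, 20));
    /* halve the rate limit instead and the limit halves with it */
    printf("cost limit=%d\n", cost_limit_from_rate(2000, 20, 20));
    return 0;
}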
The main reason I thought of to keep the delay around as an integer is for sites trying to minimize power use; they might increase it from the normally used 20ms. I'm not as worried about postgresql.conf settings bloat when it supports a valid edge use case, so long as most sites find the setting unnecessary to tune. And the autovacuum side of cost delay should fall into that category with this change.

Here's a full autovacuum log example. This shows how close to the KBps rate the server actually got, along with the autovacuum cost balancing working the same old way (this is after running the boring autovac-big.sql test case attached here too):

2012-01-15 02:10:51.905 EST: LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
2012-01-15 02:10:51.906 EST: DEBUG:  autovac_balance_cost(pid=13054 db=16384, rel=16444, cost_rate_limit=4000, cost_limit=200, cost_limit_base=200, cost_delay=20)
2012-01-15 02:11:05.127 EST: DEBUG:  "t": removed 4999999 row versions in 22124 pages
2012-01-15 02:11:05.127 EST: DEBUG:  "t": found 4999999 removable, 5000001 nonremovable row versions in 44248 out of 44248 pages
2012-01-15 02:11:05.127 EST: DETAIL:  0 dead row versions cannot be removed yet.
        There were 0 unused item pointers.
        0 pages are entirely empty.
        CPU 0.27s/0.97u sec elapsed 131.73 sec.
2012-01-15 02:11:05.127 EST: LOG:  automatic vacuum of table "gsmith.public.t": index scans: 0
        pages: 0 removed, 44248 remain
        tuples: 4999999 removed, 5000001 remain
        buffer usage: 48253 hits, 40296 misses, 43869 dirtied
        avg read rate: 2.390 MiB/s, avg write rate: 2.602 MiB/s
        system usage: CPU 0.27s/0.97u sec elapsed 131.73 sec

I think this new setting suggests the recently added logging is missing a combined I/O figure, something that measures reads + writes over the time period. This is good enough to demonstrate the sort of UI I was aiming for in action though. The administrator says "don't write more than 4MiB/s", and when autovacuum kicks in it averages 2.4 read + 2.6 write.

I see this change as taking something that feels like black arts tuning magic now and turning it into a simple interface that's for the most part intuitive. None of the flexibility is lost here: you can still retune the relative dirty vs. miss vs. hit costs, and you still have the option of reducing the delay to a small value on a busy server where small sleep values are possible. But you don't have to do any of that just to tweak autovacuum up or down at a gross level; you can just turn the simple "at most this much write I/O" knob instead.

All implementation notes and concerns from here down.

The original cost limit here defaulted to 200 and allowed a range of 1 to 10000. The new default of 4000 shows these values need to be 20X as large. The maximum was adjusted to 200000 KBps. Look at that: the maximum rate you can run cost delayed vacuum at is 200MB/s; there's another good example of something that used to be mysterious to compute but is obvious now. I didn't adjust the lower limit to match, so it's actually possible to set the new code to only operate at 1/200 the minimum speed you could set before. On balance this is surely a reduction in foot gun aiming though, and I could make the minimum 200 to eliminate it. Seems a needless detail to worry about.
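As a sanity check on that range conversion, here's the same arithmetic run in the other direction. Again this is only a sketch with names I made up, not the patch's code, assuming the default 8kB blocks, 20ms delay, and dirty cost of 20:

#include <stdio.h>

#define BLCKSZ_KB 8     /* assumes the default 8kB block size */

/*
 * Hypothetical inverse of the conversion shown earlier: the maximum write
 * rate in kB/s implied by a classic cost limit, delay, and dirty page cost.
 */
static int
rate_from_cost_limit(int cost_limit, int delay_ms, int dirty_cost)
{
    return (cost_limit * BLCKSZ_KB * 1000) / (dirty_cost * delay_ms);
}

int
main(void)
{
    /* old default cost limit of 200 maps to the new default of 4000 KB/s */
    printf("rate=%d KB/s\n", rate_from_cost_limit(200, 20, 20));
    /* old maximum of 10000 maps to the new maximum of 200000 KBps (~200MB/s) */
    printf("rate=%d KB/s\n", rate_from_cost_limit(10000, 20, 20));
    return 0;
}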
This code is new and just complicated enough that there are surely some edge cases I broke here. In particular I haven't put together a good concurrent autovacuum test yet to really prove all the balancing logic still works correctly. Need to test that with a settings change in the middle of a long vacuum too.

There's one serious concern I don't have a quick answer to. What do we do with in-place upgrade of relations that specified a custom vacuum_cost_limit? I can easily chew on getting the right logic to convert those to equivalents in the new setting style, but I am not prepared to go solely on the hook for all of the in-place upgrade work one might do here. Would this be easiest to handle as one of those dump/restore transformations? My guess is that's more sensible than the alternative of making an on-read converter that only writes in the new format, then worrying about upgrading all old pages before moving forward. While this could be an interesting small test case for that sort of thing, I'd rather not be patient #1 for that part of the long-term in-place upgrade path right now.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com