Re: Per table autovacuum vacuum cost limit behaviour strange - Mailing list pgsql-hackers

From Gregory Smith
Subject Re: Per table autovacuum vacuum cost limit behaviour strange
Date
Msg-id 541FACFD.7030105@gmail.com
In response to Re: Per table autovacuum vacuum cost limit behaviour strange  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 8/28/14, 12:18 PM, Robert Haas wrote:
> At least in situations that I've encountered, it's typical to be able 
> to determine the frequency with which a given table needs to be 
> vacuumed to avoid runaway bloat, and from that you can work backwards 
> to figure out how fast you must process it in MB/s, and from there you 
> can work backwards to figure out what cost delay will achieve that 
> effect. But if the system tinkers with the cost delay under the hood, 
> then you're vacuuming at a different (slower) rate and, of course, the 
> table bloats.
The last time I took a whack at this, I worked toward making all of the 
parameters operate in terms of a target MB/s, for exactly this style of 
thinking and goal.  Those values converted into the same old cost 
mechanism under the hood, and I got the math right so that the simple 
cases behaved the same, though the internals could have been simplified 
eventually.  I consider that line of thinking to be the only useful one here.
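The "work backwards" math Robert describes is mechanical enough to sketch.  This example uses the stock cost parameters of this era (vacuum_cost_limit=200, vacuum_cost_page_miss=10, a 20ms autovacuum_vacuum_cost_delay, 8kB pages) to derive the worst-case read rate; the variable names are just illustrative:

```python
# Rough conversion from cost-based vacuum parameters to an effective
# worst-case read rate in MB/s, assuming every page is a cache miss.
BLOCK_SIZE = 8192          # bytes per page (default build)
cost_limit = 200           # vacuum_cost_limit
cost_delay_ms = 20         # autovacuum_vacuum_cost_delay
cost_page_miss = 10        # vacuum_cost_page_miss

pages_per_round = cost_limit / cost_page_miss   # pages read before sleeping
rounds_per_sec = 1000 / cost_delay_ms           # sleep intervals per second
mb_per_sec = pages_per_round * rounds_per_sec * BLOCK_SIZE / 1024**2
print(f"worst-case read rate: {mb_per_sec:.1f} MB/s")  # 7.8 MB/s
```

Run the same arithmetic in reverse, from a required MB/s to a cost_delay, and you get the tuning procedure described above; any hidden rebalancing of these knobs invalidates that calculation.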

The answer I like to these values that don't inherit as expected in the 
GUC tree is to nuke that style of interface altogether in favor of a 
simpler bandwidth-measured one, then perhaps add multiple QoS levels.  
I certainly have no interest in treating the overly complicated innards 
of the cost computation as a bug and fixing them with even more 
complicated behavior.

The part of this I was trying hard to find time to do myself by the next 
CF was a better bloat measurement tool, needed to actually see the 
problem clearly.  With that in hand, and some nasty test cases, I wanted 
to come back to simplified MB/s vacuum parameters with 
easier-to-understand sharing rules.  If other people are hot to go on 
that topic, I don't care whether I actually do the work; I just have a 
pretty clear view of what I think people want.

> The only plausible use case for setting a per-table rate that I can 
> see is when you actually want the system to use that exact rate for 
> that particular table.

That's the main one, for these "must run on schedule or else" jobs.  Yes.

On 8/29/14, 9:45 AM, Alvaro Herrera wrote:
> Anyway it seems to me maybe there is room for a new table storage
> parameter, say autovacuum_do_balance which means to participate in the
> balancing program or not.

If that eliminates some of the hairy edge cases, sure.

A useful concept to consider is having a soft limit that most things work 
against, along with a total hard limit for the server.  When one of 
these tight-schedule jobs with !autovacuum_do_balance starts, it must 
run at its designed speed with no concern for anyone else.  Which 
means:

a) Their bandwidth gets pulled out of the regular, soft limit numbers 
until they're done.  Last time I had one of these jobs, once the big 
important boys were running, everyone else in the regular shared set 
was capped at vacuum_cost_limit=5 worth of work.  Just enough to keep 
up with system catalog things and, over the course of many hours, 
process small tables.

b) If you try to submit multiple locked-rate jobs at once, and the total 
goes over the hard limit, they have to just be aborted.  If the rush of 
users comes back at 8AM, and you can clean the table up by then by 
giving it 10MB/s, what you cannot do is let some other job decrease your 
rate such that you're unfinished at 8AM.  Then you'll have aggressive AV 
competing against the very user load you were trying to prepare for.  It's 
better to just throw a serious error that forces someone to look at the 
hard limit budget and adjust the schedule instead.  The systems with 
this sort of problem are getting cleaned up every single day, almost 
continuously; missing a day is not bad as long as it's noted and fixed 
again before the next cleanup window.
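Rules (a) and (b) together amount to a small admission-control policy.  A minimal sketch, purely illustrative (the class and method names are mine, not from any PostgreSQL code), assuming rates are tracked in MB/s:

```python
# Soft/hard bandwidth budget: locked-rate jobs reserve from the hard
# limit and are rejected outright if it would be exceeded (rule b);
# whatever they take comes out of the pool that ordinary balanced
# workers share (rule a).
class BudgetExceededError(Exception):
    pass

class BandwidthBudget:
    def __init__(self, hard_limit_mbps, soft_limit_mbps):
        self.hard = hard_limit_mbps
        self.soft = soft_limit_mbps
        self.reserved = 0.0          # total held by locked-rate jobs

    def reserve(self, rate_mbps):
        """Admit a locked-rate job, or abort it rather than silently
        slow everyone down (rule b)."""
        if self.reserved + rate_mbps > self.hard:
            raise BudgetExceededError(
                f"locked-rate jobs would need "
                f"{self.reserved + rate_mbps} MB/s; hard limit is "
                f"{self.hard}")
        self.reserved += rate_mbps

    def release(self, rate_mbps):
        self.reserved = max(0.0, self.reserved - rate_mbps)

    def shared_pool(self):
        """Bandwidth left for the regular balanced set (rule a), with a
        small floor so catalog cleanup keeps moving -- the floor stands
        in for the vacuum_cost_limit=5 trickle mentioned above."""
        FLOOR = 0.2
        return max(FLOOR, self.soft - self.reserved)

budget = BandwidthBudget(hard_limit_mbps=40, soft_limit_mbps=25)
budget.reserve(10)            # the 8AM-deadline table gets its 10MB/s
print(budget.shared_pool())   # 15.0 MB/s left for everyone else
```

The key design point is that `reserve` raises instead of shrinking anyone's rate: a loud scheduling error at submission time, not a quiet miss of the cleanup window.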


