Re: Make autovacuum sort tables in descending order of xid_age - Mailing list pgsql-hackers

From: David Fetter
Subject: Re: Make autovacuum sort tables in descending order of xid_age
Date:
Msg-id: 20200111175335.GB32763@fetter.org
In response to: Re: Make autovacuum sort tables in descending order of xid_age (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Jan 09, 2020 at 12:23:46PM -0500, Robert Haas wrote:
> On Thu, Dec 12, 2019 at 2:26 PM David Fetter <david@fetter.org> wrote:
> > > I wonder if you might add information about table size, table changes,
> > > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > > sort on that cost, such that really large bloated tables with old xids
> > > might get vacuumed before smaller, less bloated tables that have
> > > even older xids. Sorting the tables based purely on xid_age seems to
> > > ignore other factors that are worth considering.  I do not have a
> > > formula for how those four factors should be weighted in the heuristic,
> > > but you are implicitly assigning three of them a weight of zero in
> > > your current patch.
> >
> > I think it's vastly premature to come up with complex sorting systems
> > right now.  Just sorting in descending order of age should either have
> > or not have positive effects.
> 
> A lot of previous efforts to improve autovacuum scheduling have fallen
> down precisely because they did something that was so simple that it
> was doomed to regress as many cases as it improved, so I wouldn't be
> too quick to dismiss Mark's suggestion. In general, sorting by XID age
> seems like it should be better, but it's not hard to come up with a
> counterexample: suppose table T1 is going to wrap around in 4 hours
> and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
> hours and takes 1 hour to vacuum. Your algorithm will prioritize T2,
> but it's better to prioritize T1. A second autovacuum worker may
> become available for this database later and still get T2 done before
> we run into trouble, but if we don't start T1 right now, we're hosed.
> The current algorithm gets this right if T1 was defined before T2 and
> thus appears earlier in pg_class; your algorithm gets it wrong
> regardless.

Does it get it more wrong than the current system where there's
essentially no attempt to set priorities? If so, how?
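
For context, the ordering the patch imposes really is that simple: sort
the candidate tables by the age of their relfrozenxid, oldest first. In
rough outline (struct fields and details approximate, not the exact
patch code):

#include "postgres.h"

typedef struct RelFrozenXidAge
{
    Oid         relid;          /* table under consideration */
    int32       xid_age;        /* age of its relfrozenxid when gathered */
} RelFrozenXidAge;

/* qsort comparator: larger (older) xid_age sorts first */
static int
rfxa_comparator(const void *a, const void *b)
{
    const RelFrozenXidAge *ra = (const RelFrozenXidAge *) a;
    const RelFrozenXidAge *rb = (const RelFrozenXidAge *) b;

    if (ra->xid_age > rb->xid_age)
        return -1;
    if (ra->xid_age < rb->xid_age)
        return 1;
    return 0;
}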

> I've had the thought for a while now that perhaps we ought to try to
> estimate the rate of XID consumption, because without that it's really
> hard to make smart decisions. In the above example, if the rate of XID
> consumption is 4x slower, then it might be smarter to vacuum T2 first,
> especially if T2 is very heavily updated compared to T1 and might
> bloat if we don't deal with it right away. At the lower rate of XID
> consumption, T1 is an urgent problem, but not yet an emergency.
> However, I've noticed that most people who complain about unexpected
> wraparound vacuums have them hit in peak periods, which when you think
> about it, makes a lot of sense. If you consume XIDs 10x as fast during
> your busy time as your non-busy times, then the XID that triggers the
> wraparound scan on any given table is very likely to occur during a
> busy period. So the *current* rate of XID consumption might not be
> very informative, which makes figuring out what to do here awfully
> tricky.
> 
> I think Mark's suggestion of some kind of formula that takes into
> account the XID age as well as table size and bloat is probably a
> pretty good one. We'll probably need to make some of the parameters of
> that formula configurable.  Ideally, they'll be easy enough to
> understand that users can say "oh, I'm using XIDs more or less quickly
> than normal here, so I need to change parameter X" and even figure out
> -- without using a calculator -- what sort of value for X might be
> appropriate.
> 
> When there's a replication slot or prepared transaction or open
> transaction holding back xmin, you can't advance the relfrozenxid of
> that table past that point no matter how aggressively you vacuum it,
> so it would probably be a good idea to set up the formula so that the
> weight is based on the amount by which we think we'll be able to
> advance relfrozenxid rather than, say, the age relative to the last
> XID assigned.
> 
> The dominant cost of vacuuming a table is often the number and size of
> the indexes rather than the size of the heap, particularly because the
> visibility map may permit skipping a lot of the heap. So you have N
> indexes that need to be read completely and 1 heap that needs to be
> read only partially. So, whatever portion of the score comes from
> estimating the cost of vacuuming that table ought to factor in the
> size of the indexes. Perhaps it should also consider the contents of
> the visibility map, although I'm less sure about that.
> 
> One problem with the exponential in Mark's formula is that it might
> treat small XID differences between old tables as more important than
> they really are.  I wonder if it might be a better idea to compute
> several different quantities and use the maximum from among them as
> the prioritization. We can model the priority of vacuuming a
> particular table as the benefit of vacuuming that table multiplied by
> the effort. The effort is easy to model: just take the size of the
> table and its indexes. The benefit is trickier, because there are four
> different possible benefits: relfrozenxid advancement, relminmxid
> advancement, dead tuple removal, and marking pages all-visible. So,
> suppose we model each benefit by a separate equation. For XID
> advancement, figure out the difference between relfrozenxid and
> RecentGlobalXmin; if it's less than vacuum_freeze_min_age, then 0;
> else multiply the amount in excess of vacuum_freeze_min_age by some
> constant. Analogously for MXID advancement. For bloat, the number of
> dead tuples multiplied by some other constant, presumably smaller. For
> marking pages all-visible, if we want to factor that in, the number of
> pages that are not currently all-visible multiplied by the smallest
> constant of all. Take the highest of those benefits and multiply by
> the size of the table and its indexes to find the priority.

This all sounds like really important work for the future.
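
For concreteness, if I'm reading it right, the scoring you describe
comes out to something like the sketch below. Every name, field, and
constant here is invented for illustration; it's only meant to pin down
the shape of the formula, not to propose actual code.

#include "postgres.h"

/* per-table inputs to the scoring; all hypothetical */
typedef struct VacScoreInput
{
    double      table_and_index_bytes;  /* effort proxy: heap + all indexes */
    double      xid_age_past_horizon;   /* relfrozenxid vs. RecentGlobalXmin */
    double      mxid_age_past_horizon;  /* relminmxid, analogously */
    double      n_dead_tuples;
    double      pages_not_all_visible;
} VacScoreInput;

/* made-up weights; presumably these would become configurable */
#define XID_WEIGHT      1.0
#define MXID_WEIGHT     1.0
#define DEAD_WEIGHT     0.1
#define VISMAP_WEIGHT   0.01

static double
vac_priority(const VacScoreInput *in,
             double freeze_min_age, double mxid_freeze_min_age)
{
    double      xid_benefit = 0.0;
    double      mxid_benefit = 0.0;
    double      bloat_benefit;
    double      vismap_benefit;
    double      benefit;

    /* XID advancement only counts beyond vacuum_freeze_min_age ... */
    if (in->xid_age_past_horizon > freeze_min_age)
        xid_benefit = XID_WEIGHT *
            (in->xid_age_past_horizon - freeze_min_age);

    /* ... and analogously for multixact advancement */
    if (in->mxid_age_past_horizon > mxid_freeze_min_age)
        mxid_benefit = MXID_WEIGHT *
            (in->mxid_age_past_horizon - mxid_freeze_min_age);

    bloat_benefit = DEAD_WEIGHT * in->n_dead_tuples;
    vismap_benefit = VISMAP_WEIGHT * in->pages_not_all_visible;

    /* take the single largest benefit ... */
    benefit = Max(Max(xid_benefit, mxid_benefit),
                  Max(bloat_benefit, vismap_benefit));

    /* ... and scale it by the size of the table plus its indexes */
    return benefit * in->table_and_index_bytes;
}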

> Whatever formula we use exactly, we want XID-age to be the dominant
> consideration for tables that are in real wraparound danger,

...which is what this patch does.

> but, I think, not to the complete exclusion of table size and bloat
> considerations. There is certainly a point at which a table is so
> near wraparound that it needs to take precedence over tables that
> are just being vacuumed for bloat, but you don't want that to happen
> unnecessarily, because bloat is *really* bad. And you don't
> necessarily just have one table in wraparound danger; if there are
> multiples, you want to choose between them intelligently, and the
> fact that relfrozenxid differs by 1 shouldn't dominate a 2x
> difference in the on-disk size.

I agree that it's a complex situation, and that many different
approaches will eventually need to be brought to bear.
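
To put a number on that last point, using the toy scoring sketched
above: take one table a billion XIDs past the freeze horizon with 10GB
of heap plus indexes, and another one XID older but at 20GB. The benefit
terms differ by one part in a billion, so the relative priority is set
almost entirely by the 2x size difference rather than the one-XID age
difference, which I take to be the point. (The numbers, like the sketch,
are invented purely for illustration.)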

What concerns me about introducing a big lump of complexity here is
disentangling the effects of each part and of their interaction terms.
We're not, to put it mildly, set up to do ANOVA
(https://en.wikipedia.org/wiki/Analysis_of_variance), ANCOVA
(https://en.wikipedia.org/wiki/Analysis_of_covariance), etc. on
changes.

Given the above, I'd like to make the case for changing just this one
thing at first and seeing whether the difference it makes is generally
positive.

Future patches could build on those results.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


