Thread: Make autovacuum sort tables in descending order of xid_age

Make autovacuum sort tables in descending order of xid_age

From
David Fetter
Date:
Folks,

Per a suggestion Christophe made, please find attached a patch to
$Subject:

Apart from carefully fudging with pg_resetwal, and short of running in
production for a few weeks, what would be some good ways to test this?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: Make autovacuum sort tables in descending order of xid_age

From
David Fetter
Date:
On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
> Folks,
> 
> Per a suggestion Christophe made, please find attached a patch to
> $Subject:
> 
> Apart from carefully fudging with pg_resetwal, and short of running in
> production for a few weeks, what would be some good ways to test this?

Per discussion on IRC with Sehrope Sarkuni, please find attached a
patch with one fewer bug, this one in the repalloc() calls.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

On 11/29/19 2:21 PM, David Fetter wrote:
> On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
>> Folks,
>>
>> Per a suggestion Christophe made, please find attached a patch to
>> $Subject:
>>
>> Apart from carefully fudging with pg_resetwal, and short of running in
>> production for a few weeks, what would be some good ways to test this?
> 
> Per discussion on IRC with Sehrope Sarkuni, please find attached a
> patch with one fewer bug, this one in the repalloc() calls.

Hello David,

Here are my initial thoughts.

Although you appear to be tackling the problem of vacuuming tables
with older Xids first *per database*, have you considered changing
the logic in building and sorting the database list in get_database_list
and rebuild_database_list?  I'm just curious what your thoughts
might be on this subject.

As far as sorting the list of tables in an array and then copying
that array into a linked list, I think there is no need.  The
copying of table_ages into table_oids is followed immediately by

   foreach(cell, table_oids)

and then table_oids seems not to serve any further purpose.  Perhaps
you can just iterate over table_ages directly and avoid the extra
copying.

I have not tested this change, but I may do so later today or perhaps
on Monday.

-- 
Mark Dilger



Re: Make autovacuum sort tables in descending order of xid_age

From
David Fetter
Date:
On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
> On 11/29/19 2:21 PM, David Fetter wrote:
> > On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
> > > Folks,
> > > 
> > > Per a suggestion Christophe made, please find attached a patch to
> > > $Subject:
> > > 
> > > Apart from carefully fudging with pg_resetwal, and short of running in
> > > production for a few weeks, what would be some good ways to test this?
> > 
> > Per discussion on IRC with Sehrope Sarkuni, please find attached a
> > patch with one fewer bug, this one in the repalloc() calls.
> 
> Hello David,
> 
> Here are my initial thoughts.
> 
> Although you appear to be tackling the problem of vacuuming tables
> with older Xids first *per database*,

Yes, that's what's come up for me in production, but lately,
production has consisted of a single active DB maxing out hardware. I
can see how in other situations--multi-tenant, especially--it would
make more sense to sort the DBs first.

> have you considered changing the logic in building and sorting the
> database list in get_database_list and rebuild_database_list?  I'm
> just curious what your thoughts might be on this subject.

I hadn't, but now that you mention it, it seems like a reasonable
thing to try.

> As far as sorting the list of tables in an array and then copying
> that array into a linked list, I think there is no need.  The
> copying of table_ages into table_oids is followed immediately by
> 
>   foreach(cell, table_oids)
> 
> and then table_oids seems not to serve any further purpose.  Perhaps
> you can just iterate over table_ages directly and avoid the extra
> copying.

I hadn't looked toward any optimizations in this section, given that
the vacuums in question can take hours or days, but I can see how that
would make the code cleaner, so please find that change attached.

> I have not tested this change, but I may do so later today or perhaps
> on Monday.

Thanks for looking at this!

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

On 11/30/19 2:23 PM, David Fetter wrote:
> On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
>> On 11/29/19 2:21 PM, David Fetter wrote:
>>> On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
>>>> Folks,
>>>>
>>>> Per a suggestion Christophe made, please find attached a patch to
>>>> $Subject:
>>>>
>>>> Apart from carefully fudging with pg_resetwal, and short of running in
>>>> production for a few weeks, what would be some good ways to test this?
>>>
>>> Per discussion on IRC with Sehrope Sarkuni, please find attached a
>>> patch with one fewer bug, this one in the repalloc() calls.
>>
>> Hello David,
>>
>> Here are my initial thoughts.
>>
>> Although you appear to be tackling the problem of vacuuming tables
>> with older Xids first *per database*,
> 
> Yes, that's what's come up for me in production, but lately,
> production has consisted of a single active DB maxing out hardware. I
> can see how in other situations--multi-tenant, especially--it would
> make more sense to sort the DBs first.

I notice you don't address that in your latest patch.  Do you have
any thoughts on whether that needs to be handled in this patch?
Should tackling that problem be left for later?

>> have you considered changing the logic in building and sorting the
>> database list in get_database_list and rebuild_database_list?  I'm
>> just curious what your thoughts might be on this subject.
> 
> I hadn't, but now that you mention it, it seems like a reasonable
> thing to try.
> 
>> As far as sorting the list of tables in an array and then copying
>> that array into a linked list, I think there is no need.  The
>> copying of table_ages into table_oids is followed immediately by
>>
>>    foreach(cell, table_oids)
>>
>> and then table_oids seems not to serve any further purpose.  Perhaps
>> you can just iterate over table_ages directly and avoid the extra
>> copying.
> 
> I hadn't looked toward any optimizations in this section, given that
> the vacuums in question can take hours or days, but I can see how that
> would make the code cleaner, so please find that change attached.

That looks better, thanks!

>> I have not tested this change, but I may do so later today or perhaps
>> on Monday.

The code compiles cleanly and passes all regression tests, but I don't
think those tests really cover what you are changing.  Have you been
using any test framework for this?

I wonder if you might add information about table size, table changes,
and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
use a heuristic to cost the (age, size, bloat, changed) grouping and
sort on that cost, such that really large bloated tables with old xids
might get vacuumed before smaller, less bloated tables that have
even older xids. Sorting the tables based purely on xid_age seems to
ignore other factors that are worth considering.  I do not have a
formula for how those four factors should be weighted in the heuristic,
but you are implicitly assigning three of them a weight of zero in
your current patch.

relation_needs_vacanalyze currently checks the reltuples, n_dead_tuples
and changes_since_analyze along with vac_scale_factor and
anl_scale_factor for the relation, but only returns booleans dovacuum,
doanalyze, and wraparound.  If you pass your RelFrozenXidAge struct
(perhaps renamed) into relation_needs_vacanalyze, it could store those
values for the relation so that you don't need to look it up again when
sorting.

-- 
Mark Dilger



Re: Make autovacuum sort tables in descending order of xid_age

From
David Fetter
Date:
On Thu, Dec 12, 2019 at 08:02:25AM -0800, Mark Dilger wrote:
> On 11/30/19 2:23 PM, David Fetter wrote:
> > On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
> > > On 11/29/19 2:21 PM, David Fetter wrote:
> > > > On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
> > > > > Folks,
> > > > > 
> > > > > Per a suggestion Christophe made, please find attached a patch to
> > > > > $Subject:
> > > > > 
> > > > > Apart from carefully fudging with pg_resetwal, and short of running in
> > > > > production for a few weeks, what would be some good ways to test this?
> > > > 
> > > > Per discussion on IRC with Sehrope Sarkuni, please find attached a
> > > > patch with one fewer bug, this one in the repalloc() calls.
> > > 
> > > Hello David,
> > > 
> > > Here are my initial thoughts.
> > > 
> > > Although you appear to be tackling the problem of vacuuming tables
> > > with older Xids first *per database*,
> > 
> > Yes, that's what's come up for me in production, but lately,
> > production has consisted of a single active DB maxing out hardware. I
> > can see how in other situations--multi-tenant, especially--it would
> > make more sense to sort the DBs first.
> 
> I notice you don't address that in your latest patch.  Do you have
> any thoughts on whether that needs to be handled in this patch?

My thought is that it doesn't.

> > > I have not tested this change, but I may do so later today or perhaps
> > > on Monday.
> 
> The code compiles cleanly and passes all regression tests, but I don't
> think those tests really cover what you are changing.  Have you been
> using any test framework for this?

I don't have one :/

> I wonder if you might add information about table size, table changes,
> and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> use a heuristic to cost the (age, size, bloat, changed) grouping and
> sort on that cost, such that really large bloated tables with old xids
> might get vacuumed before smaller, less bloated tables that have
> even older xids. Sorting the tables based purely on xid_age seems to
> ignore other factors that are worth considering.  I do not have a
> formula for how those four factors should be weighted in the heuristic,
> but you are implicitly assigning three of them a weight of zero in
> your current patch.

I think it's vastly premature to come up with complex sorting systems
right now.  Just sorting in descending order of age should either have
or not have positive effects.

> relation_needs_vacanalyze currently checks the reltuples, n_dead_tuples
> and changes_since_analyze along with vac_scale_factor and
> anl_scale_factor for the relation, but only returns booleans dovacuum,
> doanalyze, and wraparound.

Yeah, I looked at that. It's for a vastly different purpose, namely
deciding what's an emergency and what's probably not, but needs
attention anyhow.  My goal was something a little finer-grained and, I
hope, a little easier to establish the (lack of) benefits because only
one thing is getting changed.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

On 12/12/19 11:26 AM, David Fetter wrote:
> On Thu, Dec 12, 2019 at 08:02:25AM -0800, Mark Dilger wrote:
>> On 11/30/19 2:23 PM, David Fetter wrote:
>>> On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
>>>> On 11/29/19 2:21 PM, David Fetter wrote:
>>>>> On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
>>>>>> Folks,
>>>>>>
>>>>>> Per a suggestion Christophe made, please find attached a patch to
>>>>>> $Subject:
>>>>>>
>>>>>> Apart from carefully fudging with pg_resetwal, and short of running in
>>>>>> production for a few weeks, what would be some good ways to test this?
>>>>>
>>>>> Per discussion on IRC with Sehrope Sarkuni, please find attached a
>>>>> patch with one fewer bug, this one in the repalloc() calls.
>>>>
>>>> Hello David,
>>>>
>>>> Here are my initial thoughts.
>>>>
>>>> Although you appear to be tackling the problem of vacuuming tables
>>>> with older Xids first *per database*,
>>>
>>> Yes, that's what's come up for me in production, but lately,
>>> production has consisted of a single active DB maxing out hardware. I
>>> can see how in other situations--multi-tenant, especially--it would
>>> make more sense to sort the DBs first.
>>
>> I notice you don't address that in your latest patch.  Do you have
>> any thoughts on whether that needs to be handled in this patch?
> 
> My thought is that it doesn't.

I can live with that for now.  I'd like the design to be compatible with
revisiting that in a subsequent patch.

>>>> I have not tested this change, but I may do so later today or perhaps
>>>> on Monday.
>>
>> The code compiles cleanly and passes all regression tests, but I don't
>> think those tests really cover what you are changing.  Have you been
>> using any test framework for this?
> 
> I don't have one :/

We need to get that fixed.

>> I wonder if you might add information about table size, table changes,
>> and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
>> use a heuristic to cost the (age, size, bloat, changed) grouping and
>> sort on that cost, such that really large bloated tables with old xids
>> might get vacuumed before smaller, less bloated tables that have
>> even older xids. Sorting the tables based purely on xid_age seems to
>> ignore other factors that are worth considering.  I do not have a
>> formula for how those four factors should be weighted in the heuristic,
>> but you are implicitly assigning three of them a weight of zero in
>> your current patch.
> 
> I think it's vastly premature to come up with complex sorting systems
> right now.  Just sorting in descending order of age should either have
> or not have positive effects.

I hear what you are saying, but I'm going to argue the other side.

   Let C = 1.00000002065
   Let x = xid_age for a table
   Let v = clamp(n_dead_tuples / reltuples*2) to max 0.5
   Let a = clamp(changes_since_analyze / reltuples) to max 0.5

   Let score = C**x + v + a

With x = 1 million    => C**x = 1.02
      x = 200 million  => C**x = 62.2
      x = 2**32        => C**x = FLT_MAX - delta

The maximum contribution to the score that n_dead_tuples and
changes_since_analyze can make is 1.0.  Once the xid age reaches one
million, it will start to be the dominant factor.  By the time it
reaches the default value of 200 million for freeze_max_age it is
far and away the dominant factor, and the xid age of one table vs.
another never overflows FLT_MAX given that 2**32 is the largest
xid age your current system can store in the uint32 you are using.

The computed score is a 32 bit float, which takes no more memory
to store than the xid_age field you are storing.  So storing the
score rather than the xid age is memory-wise equivalent to your
patch.

I doubt the computation time for the exponential is relevant
compared to the n*log(n) average sorting time of the quicksort.
It is even less relevant compared to the time it takes to vacuum
the tables.  I doubt my proposal has a measurable run-time impact.

On the upside, if you have a database with autovacuum configured
aggressively, you can get the tables with the most need vacuumed
first, with need computed relative to vac_scale_factor and
anl_scale_factor, which helps for a different use case than yours.
The xid age problem might not exist for databases where autovacuum
has enough resources to never fall behind.  Those databases will
have other priorities for where autovacuum spends its time.

I'm imagining coming back with two patches later, one that does
something more about choosing which database to vacuum first, and
another that recomputes which table to vacuum next when a worker
finishes vacuuming a table.  These combined could help keep tables
that are sensitive to statistics changes vacuumed more frequently
than others.

>> relation_needs_vacanalyze currently checks the reltuples, n_dead_tuples
>> and changes_since_analyze along with vac_scale_factor and
>> anl_scale_factor for the relation, but only returns booleans dovacuum,
>> doanalyze, and wraparound.
> 
> Yeah, I looked at that. It's for a vastly different purpose, namely
> deciding what's an emergency and what's probably not, but needs
> attention anyhow.  My goal was something a little finer-grained and, I
> hope, a little easier to establish the (lack of) benefits because only
> one thing is getting changed.

That's all I'll say for now.  Hopefully other members of the
community will weigh in.

-- 
Mark Dilger



Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

On 12/12/19 1:35 PM, Mark Dilger wrote:
>    Let C = 1.00000002065
>    Let x = xid_age for a table
>    Let v = clamp(n_dead_tuples / reltuples*2) to max 0.5
>    Let a = clamp(changes_since_analyze / reltuples) to max 0.5
> 
>    Let score = C**x + v + a

I should hasten to add that this is just a proof of concept
formula, not one that I'm specifically advocating.  The point
is that we can devise a scoring system in which the xid age
is the dominant factor whenever it could possibly matter,
while still letting other factors prevail when xid age is
of little consequence.

-- 
Mark Dilger



Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

On 12/12/19 1:35 PM, Mark Dilger wrote:
> Once the xid age reaches one
> million, it will start to be the dominant factor.

Actually, it doesn't change much from x = 1 to x = 1,000,000
but I was planning to add another factor to the formula and
forgot before sending the email.  I'll leave that as an
exercise for the reader.

-- 
Mark Dilger



Re: Make autovacuum sort tables in descending order of xid_age

From
David Kimura
Date:
On Tue, Jan 7, 2020 at 12:47 PM David Fetter <david@fetter.org> wrote:
> Per a suggestion Christophe made, please find attached a patch to
> $Subject:

Curious, what's the benefit of autovacuum handling the oldest tables
first? If there is a related thread with the discussion, I couldn't find
it.

> Apart from carefully fudging with pg_resetwal, and short of running in
> production for a few weeks, what would be some good ways to test this?

Greenplum tests autovacuum using a fault injection framework, which
was once proposed by Asim [1] and a function to consume xids [2].

If that isn't an option, maybe you could acquire a vacuum blocking
lock on a table, for example by creating index on it inside a dangling
transaction. Then after autovacuum worker blocks, in a separate
session you could check that a previously older table is now younger.
Does that suffice?


Thanks,
David

[1] https://www.postgresql.org/message-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq=4cgNeioTQff372KO45A@mail.gmail.com
[2]
https://github.com/greenplum-db/gpdb/blob/5feccaae6838e68b1443e46ed39d162613c5ece8/src/test/regress/regress_gp.c#L2003



Re: Make autovacuum sort tables in descending order of xid_age

From
Robert Haas
Date:
On Thu, Dec 12, 2019 at 2:26 PM David Fetter <david@fetter.org> wrote:
> > I wonder if you might add information about table size, table changes,
> > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > sort on that cost, such that really large bloated tables with old xids
> > might get vacuumed before smaller, less bloated tables that have
> > even older xids. Sorting the tables based purely on xid_age seems to
> > ignore other factors that are worth considering.  I do not have a
> > formula for how those four factors should be weighted in the heuristic,
> > but you are implicitly assigning three of them a weight of zero in
> > your current patch.
>
> I think it's vastly premature to come up with complex sorting systems
> right now.  Just sorting in descending order of age should either have
> or not have positive effects.

A lot of previous efforts to improve autovacuum scheduling have fallen
down precisely because they did something that was so simple that it
was doomed to regress as many cases as it improved, so I wouldn't be
too quick to dismiss Mark's suggestion. In general, sorting by XID age
seems like it should be better, but it's not hard to come up with a
counterexample: suppose table T1 is going to wrap around in 4 hours
and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
hours and takes 1 hour to vacuum. Your algorithm will prioritize T2,
but it's better to prioritize T1. A second autovacuum worker may
become available for this database later and still get T2 done before
we run into trouble, but if we don't start T1 right now, we're hosed.
The current algorithm gets this right if T1 was defined before T2 and
thus appears earlier in pg_class; your algorithm gets it wrong
regardless.

I've had the thought for a while now that perhaps we ought to try to
estimate the rate of XID consumption, because without that it's really
hard to make smart decisions. In the above example, if the rate of XID
consumption is 4x slower, then it might be smarter to vacuum T2 first,
especially if T2 is very heavily updated compared to T1 and might
bloat if we don't deal with it right away. At the lower rate of XID
consumption, T1 is an urgent problem, but not yet an emergency.
However, I've noticed that most people who complain about unexpected
wraparound vacuums have them hit in peak periods, which when you think
about it, makes a lot of sense. If you consume XIDs 10x as fast during
your busy time as your non-busy times, then the XID that triggers the
wraparound scan on any given table is very likely to occur during a
busy period. So the *current* rate of XID consumption might not be
very informative, which makes figuring out what to do here awfully
tricky.

I think Mark's suggestion of some kind of formula that takes into
account the XID age as well as table size and bloat is probably a
pretty good one. We'll probably need to make some of the parameters of
that formula configurable.  Ideally, they'll be easy enough to
understand that users can say "oh, I'm using XIDs more or less quickly
than normal here, so I need to change parameter X" and even figure out
-- without using a calculator -- what sort of value for X might be
appropriate.

When there's a replication slot or prepared transaction or open
transaction holding back xmin, you can't advance the relfrozenxid of
that table past that point no matter how aggressively you vacuum it,
so it would probably be a good idea to set up the formula so that the
weight is based on the amount by which we think we'll be able to
advance relfrozenxid rather than, say, the age relative to the last
XID assigned.

The dominant cost of vacuuming a table is often the number and size of
the indexes rather than the size of the heap, particularly because the
visibility map may permit skipping a lot of the heap. So you have N
indexes that need to be read completely and 1 heap that needs to be
read only partially. So, whatever portion of the score comes from
estimating the cost of vacuuming that table ought to factor in the
size of the indexes. Perhaps it should also consider the contents of
the visibility map, although I'm less sure about that.

One problem with the exponential in Mark's formula is that it might
treat small XID differences between old tables as more important than
they really are.  I wonder if it might be a better idea to compute
several different quantities and use the maximum from among them as
the prioritization. We can model the priority of vacuuming a
particular table as the benefit of vacuuming that table multiplied by
the effort. The effort is easy to model: just take the size of the
table and its indexes. The benefit is trickier, because there are four
different possible benefits: relfrozenxid advancement, relminmxid
advancement, dead tuple removal, and marking pages all-visible. So,
suppose we model each benefit by a separate equation. For XID
advancement, figure figure out the difference between relfrozenxid and
RecentGlobalXmin; if it's less than vacuum_freeze_min_age, then 0;
else multiply the amount in excess of vacuum_freeze_min_age by some
constant. Analogously for MXID advancement. For bloat, the number of
dead tuples multiplied by some other constant, presumably smaller. For
marking pages all-visible, if we want to factor that in, the number of
pages that are not currently all-visible multiplied by the smallest
constant of all. Take the highest of those benefits and multiple by
the size of the table and its indexes to find the priority.

Whatever formula we use exactly, we want XID-age to be the dominant
consideration for tables that are in real wraparound danger, but, I
think, not to the complete exclusion of table size and bloat
considerations. There is certainly a point at which a table is so near
wraparound that it needs to take precedence over tables that are just
being vacuumed for bloat, but you don't want that to happen
unnecessarily, because bloat is *really* bad. And you don't
necessarily just have one table in wraparound danger; if there are
multiples, you want to choose between them intelligently, and the fact
that relfrozenxid differs by 1 shouldn't dominate a 2x difference in
the on-disk size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Make autovacuum sort tables in descending order of xid_age

From
David Kimura
Date:
On Thu, Jan 9, 2020 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Dec 12, 2019 at 2:26 PM David Fetter <david@fetter.org> wrote:
> > > I wonder if you might add information about table size, table changes,
> > > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > > sort on that cost, such that really large bloated tables with old xids
> > > might get vacuumed before smaller, less bloated tables that have
> > > even older xids. Sorting the tables based purely on xid_age seems to
> > > ignore other factors that are worth considering.  I do not have a
> > > formula for how those four factors should be weighted in the heuristic,
> > > but you are implicitly assigning three of them a weight of zero in
> > > your current patch.
> >
> > I think it's vastly premature to come up with complex sorting systems
> > right now.  Just sorting in descending order of age should either have
> > or not have positive effects.
>
> A lot of previous efforts to improve autovacuum scheduling have fallen
> down precisely because they did something that was so simple that it
> was doomed to regress as many cases as it improved, so I wouldn't be
> too quick to dismiss Mark's suggestion. In general, sorting by XID age
> seems like it should be better, but it's not hard to come up with a
> counterexample: suppose table T1 is going to wrap around in 4 hours
> and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
> hours and takes 1 hour to vacuum.

Ah, so primary purpose of this patch is to add smarts when autovacuum
is triggered to handle wrap around?

> I've had the thought for a while now that perhaps we ought to try to
> estimate the rate of XID consumption, because without that it's really
> hard to make smart decisions.

Very interesting.

Could there be value in making this feature more preventative, perhaps
by triggering emergency autovacuum earlier based on some combination
of these heuristics rather than autovacuum_freeze_max_age alone?

Thanks,
David



Re: Make autovacuum sort tables in descending order of xid_age

From
David Fetter
Date:
On Thu, Jan 09, 2020 at 12:23:46PM -0500, Robert Haas wrote:
> On Thu, Dec 12, 2019 at 2:26 PM David Fetter <david@fetter.org> wrote:
> > > I wonder if you might add information about table size, table changes,
> > > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > > sort on that cost, such that really large bloated tables with old xids
> > > might get vacuumed before smaller, less bloated tables that have
> > > even older xids. Sorting the tables based purely on xid_age seems to
> > > ignore other factors that are worth considering.  I do not have a
> > > formula for how those four factors should be weighted in the heuristic,
> > > but you are implicitly assigning three of them a weight of zero in
> > > your current patch.
> >
> > I think it's vastly premature to come up with complex sorting systems
> > right now.  Just sorting in descending order of age should either have
> > or not have positive effects.
> 
> A lot of previous efforts to improve autovacuum scheduling have fallen
> down precisely because they did something that was so simple that it
> was doomed to regress as many cases as it improved, so I wouldn't be
> too quick to dismiss Mark's suggestion. In general, sorting by XID age
> seems like it should be better, but it's not hard to come up with a
> counterexample: suppose table T1 is going to wrap around in 4 hours
> and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
> hours and takes 1 hour to vacuum. Your algorithm will prioritize T2,
> but it's better to prioritize T1. A second autovacuum worker may
> become available for this database later and still get T2 done before
> we run into trouble, but if we don't start T1 right now, we're hosed.
> The current algorithm gets this right if T1 was defined before T2 and
> thus appears earlier in pg_class; your algorithm gets it wrong
> regardless.

Does it get it more wrong than the current system where there's
essentially no attempt to set priorities? If so, how?

> I've had the thought for a while now that perhaps we ought to try to
> estimate the rate of XID consumption, because without that it's really
> hard to make smart decisions. In the above example, if the rate of XID
> consumption is 4x slower, then it might be smarter to vacuum T2 first,
> especially if T2 is very heavily updated compared to T1 and might
> bloat if we don't deal with it right away. At the lower rate of XID
> consumption, T1 is an urgent problem, but not yet an emergency.
> However, I've noticed that most people who complain about unexpected
> wraparound vacuums have them hit in peak periods, which when you think
> about it, makes a lot of sense. If you consume XIDs 10x as fast during
> your busy time as your non-busy times, then the XID that triggers the
> wraparound scan on any given table is very likely to occur during a
> busy period. So the *current* rate of XID consumption might not be
> very informative, which makes figuring out what to do here awfully
> tricky.
> 
> I think Mark's suggestion of some kind of formula that takes into
> account the XID age as well as table size and bloat is probably a
> pretty good one. We'll probably need to make some of the parameters of
> that formula configurable.  Ideally, they'll be easy enough to
> understand that users can say "oh, I'm using XIDs more or less quickly
> than normal here, so I need to change parameter X" and even figure out
> -- without using a calculator -- what sort of value for X might be
> appropriate.
> 
> When there's a replication slot or prepared transaction or open
> transaction holding back xmin, you can't advance the relfrozenxid of
> that table past that point no matter how aggressively you vacuum it,
> so it would probably be a good idea to set up the formula so that the
> weight is based on the amount by which we think we'll be able to
> advance relfrozenxid rather than, say, the age relative to the last
> XID assigned.
> 
> The dominant cost of vacuuming a table is often the number and size of
> the indexes rather than the size of the heap, particularly because the
> visibility map may permit skipping a lot of the heap. So you have N
> indexes that need to be read completely and 1 heap that needs to be
> read only partially. So, whatever portion of the score comes from
> estimating the cost of vacuuming that table ought to factor in the
> size of the indexes. Perhaps it should also consider the contents of
> the visibility map, although I'm less sure about that.
> 
> One problem with the exponential in Mark's formula is that it might
> treat small XID differences between old tables as more important than
> they really are.  I wonder if it might be a better idea to compute
> several different quantities and use the maximum from among them as
> the prioritization. We can model the priority of vacuuming a
> particular table as the benefit of vacuuming that table multiplied by
> the effort. The effort is easy to model: just take the size of the
> table and its indexes. The benefit is trickier, because there are four
> different possible benefits: relfrozenxid advancement, relminmxid
> advancement, dead tuple removal, and marking pages all-visible. So,
> suppose we model each benefit by a separate equation. For XID
> advancement, figure figure out the difference between relfrozenxid and
> RecentGlobalXmin; if it's less than vacuum_freeze_min_age, then 0;
> else multiply the amount in excess of vacuum_freeze_min_age by some
> constant. Analogously for MXID advancement. For bloat, the number of
> dead tuples multiplied by some other constant, presumably smaller. For
> marking pages all-visible, if we want to factor that in, the number of
> pages that are not currently all-visible multiplied by the smallest
> constant of all. Take the highest of those benefits and multiple by
> the size of the table and its indexes to find the priority.

This is all sounding like really important work into the future.

> Whatever formula we use exactly, we want XID-age to be the dominant
> consideration for tables that are in real wraparound danger,

...which is what this patch does.

> but, I think, not to the complete exclusion of table size and bloat
> considerations. There is certainly a point at which a table is so
> near wraparound that it needs to take precedence over tables that
> are just being vacuumed for bloat, but you don't want that to happen
> unnecessarily, because bloat is *really* bad. And you don't
> necessarily just have one table in wraparound danger; if there are
> multiples, you want to choose between them intelligently, and the
> fact that relfrozenxid differs by 1 shouldn't dominate a 2x
> difference in the on-disk size.

I agree that it's a complex situation, and that many different
approaches will eventually need to be brought to bear.

What concerns me about introducing a big lump of complexity here is
disentangling the effects of each part and of their interaction terms.
We're not, to put it mildly, set up to do ANOVA
(https://en.wikipedia.org/wiki/Analysis_of_variance ) , ANCOVA (
https://en.wikipedia.org/wiki/Analysis_of_covariance ), etc. on
changes.

Given the above, I'd like to make the case for changing just this one
thing at first and seeing whether the difference it makes is generally
positive.

Future patches could build on those results.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Make autovacuum sort tables in descending order of xid_age

From
David Steele
Date:
On 1/11/20 12:53 PM, David Fetter wrote:
> 
> I agree that it's a complex situation, and that many different
> approaches will eventually need to be brought to bear.
> 
> What concerns me about introducing a big lump of complexity here is
> disentangling the effects of each part and of their interaction terms.
> We're not, to put it mildly, set up to do ANOVA
> (https://en.wikipedia.org/wiki/Analysis_of_variance ) , ANCOVA (
> https://en.wikipedia.org/wiki/Analysis_of_covariance ), etc. on
> changes.
> 
> Given the above, I'd like to make the case for changing just this one
> thing at first and seeing whether the difference it makes is generally
> positive.

Mark, Robert, thoughts on this?

-- 
-David
david@pgmasters.net



Re: Make autovacuum sort tables in descending order of xid_age

From
Mark Dilger
Date:

> On Mar 30, 2020, at 7:09 AM, David Steele <david@pgmasters.net> wrote:
>
> On 1/11/20 12:53 PM, David Fetter wrote:
>> I agree that it's a complex situation, and that many different
>> approaches will eventually need to be brought to bear.
>> What concerns me about introducing a big lump of complexity here is
>> disentangling the effects of each part and of their interaction terms.
>> We're not, to put it mildly, set up to do ANOVA
>> (https://en.wikipedia.org/wiki/Analysis_of_variance ) , ANCOVA (
>> https://en.wikipedia.org/wiki/Analysis_of_covariance ), etc. on
>> changes.
>> Given the above, I'd like to make the case for changing just this one
>> thing at first and seeing whether the difference it makes is generally
>> positive.
>
> Mark, Robert, thoughts on this?

I have not been working on this issue lately, but as I recall, my concern was that changing the behavior of autovacuum
couldintroduce regressions for some users, so we should be careful to get it right before we rush to release anything.
Itdidn't seem like the proposed changes took enough into account.  But that's clearly a judgement call, having to do
withhow cautious any particular person thinks we should be.  I don't feel strongly enough to stand in the way if the
generalconcensus is that this is a good enough implementation. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Make autovacuum sort tables in descending order of xid_age

From
Michael Paquier
Date:
On Mon, Mar 30, 2020 at 09:20:15AM -0700, Mark Dilger wrote:
> I have not been working on this issue lately, but as I recall, my
> concern was that changing the behavior of autovacuum could introduce
> regressions for some users, so we should be careful to get it right
> before we rush to release anything.  It didn't seem like the
> proposed changes took enough into account.  But that's clearly a
> judgement call, having to do with how cautious any particular person
> thinks we should be.  I don't feel strongly enough to stand in the
> way if the general concensus is that this is a good enough
> implementation.

Echoing with what has been already mentioned on this thread, I think
that autovacuum scheduling is a hard problem, and I would be rather
scared to change by default a behavior that has proved to work in some
cases, but could potentially doom others.  I have an idea though: we
could make the scheduling behavior of autovacuum optional.

Anyway, the thread has stalled for a couple of months now, and we
don't have a clear consensus about this approach, so I am marking this
thread as returned with feedback.
--
Michael

Attachment