Re: autovacuum not prioritising for-wraparound tables - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: autovacuum not prioritising for-wraparound tables
Date
Msg-id CAFNqd5Vwf2TWUHNFwUy25z6HnSOTJq59b60=zR+KsV+qWXAJ_g@mail.gmail.com
In response to Re: autovacuum not prioritising for-wraparound tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: autovacuum not prioritising for-wraparound tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: autovacuum not prioritising for-wraparound tables  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Fri, Feb 1, 2013 at 4:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 31, 2013 at 3:18 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> My intention was to apply a Nasby correction to Browne Strength and call
>> the resulting function Browne' (Browne prime).  Does that sound better?
>
> /me rests head in hands.  I'm not halfway clever enough to hang with
> this crowd; I'm not even going to touch the puns in Chris' reply.

It's Friday... Fun needs to be had :-).

>> Now seriously, I did experiment a bit with this and it seems to behave
>> reasonably.  Of course, there might be problems with it, and I don't
>> oppose to changing the name.  "Vacuum strength" didn't sound so great,
>> so I picked the first term that came to mind.  It's not like picking
>> people's last names to name stuff is a completely new idea; that said,
>> it was sort of a joke.
>
> I don't think I really understand the origin of the formula, so
> perhaps if someone would try to characterize why it seems to behave
> reasonably that would be helpful (at least to me).
>
>> f(deadtuples, relpages, age) =
>>    deadtuples/relpages + e ^ (age*ln(relpages)/2^32)
>
> To maybe make that discussion go more quickly let me kvetch about a
> few things to kick things off:
>
> - Using deadtuples/relpages as part of the formula means that tables
> with smaller tuples (thus more tuples per page) will tend to get
> vacuumed before tables with larger tuples (thus less tuples per page).
>  I can't immediately see why that's a good thing.

That wasn't intentional, and may be somewhat unfortunate.

I picked values that I knew could be easily grabbed, and we don't
have an immediate tuples-per-page estimate on pg_class.  An
estimate should be available in pg_statistic; I'm not sure that the
bias from this hurts things badly.

> - It's probably important to have a formula where we can be sure that
> the wrap-around term will eventually dominate the dead-tuple term,
> with enough time to spare to make sure nothing really bad happens; on
> the other hand, it's also desirable to avoid the case where a table
> that has just crossed the threshold for wraparound vacuuming doesn't
> immediately shoot to the top of the list even if it isn't truly
> urgent.  It's unclear to me just from looking at this formula how well
> the second term meets those goals.

I think the second term *does* provide a way for wraparound to dominate;
splitting it apart a bit...

Consider...
        age * ln(relpages)
   e^  --------------------
               2^32

The wraparound portion of this involves age/2^32...  In the beginning, the
numerator is near zero and the denominator is about 4 billion, so the exponent
is roughly 0 and the term is roughly e^0 = 1.  As age trends towards 2^32, the
fraction (ignoring ln(relpages)) trends towards 1, so the longer we go without
vacuuming, the closer the exponent gets to 1.  That *tends* to give you
something looking like e^1, or 2.71828+, ignoring the relpages part.
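
A trivial sanity check on those two endpoints (this is just exp() at 0 and 1,
nothing specific to the formula):

    select exp(0.0) as at_age_zero, exp(1.0) as near_wraparound;
    -- 1 and 2.71828182845905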

I threw in multiplying by ln(relpages) as a way to step Well Back from
rollover; that means that this term will start growing considerably before
rollover, and the larger the table, the sooner that growth takes place.

There is a problem with the ln(relpages) term; if the table has just 1 page,
the ln(relpages) = 0 so the value of the exponential term is *always* 1.
Probably should have ln(relpages+CONSTANT) so that we guarantee
the numerator is never 0.
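
For instance, something like this (just a sketch; 4.0e9 is a made-up
"nearly at wraparound" age, and +1 is an arbitrary pick for CONSTANT):

    -- one-page table: ln(1) = 0 pins the exponential term at 1 regardless of age
    select power(2.71828, (4.0e9 * ln(1.0))     / power(2,32)) as one_page_term,
           power(2.71828, (4.0e9 * ln(1.0 + 1)) / power(2,32)) as with_constant;
    -- 1 versus roughly 1.9, so the +CONSTANT variant still responds to age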

I'm a bit worried that the exponential term might dominate *too* quickly.

For a table I have handy with 163K tuples spread across 3357 pages,
ln(relpages) = 8.1188, and the "exponential bit" varies as follows:

dotpro0620@localhost-> select generate_series(1,20)*100/20 as percent_wraparound,
       power(2.71828,
             (65536.0*32768.0*generate_series(1,20)/20.0 * ln(3357))
             / (65536.0*32768)) as wraparound_term;
 percent_wraparound | wraparound_term
--------------------+------------------
                  5 | 1.50071232210687
                 10 |  2.2521374737234
                 15 | 3.37981045789535
                 20 | 5.07212320054923
                 25 | 7.61179778630838
                 30 | 11.4231187312988
                 35 | 17.1428150369499
                 40 | 25.7264337615498
                 45 |  38.607976149824
                 50 | 57.9394655396491
                 55 |  86.950469871638
                 60 |  130.48764154935
                 65 | 195.824411555774
                 70 | 293.876107391077
                 75 | 441.023495534592
                 80 | 661.849394087408
                 85 |  993.24554108594
                 90 | 1490.57582238538
                 95 | 2236.92550368832
                100 | 3356.98166702019
(20 rows)

At the beginning, the "wraparound" portion is just 1.5, so it is easily
dominated by a table with a lot of dead tuples.  As the time to wraparound
declines, the term becomes steadily more urgent.  There may be constant
factors to fiddle with at the edges, but this term definitely heads towards
dominance.

That's definitely doing what I intended, and after constructing that table,
I think I'm *more* confident.  Consider two tables of different sizes: each
heads towards "maxing out" at a "wraparound_term" value that is directly
correlated with its size, which seems mighty right.  A bigger table needs to
get drawn into play (i.e. needs to get vacuumed) earlier than a smaller one.
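
To make that concrete, here is a quick sketch at 50% and 100% of the way to
wraparound, with 33570 pages thrown in as a made-up "ten times bigger" table:

    -- at 50% of wraparound the term is about sqrt(relpages);
    -- at 100% it is about relpages itself
    select relpages,
           power(2.71828, (power(2,31) * ln(relpages)) / power(2,32)) as at_50_percent,
           power(2.71828, (power(2,32) * ln(relpages)) / power(2,32)) as at_100_percent
      from (values (3357), (33570)) as t(relpages);

The bigger table's term is already well ahead at the halfway mark, so it gets
pulled into the queue sooner.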

> - More generally, it seems to me that we ought to be trying to think
> about the units in which these various quantities are measured.  Each
> term ought to be unit-less.  So perhaps the first term ought to divide
> dead tuples by total tuples, which has the nice property that the
> result is a dimensionless quantity that never exceeds 1.0.  Then the
> second term can be scaled somehow based on that value.

Absolutely a good idea.  I'm not sure I agree it ought to wind up
unitless; I'd instead expect a common unit, perhaps number of pages,
indicating a surrogate for quantity of I/O.  That we're both thinking
about "what's the unit?" means we're on a compatible trail.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


