Re: Stats for inheritance trees - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Stats for inheritance trees
Date
Msg-id 4823.1262132964@sss.pgh.pa.us
Whole thread Raw
In response to Stats for inheritance trees  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stats for inheritance trees
List pgsql-hackers
I wrote:
> 3. Ideally autovacuum would know enough to perform ANALYZEs on
> inheritance parents after enough churn has occurred in their child
> table(s).  I am not entirely clear about a good way to do that.
> We could have it just directly force an ANALYZE on parent(s) of any
> table it has chosen to ANALYZE, but that might be overkill --- in
> particular leading to excess ANALYZEs when several children receive
> a lot of updates.

I've been looking at this for a bit, and I think the only reasonable
way to do it is to make the pgstats mechanism track the need for
ANALYZE on a parent table.  A hack like I suggested above would make
the autovacuum.c code even messier than it already is, and it seems
inevitable that we'd get duplicate analyze actions from different
autovac workers.

Now, I don't really want to add Yet Another per-table counter to pgstats
for this.  The stats are big enough already.  However, the existing
mechanism for triggering ANALYZE looks pretty bogus to me as I look at
it now: there's a last_anl_tuples value with a very hazy definition,
and what's worse it's being computed off numbers that may be only crude
estimates from ANALYZE.  What I propose doing is to replace that counter
with a "changes_since_analyze" counter, which can be managed very
simply:

* when a tabstat message comes in, increment changes_since_analyze by
the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted;

* when an analyze report message comes in, reset changes_since_analyze
to zero.

This gives us a number that is actually pretty credible and can still
be compared to the analyze threshold the same as before.  I think the
current definition dates from before we had accurate
insert/delete/update tracking, but now that we have that, we should
use it.

Now, having done that, what I would suggest doing is having autovacuum
propagate the changes_since_analyze count that it sees up to the parent
table(s) whenever it does an autoanalyze.  (This would require adding a
new message type that allows reporting a changes_since_analyze increment
independently of inserted/updated/deleted, or else adding
changes_since_analyze as an independent field in regular tabstat
messages.)

In most cases, with the parent table probably smaller than the child
tables, this would immediately make the parent a candidate for analyze.
That might be overkill, in which case we could try multiplying the count
by some sort of derating factor, but getting hold of a good derating
factor might be more expensive than it's worth --- I think you'd have to
look at all the other children of the same parent to see how big the
current one is compared to the rest.

Thoughts?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Hot Standy introduced problem with query cancel behavior
Next
From: Robert Haas
Date:
Subject: Re: Stats for inheritance trees