Re: Stats for inheritance trees - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Stats for inheritance trees |
Date | |
Msg-id | 4823.1262132964@sss.pgh.pa.us Whole thread Raw |
In response to | Stats for inheritance trees (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Stats for inheritance trees
|
List | pgsql-hackers |
I wrote: > 3. Ideally autovacuum would know enough to perform ANALYZEs on > inheritance parents after enough churn has occurred in their child > table(s). I am not entirely clear about a good way to do that. > We could have it just directly force an ANALYZE on parent(s) of any > table it has chosen to ANALYZE, but that might be overkill --- in > particular leading to excess ANALYZEs when several children receive > a lot of updates. I've been looking at this for a bit, and I think the only reasonable way to do it is to make the pgstats mechanism track the need for ANALYZE on a parent table. A hack like I suggested above would make the autovacuum.c code even messier than it already is, and it seems inevitable that we'd get duplicate analyze actions from different autovac workers. Now, I don't really want to add Yet Another per-table counter to pgstats for this. The stats are big enough already. However, the existing mechanism for triggering ANALYZE looks pretty bogus to me as I look at it now: there's a last_anl_tuples value with a very hazy definition, and what's worse it's being computed off numbers that may be only crude estimates from ANALYZE. What I propose doing is to replace that counter with a "changes_since_analyze" counter, which can be managed very simply: * when a tabstat message comes in, increment changes_since_analyze by the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted; * when an analyze report message comes in, reset changes_since_analyze to zero. This gives us a number that is actually pretty credible and can still be compared to the analyze threshold the same as before. I think the current definition dates from before we had accurate insert/delete/update tracking, but now that we have that, we should use it. Now, having done that, what I would suggest doing is having autovacuum propagate the changes_since_analyze count that it sees up to the parent table(s) whenever it does an autoanalyze. (This would require adding a new message type that allows reporting a changes_since_analyze increment independently of inserted/updated/deleted, or else adding changes_since_analyze as an independent field in regular tabstat messages.) In most cases, with the parent table probably smaller than the child tables, this would immediately make the parent a candidate for analyze. That might be overkill, in which case we could try multiplying the count by some sort of derating factor, but getting hold of a good derating factor might be more expensive than it's worth --- I think you'd have to look at all the other children of the same parent to see how big the current one is compared to the rest. Thoughts? regards, tom lane
pgsql-hackers by date: