Re: We need to log aborted autovacuums - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: We need to log aborted autovacuums |
Date | |
Msg-id | 4D2762C4.4050405@agliodbs.com Whole thread Raw |
In response to | Re: We need to log aborted autovacuums (Greg Smith <greg@2ndquadrant.com>) |
Responses |
Re: We need to log aborted autovacuums
Re: We need to log aborted autovacuums |
List | pgsql-hackers |
Greg, > It's already possible to detect the main symptom--dead row percentage is > much higher than the autovacuum threshold, but there's been no recent > autovacuum. That makes me less enthusiastic that there's such a genuine > need to justify the overhead of storing more table stats just to detect > the same thing a little more easily. The problem is that while this gives you the symptoms, it doesn't give you the cause. The lack of vacuum could be occurring for any of 4 reasons: 1) Locking 2) You have a lot of tables and not enough autovac_workers / too much sleep time 3) You need to autovac this particular table more frequently, since it gets dirtied really fast 4) The table has been set with special autovac settings which keep it from being autovac'd We can currently distinguish between cased 2, 3, 4 based on existing available facts. However, distinguishing case 1 from 2 or 3, in particular, isn't currently possible except by methods which require collecting a lot of ad-hoc monitoring data over a period of time. This makes the effort required for the diagnosis completely out of proportion with the magnitude of the problem. It occurs to me that another way of diagnosis would simply be a way to cause the autovac daemon to spit out output we could camp on, *without* requiring the huge volumes of output also required for DEBUG3. This brings us back to the logging idea again. > We could argue both sides of the trade-off of tracking this directly in > stats for some time, and I'd never expect there to be a clear victory > for either perspective. I've run into this vacuum problem a few times, > but certainly less than I've run into "why is the stats table so huge?" I really don't think that argument applies to either patch; last_autovac_attempt *or* the last_stats_reset time, since neither event is expected to occur frequently. If you have last_autovac_attempt (for example) being updated frequently, you clearly had a db problem bigger than the size of the stats table. Given that our road ahead necessarily includes adding more and more monitoring and admin data to PostgreSQL (because our DBA users demand it), I think that we should give some thought to the issue of storing DBA stats in general. By DBA stats I mean statistics which aren't used in query planning, but are used in monitoring, trending, and troubleshooting. I'm thinking these ought to have their own relation or relations. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
pgsql-hackers by date: