Re: We need to log aborted autovacuums - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: We need to log aborted autovacuums
Date
Msg-id 4D2762C4.4050405@agliodbs.com
Whole thread Raw
In response to Re: We need to log aborted autovacuums  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: We need to log aborted autovacuums  (Greg Smith <greg@2ndquadrant.com>)
Re: We need to log aborted autovacuums  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
Greg,

> It's already possible to detect the main symptom--dead row percentage is
> much higher than the autovacuum threshold, but there's been no recent
> autovacuum.  That makes me less enthusiastic that there's such a genuine
> need to justify the overhead of storing more table stats just to detect
> the same thing a little more easily.

The problem is that while this gives you the symptoms, it doesn't give
you the cause.  The lack of vacuum could be occurring for any of 4 reasons:

1) Locking
2) You have a lot of tables and not enough autovac_workers / too much
sleep time
3) You need to autovac this particular table more frequently, since it
gets dirtied really fast
4) The table has been set with special autovac settings which keep it
from being autovac'd

We can currently distinguish between cased 2, 3, 4 based on existing
available facts.  However, distinguishing case 1 from 2 or 3, in
particular, isn't currently possible except by methods which require
collecting a lot of ad-hoc monitoring data over a period of time.  This
makes the effort required for the diagnosis completely out of proportion
with the magnitude of the problem.

It occurs to me that another way of diagnosis would simply be a way to
cause the autovac daemon to spit out output we could camp on, *without*
requiring the huge volumes of output also required for DEBUG3.  This
brings us back to the logging idea again.

> We could argue both sides of the trade-off of tracking this directly in
> stats for some time, and I'd never expect there to be a clear victory
> for either perspective.  I've run into this vacuum problem a few times,
> but certainly less than I've run into "why is the stats table so huge?"

I really don't think that argument applies to either patch;
last_autovac_attempt *or* the last_stats_reset time, since neither event
is expected to occur frequently.  If you have last_autovac_attempt (for
example) being updated frequently, you clearly had a db problem bigger
than the size of the stats table.

Given that our road ahead necessarily includes adding more and more
monitoring and admin data to PostgreSQL (because our DBA users demand
it), I think that we should give some thought to the issue of storing
DBA stats in general.  By DBA stats I mean statistics which aren't used
in query planning, but are used in monitoring, trending, and
troubleshooting.  I'm thinking these ought to have their own relation or
relations.


--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: system views for walsender activity
Next
From: Josh Berkus
Date:
Subject: Re: making an unlogged table logged