Re: Track skipped tables during autovacuum and autoanalyze - Mailing list pgsql-hackers
| From | Yugo Nagata |
|---|---|
| Subject | Re: Track skipped tables during autovacuum and autoanalyze |
| Date | |
| Msg-id | 20260325012847.e026ba1860c07288efe3e97d@sraoss.co.jp Whole thread Raw |
| In response to | Re: Track skipped tables during autovacuum and autoanalyze (Sami Imseih <samimseih@gmail.com>) |
| Responses |
Re: Track skipped tables during autovacuum and autoanalyze
|
| List | pgsql-hackers |
On Tue, 24 Mar 2026 09:58:48 -0500 Sami Imseih <samimseih@gmail.com> wrote: Thank you for your comments! > > The attached patch add the following fields to pg_stat_all_tables: > > - last_skipped_autovacuum > > - last_skipped_autoanalyze > > - skipped_autovacuum_count > > - skipped_autoanalyze_count > > > > Are there any concerns about exposing this in pg_stat_all_tables, or suggestions > > for a better approach? > > I am not sure about the timestamp columns. I am not saying they will > not be useful, > but I think it will be better to just start with counters for this. > The way the views get > used, a dashboard built for tracking the deltas of the counters can easily spot > when there is a spike of skipped autovacuum/autoanalyze count. > Also, for tables that are being autovacuumed and skipped quickly, > the timestamps will just be overwritten. > > So, I am +1 on the counters, -1 on the timestamps. Although the timestamps are overwritten on each skipped autovacuum or autoanalyze, they still indicate when the last attempt was made. This can help users confirm that autovacuum is actively attempting to run, and that the issue is due to repeated skips rather than inactivity. While counters can indicate overall activity, they do not reveal when the last skip occurred. With timestamps, users can immediately see the most recent attempt, even without a separate dashboard or historical tracking. Therefore, counters are useful for monitoring overall activity, but timestamps give additional, complementary information, so it seems worthwhile to include them too. > Out of scope for this patch, but I also wonder if we should add another counter, > autovacuum_started_count. If there are other types of failure scenarios such as > corrupt indexes, checksum failures, etc. which terminate the > autovacuum in flight, > we would be able to catch this by looking at the number of autovacuums > started vs completed. The skipped counters in this patch and a started > counter would capture different stages of the autovacuum lifecycle; > skipped means > "never started" (lock contention), while a started-minus-completed delta means > "started but failed." Both are useful signals, but for different reasons. That makes sense. I was considering adding a counter to track "canceled" autovacuum or autoanalyze, but tracking the number of started or attempted autovacuums might provide a more general information than counting only the failed ones. > In terms of the patch: > > 1/ > > + if (AmAutoVacuumWorkerProcess()) > + pgstat_report_skipped_vacuum(relid); > > Any reason why this should not also include manual vacuum/analyze? > If someone has a vacuum/analyze script that uses SKIP_LOCKED, and > the operation gets skipped, this should be included in the counter. > this can be done with separate counter fields for autovacuum/autoanalyze and > vacuum/analyze For manual vacuum/analyze, an explicit WARNING is output when the operation is skipped, so I initially thought that reporting it in the stats view was not necessary. However, I now agree that it should be included. > 2/ > > + pg_stat_get_skipped_autovacuum_count(C.oid) AS > skipped_autovacuum_count, > > How about a name like "autovacuum_lock_skip_count"? I'm not sure this is a good fit, since it may be better to keep the naming consistent with other fields whose names end with "...vacuum_count" or "...analyze_count". Happy to hear other opinions. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: