On 4/3/22 12:05 PM, Andres Freund wrote:
> While I was writing the above I, again, realized that it'd be awfully nice to
> have some accumulated stats about (auto-)vacuum's effectiveness. For us to get
> feedback about improvements more easily and for users to know what aspects
> they need to tune.
>
> Knowing how many times a table was vacuumed doesn't really tell that much, and
> requiring to enable log_autovacuum_min_duration and then aggregating those
> results is pretty painful (and version dependent).
>
> If we just collected something like:
> - number of heap passes
> - time spent heap vacuuming
> - number of index scans
> - time spent index vacuuming
> - time spent delaying
The number of passes would let you know if maintenance_work_mem is too
small (or to stop killing 187M+ tuples in one go). The timing info would
give you an idea of the impact of throttling.
> - percentage of non-yet-removable vs removable tuples
This'd give you an idea how bad your long-running-transaction problem is.
Another metric I think would be useful is the average utilization of
your autovac workers. No spare workers means you almost certainly have
tables that need vacuuming but have to wait. As a single number, it'd
also be much easier for users to understand. I'm no stats expert, but
one way to handle that cheaply would be to maintain an
engineering-weighted-mean of the percentage of autovac workers that are
in use at the end of each autovac launcher cycle (though that would
probably not work great for people that have extreme values for launcher
delay, or constantly muck with launcher_delay).
>
> it'd start to be a heck of a lot easier to judge how well autovacuum is
> coping.
>
> If we tracked the related pieces above in the index stats (or perhaps
> additionally there), it'd also make it easier to judge the cost of different
> indexes.
>
> - Andres
>
>