We have some batch queries that had occasionally having degraded runtimes: from 2 hours degrading to 16 hours, etc.
Comparing plans from good and bad runs, we saw that the good plans used index-only scans on table “x”, while the bad plans used index scans.
Using the pg_visibility utility, we found that all of the 83 partitions of table “x” were showing zero blocks where all tuples were visible. We ran a VACUUM on the table; the visibility maps are now clean and the good plans came back.
Our question is: why did autovacuum not spare us from this?
We are using default autovacuum parameters for all except log_autovacuum_min_duration=5000. These partitions are populated by processes that do a truncate + a single insert-select.
We see autovacuum failure (failed to get lock) messages, followed by a success message, in the log for one of these partitions (the biggest one) but even that partition showed zero blocks with all tuples visible.
Are we wrong to expect autovacuum to clean up the visibility map?
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit
Thank you,
Mike Tefft