Hi hackers,
We've encountered an issue where autovacuum holds an AccessExclusive lock for an extended period when attempting to truncate heap tables. The root cause appears to be the way autovacuum handles block truncation, specifically when scanning shared buffers during smgrtruncate
. This issue is particularly painful for large shared buffer configurations (e.g., 1.8 TiB with NUMA and multiple workers), and pg_statistic is among the most affected tables.
When autovacuum detects freeable blocks at the tail of a relation, it attempts to truncate them. It acquires an AccessExclusive lock and scans all shared buffers to remove the corresponding blocks. With a large shared buffer configuration (e.g., 1.8 TiB, 200 million blocks, and multiple NUMA nodes), this process is slow and significantly impacts performance.
We propose modifying the truncation condition in should_attempt_truncation
to avoid unnecessary full buffer scans. The new formula ensures we only attempt truncation when we can free at least 3% of the relation size + 2 pages. This change prevents excessive truncation attempts on small tables while reducing the impact on larger relations. Previously, small tables could theoretically be truncated to a single page, but with this change, they may remain around 3 pages instead. We don't see this as a significant issue.
The idea for this patch was proposed by Yuriy Sokolov.
There is also an open discussion about optimizing the shared buffer traversal to minimize the impact of AccessExclusive locks. This patch is a step toward improving the situation.
The patch is attached to this message.
Best regards, Stepan Neretin.