On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
> By considering only the unfrozen portion of the table when calculating
> the vacuum insert threshold, we can trigger vacuums more proactively
> on insert-heavy tables. This changes the definition of
> insert_scale_factor to a percentage of "active" table size. The
> attached patch does this.
I think this is a creative idea. My first reaction is to question whether
it makes sense to have two strategies for this sort of thing:
autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
Perhaps we don't want to more aggressively clean up bloat (except for the
very largest tables via the hard cap), but we do want to more aggressively
mark newly-inserted tuples frozen. I'm curious what you think.
> I've estimated the unfrozen percentage of the table by adding a new
> field to pg_class, relallfrozen, which is updated in the same places
> as relallvisible.
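For illustration, here is a rough C sketch of the insert threshold before and after the proposed change. It assumes the current rule is autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor * reltuples. It also assumes the patch scales the second term by an unfrozen fraction estimated as 1 - relallfrozen / relpages. The function names are hypothetical, and the patch's exact arithmetic may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: estimated fraction of the table NOT marked
 * all-frozen, from pg_class-style page counts (an assumption about how
 * the patch derives the "active" portion).
 */
double
unfrozen_fraction(double relpages, double relallfrozen)
{
    assert(relpages >= 0 && relallfrozen >= 0);
    if (relpages <= 0)
        return 1.0;                 /* no page info: treat whole table as unfrozen */
    if (relallfrozen > relpages)
        relallfrozen = relpages;    /* stored counters can be briefly inconsistent */
    return 1.0 - relallfrozen / relpages;
}

/* Current rule (sketch): scale factor applies to the whole table. */
double
insert_threshold_current(double base_thresh, double scale_factor,
                         double reltuples)
{
    return base_thresh + scale_factor * reltuples;
}

/* Proposed rule (sketch): scale factor applies only to the unfrozen part. */
double
insert_threshold_proposed(double base_thresh, double scale_factor,
                          double reltuples, double relpages,
                          double relallfrozen)
{
    return base_thresh +
        scale_factor * reltuples * unfrozen_fraction(relpages, relallfrozen);
}

/* Vacuum is triggered once inserts since the last vacuum exceed the threshold. */
bool
needs_insert_vacuum(double inserted_since_vacuum, double threshold)
{
    return inserted_since_vacuum > threshold;
}
```

With the default settings (threshold 1000, scale factor 0.2), a hypothetical table with 10 million tuples and 90% of its pages all-frozen would go from needing about 2,001,000 inserted tuples to about 201,000 before an insert-triggered vacuum.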
Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
unless this is an anti-wraparound vacuum, at which point I would think the
insert threshold goes out the window.
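The relallvisible question can be modeled roughly as follows. This assumes a non-aggressive vacuum skips all-visible pages and an aggressive one skips only all-frozen pages. `estimated_pages_scanned` is a hypothetical helper, not PostgreSQL code. It ignores details such as the SKIP_PAGES_THRESHOLD rule, which only skips sufficiently long runs of skippable pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: rough count of heap pages a vacuum would read,
 * given pg_class-style counters. Ignores SKIP_PAGES_THRESHOLD and any
 * visibility map changes that happen during the scan.
 */
double
estimated_pages_scanned(double relpages, double relallvisible,
                        double relallfrozen, bool aggressive)
{
    assert(relpages >= 0);

    /* Stored counters can lag; keep them in the expected ordering. */
    if (relallvisible > relpages)
        relallvisible = relpages;
    if (relallfrozen > relallvisible)
        relallfrozen = relallvisible;

    /*
     * An aggressive (e.g. anti-wraparound) vacuum may only skip all-frozen
     * pages; an ordinary vacuum may skip every all-visible page.
     */
    return aggressive ? relpages - relallfrozen : relpages - relallvisible;
}
```

Because every all-frozen page is also all-visible, relallvisible is at least relallfrozen. For an ordinary vacuum in this model, the relallvisible-based count is the one that tracks the pages actually read.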
> More frequent vacuums means each vacuum scans fewer pages, but, more
> interestingly, the first vacuum after a checkpoint is much more
> efficient. With the patch, the first vacuum after a checkpoint emits
> half as many FPIs. You can see that only 18 pages were newly dirtied.
> So, with the patch, the pages being vacuumed are usually still in
> shared buffers and still dirty.
Are you aware of any scenarios where your proposed strategy might make
things worse? From your test results, it sounds like these vacuums ought
to usually be relatively efficient, so sending insert-only tables to the
front of the line is normally okay, but maybe that's not always true.
--
nathan