https://www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
This link is to v15 docs. Is that the version you are actually using? The behavior of this feature has changed (repeatedly) over the years, so this is important to know.
This was definitely an improvement, but
when the big vacuum (freeze) kicked off there was a noticeable
io/latency spike.
Was the change noticeable because it caused degradation in the user experience (latency of things users actually care about) or because you have monitoring tools which detected a change even though no one was complaining? Can you tell if the cause is driven by sequential reads, random reads, or writes?
The way to suppress IO spikes is generally to tweak the vacuum_cost_* and/or auto_vacuum_vacuum_cost_* settings. This will make the vacuums take longer but be less intensive.
We then tried setting:
- autovacuum_freeze_table_age=100000000 # 100M
which seemed to make it happen more frequently (and one would hope,
with less work to do). We reduced this to 50M, and again it was more
frequent (but not the 2x I expected). And when I dropped it to 10M,
nothing changed.
Please be more quantitative. It wasn't 2x, but then what was it? And how long did each one take?
Freezing automatically more often is likely not the answer to IO spikes. Going from gumming up your system once a week for one hour to gumming it up 3 times a week for 20 minutes each is likely not a real solution. And it might not even do that--you could gum it up 3 times a week for 50 minutes each! Unless you have large swaths of table which become effectively read-only over time, each aggressive vacuum might need to do the same amount of IO so doing it more often just makes things worse.
The solution would be to either use the *vacuum_cost_* parameters to throttle it down to the point where it doesn't cause problems, or intentionally schedule vacuum freeze during quiet periods (over night, over weekends) preempting the need for them to happen automatically.
Cheers,
Jeff