Hi,
On Wed, Mar 19, 2025 at 09:53:37AM +0100, Christophe Pettus wrote:
> We're tracking down an issue that we've seen in two separate
> installations so far, which is that, at the very end of a vacuum, the
> vacuum operation starts using *very* high levels of CPU and
> (sometimes) I/O, often to the point that the system becomes unable to
> service other requests. We've seen this on versions 15, 16, and 17 so
> far.
Ouch.
> The common data points are:
>
> 1. The table being vacuumed is large (>250 million rows, often in the
> >10 billion row level).
> 2. The table has a relatively high churn rate.
> 3. The number of updated / deleted rows before that particular vacuum
> cycle is very high.
>
> Everything seems to point to the vacuum free space map operation,
> since it would have a lot of work to do in that particular situation,
> it happens at just the right place in the vacuum cycle, and its
> resource consumption is not throttled the way the regular vacuum
> operation is.
Independent of throttling, if it turns out that the free space map
vacuum is indeed the culprit, I think it would make sense to report it
as a dedicated phase, so that it can be tracked more easily in
pg_stat_progress_vacuum etc.
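
Until such a phase exists, one rough way to see where the time goes is
to sample pg_stat_progress_vacuum at intervals and add up how long each
reported phase lasted. Below is a minimal sketch in Python, not a
tested tool. PROGRESS_QUERY and time_per_phase are names made up for
illustration, and running the query is left to whatever client you use.
The time between two samples is attributed to the phase seen in the
earlier sample, so the result is only as precise as the sampling
interval.

```python
from collections import defaultdict

# Query to run periodically against the server (columns present in 15-17).
PROGRESS_QUERY = """
SELECT pid, relid::regclass AS relation, phase,
       heap_blks_total, heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum
"""


def time_per_phase(samples):
    """Sum the seconds spent in each phase for one vacuum backend.

    samples: iterable of (timestamp_seconds, phase) tuples in time order,
    e.g. built from repeated runs of PROGRESS_QUERY for a single pid.
    """
    totals = defaultdict(float)
    prev_ts, prev_phase = None, None
    for ts, phase in samples:
        if prev_phase is not None:
            # Attribute the elapsed interval to the previously seen phase.
            totals[prev_phase] += ts - prev_ts
        prev_ts, prev_phase = ts, phase
    return dict(totals)
```

If the spike really is at the very end, most of the unexplained time
should pile up under whichever phase is reported last, which would at
least narrow things down before anyone adds a new phase.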
Michael