> On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote:
> > The vacuum command detail can now be determined from
> > pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
> > I don't see why this is not sufficient, especially because it already
> > indicates how the vacuum was triggered, and the autovacuum activity
> > message also tells you why it was triggered. We could perhaps add "due to
> > failsafe" to the autovacuum activity message to explicitly show that reason.
>
> Eh, IMHO requiring users to look for a certain substring in the query field
> doesn't seem especially user-friendly to me. (I was going to point out
> that it's undocumented, too, but it is in fact documented [0].)
I am not sure if it's a bad user experience. In my experience that string
is quite easy to parse.
It is also common for a DBA to have to reference
pg_stat_activity anyhow to determine how long the vacuum has been
running, wait events, etc. IMO, the progress view is the wrong place
for information that is static (does not change from the start of the
command) and can be derived from the query string.
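
To illustrate what I mean by "easy to parse", here is a rough sketch in
Python. It assumes the autovacuum activity string looks like
"autovacuum: VACUUM [ANALYZE] schema.table [(to prevent wraparound)]"
and that the caller only passes in the query text of backends that
also show up in pg_stat_progress_vacuum. The function name and the
returned keys are made up for this example.

```python
import re

# Assumed shape of the autovacuum activity string in pg_stat_activity.query:
#   "autovacuum: VACUUM [ANALYZE] schema.table [(to prevent wraparound)]"
AUTOVAC_RE = re.compile(
    r"^autovacuum: (?P<cmd>VACUUM(?: ANALYZE)?|ANALYZE) "
    r"(?P<rel>.+?)(?P<wrap> \(to prevent wraparound\))?$"
)


def classify_vacuum(query: str) -> dict:
    """Classify how a vacuum was triggered from its query string.

    Hypothetical helper: anything that does not look like an autovacuum
    activity string is treated as a manual command.
    """
    m = AUTOVAC_RE.match(query)
    if m is None:
        return {"trigger": "manual", "wraparound": False, "relation": None}
    return {
        "trigger": "autovacuum",
        "wraparound": m.group("wrap") is not None,
        "relation": m.group("rel"),
    }
```

Note that nothing here can report failsafe, since the string is set
once when the command starts.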
>> Right. I think we cannot display both things in one mode column. Since
>> both manual vacuums and anti-wraparound autovacuums can enter the
>> failsafe mode dynamically, if we show "failsafe" in the mode column,
>> we would lose the information "why is this vacuum running". I guess we
>> would need separate columns. For example, I guess that the column
>> showing "how is it operating under the hood" can have three values:
>> "normal", "aggressive" (disables VM optimization), and "failsafe"
>> (implies aggressive vacuum and disables many things to prioritize XID
>> freezing).
> Am I understanding correctly that your idea is to have a "reason" column
> that would have values like "manual", "normal autovacuum", and "autovacuum
> for wraparound", and a "mode" column that would have values like "normal",
> "agressive", and "failsafe"? I wonder if we could be even more granular
> for the "normal autovacuum" case and point to the reason the table was
> chosen. For example, was it the insert threshold, the update/delete
> threshold, etc.?
Ahh, it's true that failsafe can trigger while an (auto)vacuum is in progress;
the check does not happen at the start, but in places like the main loop
of lazy_scan_heap. Since "failsafe" can be flipped on in-flight, I can see
that being a useful (bool?) field in the progress view.
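
A toy model of the distinction, in Python rather than the actual C.
All names here (VacuumProgress, scan_heap, check_every,
failsafe_needed) are hypothetical and only meant to show a static
"reason" next to a flag that can change during the scan:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VacuumProgress:
    reason: str             # fixed when the command starts, e.g. "manual"
    failsafe: bool = False  # may flip to True mid-scan; never reset here


def scan_heap(progress: VacuumProgress, nblocks: int, check_every: int,
              failsafe_needed: Callable[[int], bool]) -> VacuumProgress:
    """Toy scan loop that re-checks the failsafe condition periodically."""
    for blkno in range(nblocks):
        # Re-evaluate only every check_every blocks, and only until it trips.
        if not progress.failsafe and blkno % check_every == 0:
            if failsafe_needed(blkno):
                progress.failsafe = True
        # ... per-block work would go here ...
    return progress
```

For example, scan_heap(VacuumProgress("manual"), 10, 4, lambda b: b >= 5)
comes back with reason still "manual" and failsafe set to True, which
is the kind of change the query string cannot show.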
--
Sami