Re: Vacuum statistics - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Vacuum statistics
Date
Msg-id CAA5RZ0v4YZA5uSDh+MU60cEpPvinuAaYPTvO+aZ7H7Vj3whi0A@mail.gmail.com
Whole thread Raw
In response to Re: Vacuum statistics  (Alena Rybakina <a.rybakina@postgrespro.ru>)
List pgsql-hackers
> I am yet to look very closely, but I think some additional columns that
> will be useful is the number of failsafe autovacuums occurred.
>
> Do you mean when the autovacuum started to prevent workaround?
>

Specifically vacuum_failsafe_age [1] when autovacuum automatically
performs a vacuum without index cleanup, without truncate, bypassing
the vacuum ring buffer and disabling the cost limits. The purpose of this
is a last ditch effort to avoid wraparound and is triggered at 1.6 billion
transactions by default. When this state occurs, there is a single log
written for every table that is vacuumed with these options [2], and
my thoughts is to also track in the view as the use of these options
will overtime make the indexes bloat over time and less space is
given back to the OS due to skipped truncations. For most workloads,
this should not be common, but I am thinking of the extreme cases
or if someone potentially misconfigured the vacuum_failsafe_age.

As I thought about this more, failsafe autovacuum could be tracked on
the database level, pg_stat_database, since this guc can't be set
on a relation level.

> Also
> the counter for number of index_cleanup skipped, truncate phase
> skipped and toast vacuuming skipped ( the latter will only be relevant
> for the main relation ).
>
> I can add, but concerns have already been expressed about the large amount of
> vacuum statistics and, as a consequence, this leads
> to the allocation of additional memory (3 times).
> Of course, now we are saved by the guc I added for statistics...
> I understand that this information can better show the efficiency of the vacuum,
> but how does it help in setting it up for heap relations?

An administrator will find this information to be useful especially
if for some reason most vacuums are being run with these
options being off either via a manual vacuum or someone
turning off index_cleanup in the tables storage parameter.

postgres=# alter table t set (vacuum_index_cleanup = off,
vacuum_truncate = off );
ALTER TABLE

> regarding the skipped truncate phase, the statistics are already collected in vacrel->nonempty_pages,
> it's easy to put them outside. I think the current statistics only show the number of deleted tuples and pages
> (both deleted and those visited by vacuum during tuple deletion), so the opposite view won't hurt.

Can you clarify what you mean by "so the opposite view won't hurt." ?

> index_cleanup skipped can be obtained based on information from a small number of
> vacuum buffer statistics and the number of pages of indexes that belong to heap relations.
> I think you can notice the behavior through current statistics:

I don't think there is a view that provides cumulative vacuum buffer
stats currently.
pg_stat_io could be helpful for this purpose, but that is a cluster
wide view. As it
stands now, I think it's quite difficult for a user to determine for a
fact if indexes or
truncate is being skipped


> Secondly, where to put the total time of vacuum for indexes and databases?
> It would be incorrect not to take them into account at all. What if we remove the total time from
> the heap statistics and add it to pg_stat_tables and only leave the vacuum statistics total time of
> vacuum operation of indexes and databases?
> It seems strange to me that they will have to be viewed from different views.
>
> I think it is necessary to look at the total time for tables into perspective of how much
> time vacuum spent in total on processing indexes, since indexes can be bloated, for example.
> I think it is better to leave these statistics here.

You make valid points. I now think because track_vacuum_statistics is
optional, we should track total_time in 2 places. First place in the new
view being proposed here and the second place is in pg_stat_all_tables
as being proposed here [3]. This way if track_vacuum_statistics is off, the
total_time of vacuum could still be tracked by pg_stat_all_tables.

By the way, the current patch does not track materialized view,
but it should as materialized views can also be vacuumed.

Regards,

Sami

[1] https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-VACUUM-FAILSAFE-AGE
[2] https://github.com/postgres/postgres/blob/master/src/backend/access/heap/vacuumlazy.c#L2437-L2444
[3] https://commitfest.postgresql.org/52/5485/



pgsql-hackers by date:

Previous
From: Paul Jungwirth
Date:
Subject: Re: Index AM API cleanup
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Log a warning in pg_createsubscriber for max_slot_wal_keep_size