Home > mailing lists

Re: Vacuum statistics - Mailing list pgsql-hackers

From	Alena Rybakina
Subject	Re: Vacuum statistics
Date	January 5 02:37:51
Msg-id	a55da89f-bbc9-444d-8a0b-1a354257f571@postgrespro.ru Whole thread Raw
In response to	Re: Vacuum statistics (Sami Imseih <samimseih@gmail.com>)
Responses	Re: Vacuum statistics
List	pgsql-hackers

Tree view

Hi, thank you for your attention to this patch.

On 02.01.2025 23:12, Sami Imseih wrote:

Hi,

Thanks for the work you have done here. Exposing cumulative
metrics at this level of detail for vacuum is surely useful to find
vacuum bottlenecks and to determine the effectiveness of
vacuum tuning.

Yes, we hope that this will help provide more detailed information about the current efficiency of the vacuum and also suggest how to best configure it for the relationship.

I am yet to look very closely, but I think some additional columns that
will be useful is the number of failsafe autovacuums occurred.

Do you mean when the autovacuum started to prevent workaround?

Also
the counter for number of index_cleanup skipped, truncate phase
skipped and toast vacuuming skipped ( the latter will only be relevant
for the main relation ).

I can add, but concerns have already been expressed about the large amount of vacuum statistics and, as a consequence, this leads to the allocation of additional memory (3 times).

Of course, now we are saved by the guc I added for statistics... I understand that this information can better show the efficiency of the vacuum, but how does it help in setting it up for heap relations?

regarding the skipped truncate phase, the statistics are already collected in vacrel->nonempty_pages, it's easy to put them outside. I think the current statistics only show the number of deleted tuples and pages (both deleted and those visited by vacuum during tuple deletion), so the opposite view won't hurt.

index_cleanup skipped can be obtained based on information from a small number of vacuum buffer statistics and the number of pages of indexes that belong to heap relations. I think you can notice the behavior through current statistics: if the index's buffer values have increased very slightly, then the vacuum does not go there probably because of the impossibility of taking a clean-up lock on the index. The same information can be obtained based on the number of missed_tuples in heap relations. I wrote earlier how these values are related.

toast vacuuming skipped to be honest I haven't found a place where vacuum skips it in the code yet, so I can't say anything about them yet.

I also wonder if if makes sense to break down timing by phase. I surely
would like to know how much of my vacuum time was spent in index
cleanup vs heap scan, etc.

At the moment, this information has already been added to the statistics as a total time for heap relations and their indexes.


A nit: I noticed in v14, the column is "schema". It should be "schemaname"
for consistency.

Thank you, I'll fix it in the next version of the patch.

Also, instead of pg_stat_vacuum_tables, what about pg_stat_vacuum?

Now, I became aware of this discussion after starting a new thread
to track total time spent in vacuum/analyze in pg_stat_all_tables [1].
But this begs the question of what should be done with the current
counters in pg_stat_all_tables? I see it mentioned above that (auto)vacuum_count
should be added to this new view, but it's also already in pg_stat_all_tables.
I don't think we should be duplicating the same columns across views.

Alternatively, we can remove the vacuum related stats from pg_stat_all_tables,
but that will break monitoring tools and will leave us with the (auto)analyze
metrics alone in pg_stat_all_tables. This sounds very ugly.

What do you think?

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://commitfest.postgresql.org/52/5485/

I don't think they interfere with my more detailed views of how the vacuum works. I don't think there's anything worth removing.


I think total_time should be removed from your current patch and added
as is being suggested in [1]. This way high level metrics such as counts
and total time spent remain in pg_stat_all_tables, while the new view
you are proposing will contain more details. I don't think we will have
consistency issues between the views because a reset using pg_stat_reset()
will act on all the stats and pg_stat_reset_single_table_counters() will act on
all the stats related to that table. There should be no way to reset the vacuum
stats independently, AFAICT.

I think it is not quite correct to do so.

Firstly, the total time of vacuum operation does not give you a complete idea of when vacuum did not work delay time. I have seen many reports where vacuum spends very little time on cleaning relations and most of the time just sleeping.

Secondly, where to put the total time of vacuum for indexes and databases? It would be incorrect not to take them into account at all. What if we remove the total time from the heap statistics and add it to pg_stat_tables and only leave the vacuum statistics total time of vacuum operation of indexes and databases? It seems strange to me that they will have to be viewed from different views.

I think it is necessary to look at the total time for tables into perspective of how much time vacuum spent in total on processing indexes, since indexes can be bloated, for example. I think it is better to leave these statistics here.

-- 
Regards,
Alena Rybakina
Postgres Professional

pgsql-hackers by date:

From: Michel Pelletier
Date: 05 January, 02:34:40
Subject: Re: Using Expanded Objects other than Arrays from plpgsql

From: Tatsuo Ishii
Date: 05 January, 07:51:29
Subject: Re: Proposal: add new API to stringinfo

Re: Vacuum statistics - Mailing list pgsql-hackers

Previous

Next