Re: Vacuum statistics - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Vacuum statistics
Date
Msg-id 164c3354-5350-4c72-a6b5-a790f50c15ce@yandex.ru
Whole thread Raw
In response to Re: Vacuum statistics  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
Hi)
Thank you for your valuable feedback!

On 12.03.2026 18:28, Andrei Lepikhov wrote:

> On 12/3/26 13:02, Andrei Lepikhov wrote:
>> On 9/3/26 16:46, Alena Rybakina wrote:
>>> I discovered that my last patches were incorrectly formed. I updated 
>>> the correct version.
>>
>> I see that v29-0001-* is a quite separate feature itself at the 
>> moment. It makes sense to remove the commit message phrase for 
>> vm_new_frozen_pages and vm_new_visible_pages, introduced in later 
>> patches.
>> This patch itself looks good to me.
>
> Since this patch is almost ready for commit, I reviewed it carefully. 
> I noticed a documentation entry was missing, so I added it. Please see 
> the attachment.
I have added it in the documentation in the extension that you have 
noticed before, but I agree with your suggestion to move it in the core 
patch.
> While updating the patch file, I also made a few small adjustments, 
> including changing the parameter order in the struct and VIEW. The 
> commit message is also fixed.
Thank you) I agree with your fixes)
>
> In addition, it makes sense to discuss how these parameters are 
> supposed to be used. I see the following use cases:
>
> 1. Which tables have the most VM churn? - monitoring 
> rev_all_visible_pages normalised on the table size and its average 
> tuple width might expose the most suspicious tables (in terms of table 
> statistics).
> 2. DML Skew. Dividing rev_all_visible_pages by the number of tuple 
> updates/deletes, normalised by the average table and tuple sizes, 
> might indicate whether changes are localised within the table.
> 3. IndexOnlyScan effectiveness. Considering the speed of 
> rev_all_visible_pages change, normalised to the value of the 
> relallvisible statistic, we may detect tables where Index-Only Scan 
> might be inefficiently used.

Now it can be useful to track what table's pages are frozen by vacuum most.

By analyzing the ratio of frozen to unfrozen pages, you can see how well 
the balance is maintained. Ideally, this ratio should approach 1. If we 
have a higher ratio of unfrozen to frozen pages, it means the backand is 
frequently accessing the table, which could indicate that this table 
potentially requires attention to how well it's being handled by the 
vacuum. There may be unpredictability or even a seasonal trend — a page 
is frequently accessed only during certain periods (this is purely my 
observation). Also, if the ratio of frozen pages is higher, the vacuum 
may be configured too aggressively.

With the parameter that was included before (pg_class_relallfrozen and 
relallvisible 
https://github.com/MasaoFujii/postgresql/commit/99f8f3fbbc8f743290844e8c676d39dad11c5d5d) 
in the pg_stat_tables, I think I can provide isolation test to prove it 
- I can use my isolation test vacuum-extending-in-repetable-read.spec 
that I have added in the extension (ext_vacuum_statistics). What do you 
think?

>
> Feel free to criticise it or add your own - I’m just a developer, not 
> a DBA. Also, I’m not sure what use cases there are for the 
> rev_all_frozen_pages parameter.

Also, I would ask you if you don't mind to review the code in the 
extension that I have provided to store and control vacuum statistics. 
No one has ever looked at it unfortunately and any feedback is valuable 
now.

In addition, I'm currently working on the parameter that can track some 
parts of statistics. For example, we can track only buffer or wal 
statistics. If you are interested, I'll send you the code on my github. 
However I have already noticed that it requires to add dynamical memory 
allocation based on the guc value. I know that it requires a lot of 
attention in development but it will help to save memory during saving 
statistics. What do you think about this idea? To be honest, it was 
suggested before in this thread and I'm trying to realize it.




pgsql-hackers by date:

Previous
From: Bryan Green
Date:
Subject: Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c)
Next
From: Fujii Masao
Date:
Subject: Re: Add missing stats_reset column to pg_stat_database_conflicts view