Thread: Vacuum statistics

Vacuum statistics

From

Alena Rybakina

Date:

30 May 2024, 17:33:51

Hello, everyone!

I think we don't have enough information to analyze vacuum functionality.

Needless to say that the vacuum is the most important process for a 
database system. It prevents problems like table and index bloating and 
emergency freezing if we have a wraparound problem. Furthermore, it 
keeps the visibility map up to date. On the other hand, because of 
incorrectly adjusted aggressive settings of autovacuum it can consume a 
lot of computing resources that lead to all queries to the system 
running longer.

Nowadays the vacuum gathers statistical information about tables, but it 
is important not for optimizer only.

Because the vacuum is an automation process, there are a lot of settings 
that determine their aggressive functionality to other objects of the 
database. Besides, sometimes it is important to set a correct parameter 
for the specified table, because of its dynamic changes.

An administrator of a database needs to set the settings of autovacuum 
to have a balance between the vacuum's useful action in the database 
system on the one hand, and the overhead of its workload on the other. 
However, it is not enough for him to decide on vacuum functionality 
through statistical information about the number of vacuum passes 
through tables and operational data from progress_vacuum, because it is 
available only during vacuum operation and does not provide a strategic 
overview over the considered period.

To sum up, an automation vacuum has a strategic behavior because the 
frequency of its functionality and resource consumption depends on the 
workload of the database. Its workload on the database is minimal for an 
append-only table and it is a maximum for the table with a 
high-frequency updating. Furthermore, there is a high dependence of the 
vacuum load on the number and volume of indexes. Because of the absence 
of the visibility map for indexes, the vacuum scans the index 
completely, and the worst situation when it needs to do it during a 
bloating index situation in a small table.


I suggest gathering information about vacuum resource consumption for 
processing indexes and tables and storing it in the table and index 
relationships (for example, PgStat_StatTabEntry structure like it has 
realized for usual statistics). It will allow us to determine how well 
the vacuum is configured and evaluate the effect of overhead on the 
system at the strategic level, the vacuum has gathered this information 
already, but this valuable information doesn't store it.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Alena Rybakina

Date:

30 May 2024, 18:26:38

On 30.05.2024 10:33, Alena Rybakina wrote:
>
> I suggest gathering information about vacuum resource consumption for 
> processing indexes and tables and storing it in the table and index 
> relationships (for example, PgStat_StatTabEntry structure like it has 
> realized for usual statistics). It will allow us to determine how well 
> the vacuum is configured and evaluate the effect of overhead on the 
> system at the strategic level, the vacuum has gathered this 
> information already, but this valuable information doesn't store it.
>
My colleagues and I have prepared a patch that can help to solve this 
problem.

We are open to feedback.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-Machinery-for-grabbing-an-extended-vacuum-statistics.patch

Re: Vacuum statistics

From

Andrei Zubkov

Date:

30 May 2024, 19:19:26

Hi,

Th, 30/05/2024 at 10:33 -0700, Alena Rybakina wrote:
> I suggest gathering information about vacuum resource consumption for
> processing indexes and tables and storing it in the table and index
> relationships (for example, PgStat_StatTabEntry structure like it has
> realized for usual statistics). It will allow us to determine how
> well
> the vacuum is configured and evaluate the effect of overhead on the
> system at the strategic level, the vacuum has gathered this
> information
> already, but this valuable information doesn't store it.
>
It seems a little bit unclear to me, so let me explain a little the
point of a proposition.

As the vacuum process is a backend it has a workload instrumentation.
We have all the basic counters available such as a number of blocks
read, hit and written, time spent on I/O, WAL stats and so on.. Also,
we can easily get some statistics specific to vacuum activity i.e.
number of tuples removed, number of blocks removed, number of VM marks
set and, of course the most important metric - time spent on vacuum
operation.

All those statistics must be stored by the Cumulative Statistics System
on per-relation basis. I mean individual cumulative counters for every
table and every index in the database.

Such counters will provide us a clear view about vacuum workload on
individual objects of the database, providing means to measure the
efficiency of performed vacuum fine tuning.
--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Dilip Kumar

Date:

07 June 2024, 06:46:30

On Thu, May 30, 2024 at 11:57 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:
>
> On 30.05.2024 10:33, Alena Rybakina wrote:
> >
> > I suggest gathering information about vacuum resource consumption for
> > processing indexes and tables and storing it in the table and index
> > relationships (for example, PgStat_StatTabEntry structure like it has
> > realized for usual statistics). It will allow us to determine how well
> > the vacuum is configured and evaluate the effect of overhead on the
> > system at the strategic level, the vacuum has gathered this
> > information already, but this valuable information doesn't store it.
> >
> My colleagues and I have prepared a patch that can help to solve this
> problem.
>
> We are open to feedback.

I was reading through the patch here are some initial comments.

--
+typedef struct LVExtStatCounters
+{
+ TimestampTz time;
+ PGRUsage ru;
+ WalUsage walusage;
+ BufferUsage bufusage;
+ int64 VacuumPageMiss;
+ int64 VacuumPageHit;
+ int64 VacuumPageDirty;
+ double VacuumDelayTime;
+ PgStat_Counter blocks_fetched;
+ PgStat_Counter blocks_hit;
+} LVExtStatCounters;

I noticed that you are storing both pgBufferUsage and
VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same?
It seems they both exist in the system because some code, like
heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
relies on the old counters. And there is already a patch to remove
those old counters.

--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{

I don't think you need this last parameter (ncolumns) we can anyway
fetch that from tupledesc, so adding an additional parameter
just for checking doesn't look good to me.

--
+ /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+ MyDatabaseId = dbid;

typo
/think that our database us dbid/think that our database has dbid

Also, remove the blank line between the comment and the next code
block that is related to that comment.

--
  VacuumPageDirty = 0;
+ VacuumDelayTime = 0.;

There is an extra "." after 0

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Vacuum statistics

From

Alena Rybakina

Date:

08 June 2024, 06:30:47

Hi! Thank you for your interest in this topic!

On 07.06.2024 09:46, Dilip Kumar wrote:

On Thu, May 30, 2024 at 11:57 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:

On 30.05.2024 10:33, Alena Rybakina wrote:

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this
problem.

We are open to feedback.

I was reading through the patch here are some initial comments.

--
+typedef struct LVExtStatCounters
+{
+ TimestampTz time;
+ PGRUsage ru;
+ WalUsage walusage;
+ BufferUsage bufusage;
+ int64 VacuumPageMiss;
+ int64 VacuumPageHit;
+ int64 VacuumPageDirty;
+ double VacuumDelayTime;
+ PgStat_Counter blocks_fetched;
+ PgStat_Counter blocks_hit;
+} LVExtStatCounters;


I noticed that you are storing both pgBufferUsage and
VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same?
It seems they both exist in the system because some code, like
heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
relies on the old counters. And there is already a patch to remove
those old counters.

I agree with you and I have fixed it.


--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{

I don't think you need this last parameter (ncolumns) we can anyway
fetch that from tupledesc, so adding an additional parameter
just for checking doesn't look good to me.

To be honest, I'm not sure if ncolumns should be deleted at all, because the pg_stats_vacuum function is used to display three different types of statistics: for tables, indexes, and databases. We use this parameter to pass information about the number of parameters (or how many statistics we expect) depending on the type of statistics. For example, table vacuum statistics contain 27 parameters, while indexes and databases contain 19 and 15 parameters, respectively. You can see that the pg_stats_vacuum function contains an Assert that checks that the expected number of tupledesc parameters matches the actual number.

Assert(tupdesc->natts == ncolumns);

Perhaps I can convert it to a local parameter and determine its value already in the function, for example:

pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
{

int columns = 0;

switch (type)

{

case PGSTAT_EXTVAC_HEAP:

ncolumns = EXTVACHEAPSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_INDEX:

ncolumns = EXTVACINDEXSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_DB:

ncolumns = EXTVACDBSTAT_COLUMNS;

break; }

...

}

What do you think?

--
+ /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+ MyDatabaseId = dbid;

typo
/think that our database us dbid/think that our database has dbid

Also, remove the blank line between the comment and the next code
block that is related to that comment.


--  VacuumPageDirty = 0;
+ VacuumDelayTime = 0.;

There is an extra "." after 0

Thank you, I fixed it.

In addition to these changes, I fixed the problem with displaying vacuum statistics for databases: I found an error in defining the pg_stats_vacuum_database system view. In addition, I rewrote the tests and converted them into a regression test. In addition, I have divided the test to test the functionality of the output of vacuum statistics into two tests: one of them checks the functionality of tables and databases, and the other - indexes. This is caused by a problem with the vacuum functionality when the table contains an index. You can find more information about this here: [0] and [1].

I attached the diff to this letter.

[0] https://www.postgresql.org/message-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd%40yandex.ru

[1] https://www.postgresql.org/message-id/CAH2-Wznv94Q_Td8OS8bAN7fYLpfU6CGgjn6Xau5eJ_sDxEGeBA%40mail.gmail.com

I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases. I also write the documentation.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Vacuum statistics

From

Alena Rybakina

Date:

12 June 2024, 06:37:35

Hi!

On 11.06.2024 16:09, Alena Rybakina wrote:

On 08.06.2024 09:30, Alena Rybakina wrote:
I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases.

I have divided the patch into three: the first patch contains code for the functionality of collecting and storage for tables, the second one for indexes and the last one for databases.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Alena Rybakina

Date:

12 June 2024, 06:38:30

Hi!

On 11.06.2024 16:09, Alena Rybakina wrote:

On 08.06.2024 09:30, Alena Rybakina wrote:
I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases.

I have divided the patch into three: the first patch contains code for the functionality of collecting and storage for tables, the second one for indexes and the last one for databases.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Vacuum statistics

From

Alena Rybakina

Date:

17 June 2024, 10:09:51

I have written the documentary and attached the patch.

On 08.06.2024 09:30, Alena Rybakina wrote:

I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases. I also write the documentation.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v3-0004-Add-documentation-about-the-system-views-that.patch

Re: Vacuum statistics

From

Masahiko Sawada

Date:

27 June 2024, 01:39:45

On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zubkov@moonset.ru> wrote:
>
> Hi,
>
> Th, 30/05/2024 at 10:33 -0700, Alena Rybakina wrote:
> > I suggest gathering information about vacuum resource consumption for
> > processing indexes and tables and storing it in the table and index
> > relationships (for example, PgStat_StatTabEntry structure like it has
> > realized for usual statistics). It will allow us to determine how
> > well
> > the vacuum is configured and evaluate the effect of overhead on the
> > system at the strategic level, the vacuum has gathered this
> > information
> > already, but this valuable information doesn't store it.
> >
> It seems a little bit unclear to me, so let me explain a little the
> point of a proposition.
>
> As the vacuum process is a backend it has a workload instrumentation.
> We have all the basic counters available such as a number of blocks
> read, hit and written, time spent on I/O, WAL stats and so on.. Also,
> we can easily get some statistics specific to vacuum activity i.e.
> number of tuples removed, number of blocks removed, number of VM marks
> set and, of course the most important metric - time spent on vacuum
> operation.

I've not reviewed the patch closely but it sounds helpful for users. I
would like to add a statistic, the high-water mark of memory usage of
dead tuple TIDs. Since the amount of memory used by TidStore is hard
to predict, I think showing the high-water mark would help users to
predict how much memory they set to maintenance_work_mem.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Vacuum statistics

From

Andrei Zubkov

Date:

27 June 2024, 08:52:01

Hello!

On Thu, 27/06/2024 at 10:39 +0900, Masahiko Sawada:
> On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zubkov@moonset.ru>
> wrote:
> > As the vacuum process is a backend it has a workload
> > instrumentation.
> > We have all the basic counters available such as a number of blocks
> > read, hit and written, time spent on I/O, WAL stats and so on..
> > Also,
> > we can easily get some statistics specific to vacuum activity i.e.
> > number of tuples removed, number of blocks removed, number of VM
> > marks
> > set and, of course the most important metric - time spent on vacuum
> > operation.
>
> I've not reviewed the patch closely but it sounds helpful for users.
> I
> would like to add a statistic, the high-water mark of memory usage of
> dead tuple TIDs. Since the amount of memory used by TidStore is hard
> to predict, I think showing the high-water mark would help users to
> predict how much memory they set to maintenance_work_mem.
>
Thank you for your interest on this patch. I've understand your idea.
The obvious goal of it is to avoid expensive index multi processing
during vacuum of the heap. Provided statistics in the patch contain the
index_vacuum_count counter for each table which can be compared to the
pg_stat_all_tables.vacuum_count to detect specific relation index
multi-passes. Previous setting of maintenance_work_mem is known. Usage
of TidStore should be proportional to the amount of dead-tuples vacuum
workload on the table, so as the first evaluation we can take the
number of index passes per one heap pass as a maintenance_work_mem
multiplier.

But there is a better way. Once we detected the index multiprocessing
we can lower the vacuum workload for the heap pass making vacuum a
little bit more aggressive for this particular relation. I mean, in
such case increasing maintenance_work_mem is not only decision.

Suggested high-water mark statistic can't be used as cumulative
statistic - any high-water mark statistic as maximim-like statistic is
valid for certain time period thus should be reset on some kind of
schedule. Without resets it should reach 100% once under the heavy load
and stay there forever.

Said that such high-water mark seems a little bit unclear and
complicated for the DBA. It seems redundant to me right now. I can see
the main value of such statistic is to avoid too large
maintenance_work_mem setting. But I can't see really dramatic
consequences of that. Maybe I've miss something..

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

10 August 2024, 18:14:27

Hello, everyone!


Thank you for your interesting patch with extended information 
statistics about autovacuum.

Do you consider not to create new table in pg_catalog but to save 
statistics in existing table? I mean pg_class or 
pg_stat_progress_analyze, pg_stat_progress_vacuum?

P.S. If I sent this mail twice, I'm sorry :)


Regards

Ilia Evdokimov,

Tantor Labs.

Re: Vacuum statistics

From

Andrei Zubkov

Date:

10 August 2024, 19:37:25

Hi, Ilia!

> Do you consider not to create new table in pg_catalog but to save
> statistics in existing table? I mean pg_class or
> pg_stat_progress_analyze, pg_stat_progress_vacuum?
>
Thank you for your interest on our patch!

*_progress views is not our case. They hold online statistics while
vacuum is in progress. Once work is done on a table the entry is gone
from those views. Idea of this patch is the opposite - it doesn't
provide online statistics but it accumulates statistics about rosources
consumed by all vacuum passes over all relations. It's much closer to
the pg_stat_all_tables than pg_stat_progress_vacuum.

It seems pg_class is not the right place because it is not a statistic
view - it holds the current relation state and haven't anything about
the relation workload.

Maybe the pg_stat_all_tables is the right place but I have several
thoughts about why it is not:
- Some statistics provided by this patch is really vacuum specific. I
don't think we want them in the relation statistics view.
- Postgres is extreamly extensible. I'm sure someday there will be
table AMs that does not need the vacuum at all.

Right now vacuum specific workload views seems optimal choice to me.

Regards,

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Kirill Reshke

Date:

10 August 2024, 20:57:35

On Wed, 12 Jun 2024 at 11:38, Alena Rybakina <lena.ribackina@yandex.ru> wrote:
>
> Hi!
>
> On 11.06.2024 16:09, Alena Rybakina wrote:
>
> On 08.06.2024 09:30, Alena Rybakina wrote:
>
> I am currently working on dividing this patch into three parts to simplify the review process: one of them will
containcode for collecting vacuum statistics on tables, the second on indexes and the last on databases.

>
> I have divided the patch into three: the first patch contains code for the functionality of collecting and storage
fortables, the second one for indexes and the last one for databases.

>
> --
> Regards,
> Alena Rybakina
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
Hi!
Few suggestions on this patch-set

1)
> +{ oid => '4701',
> +  descr => 'pg_stats_vacuum_tables return stats values',
> +  proname => 'pg_stats_vacuum_tables', provolatile => 's', prorettype => 'record',proisstrict => 'f',
> +  proretset => 't',
> +  proargtypes => 'oid oid',
> +  proallargtypes =>

During development, OIDs should be picked up from range 8000-9999.
Same for pg_stats_vacuum_database & pg_stats_vacuum_indexes

Also, why are these function naming schemes like
pg_stats_vacuum_*something*, not pg_stat_vacuum_*something*, like
pg_stat_replication etc?

2) In 0003:
> +  proargnames =>
'{dboid,dboid,db_blks_read,db_blks_hit,total_blks_dirtied,total_blks_written,wal_records,wal_fpi,wal_bytes,blk_read_time,blk_write_time,delay_time,system_time,user_time,total_time,interrupts}',

Repeated dboid arg name is strange. Is it done this way to make
pg_stats_vacuum function call in more unified fashion? I don't see any
other place within postgresql core with similar approach, so I doubt
it is correct.

3) 0001 patch vacuum_tables_statistics test creates
statistic_vacuum_database1, but does not use it. 0003 do.
Also I'm not sure if these additional checks on the second database
adds much value. Can you justify this please?

Other places look more or less fine to me.
However, I'll maybe post some additional nit-picky comments on the
next patch version.

-- 
Best regards,
Kirill Reshke

Re: Vacuum statistics

From

Alena Rybakina

Date:

11 August 2024, 13:58:54

Hi! Thank you for inquiring about our topic!

On 10.08.2024 23:57, Kirill Reshke wrote:

Hi!

Few suggestions on this patch-set

1)

+{ oid => '4701',
+  descr => 'pg_stats_vacuum_tables return stats values',
+  proname => 'pg_stats_vacuum_tables', provolatile => 's', prorettype => 'record',proisstrict => 'f',
+  proretset => 't',
+  proargtypes => 'oid oid',
+  proallargtypes =>

During development, OIDs should be picked up from range 8000-9999.
Same for pg_stats_vacuum_database & pg_stats_vacuum_indexes

Also, why are these function naming schemes like
pg_stats_vacuum_*something*, not pg_stat_vacuum_*something*, like
pg_stat_replication etc?

To be honest, when I named it, I missed this aspect. I thought about the plural vacuum statistics we show, so I named them. I fixed it.

2) In 0003:

+  proargnames => '{dboid,dboid,db_blks_read,db_blks_hit,total_blks_dirtied,total_blks_written,wal_records,wal_fpi,wal_bytes,blk_read_time,blk_write_time,delay_time,system_time,user_time,total_time,interrupts}',

Repeated dboid arg name is strange. Is it done this way to make
pg_stats_vacuum function call in more unified fashion? I don't see any
other place within postgresql core with similar approach, so I doubt
it is correct.

Both parameters are required for input and output. We are trying to find statistics for a specific database if the database oid was specified by the user or display statistics for all objects, but we need to display which database these statistics are for. I corrected the name of the first parameter.

3) 0001 patch vacuum_tables_statistics test creates
statistic_vacuum_database1, but does not use it. 0003 do.
Also I'm not sure if these additional checks on the second database
adds much value. Can you justify this please?

The statistic_vacuum_database1 needs us to check the visible of statistics from another database (statistic_vacuum_database) as they are after the manipulation with tables in another database, and after deleting the vestat table . In the latter case, we need to be sure that all the table statistics are not visible to us.

So, I agree that it should be added only in the latest version of the patch, where we add vacuum statistics for databases. I fixed it.

Other places look more or less fine to me.
However, I'll maybe post some additional nit-picky comments on the
next patch version.

We are glad any feedback and review, so feel free to do it)

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

13 August 2024, 13:18:48

On 10.8.24 22:37, Andrei Zubkov wrote:

Hi, Ilia!

Do you consider not to create new table in pg_catalog but to save 
statistics in existing table? I mean pg_class or 
pg_stat_progress_analyze, pg_stat_progress_vacuum?

Thank you for your interest on our patch!

*_progress views is not our case. They hold online statistics while
vacuum is in progress. Once work is done on a table the entry is gone
from those views. Idea of this patch is the opposite - it doesn't
provide online statistics but it accumulates statistics about rosources
consumed by all vacuum passes over all relations. It's much closer to
the pg_stat_all_tables than pg_stat_progress_vacuum.

It seems pg_class is not the right place because it is not a statistic
view - it holds the current relation state and haven't anything about
the relation workload.

Maybe the pg_stat_all_tables is the right place but I have several
thoughts about why it is not:
- Some statistics provided by this patch is really vacuum specific. I
don't think we want them in the relation statistics view.
- Postgres is extreamly extensible. I'm sure someday there will be
table AMs that does not need the vacuum at all.

Right now vacuum specific workload views seems optimal choice to me.

Regards,

Agreed. They are not god places to store such statistics.

I have some suggestions:

pgstatfuncs.c in functions tuplestore_put_for_database() and tuplestore_put_for_relation you can remove 'nulls' array if you're sure that columns cannot be NULL.
These functions are almost the same and I would think of writing one function depending of type 'ExtVacReportType'

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

13 August 2024, 13:37:41

And I have one suggestion for pg_stat_vacuum_database: I suppose we 
should add database's name column after 'dboid' column because it is 
difficult to read statistics without database's name. We could call it 
'datname' just like in 'pg_stat_database' view.

Regards,

Ilia Evdokimov,
Tantor Labs LCC.

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

15 August 2024, 12:50:58

On 15.8.24 11:49, Alena Rybakina wrote:

I have some suggestions:
pgstatfuncs.c in functions tuplestore_put_for_database() and tuplestore_put_for_relation you can remove 'nulls' array if you're sure that columns cannot be NULL.
We need to use this for tuplestore_putvalues function. With this function, we fill the table with the values of the statistics.

Ah, right! I'm sorry.

These functions are almost the same and I would think of writing one function depending of type 'ExtVacReportType'
I'm not sure that I fully understand what you mean. Can you explain it more clearly, please?

Ah, I didn't notice that the size of all three tables is different. Therefore, it won't be possible to write one function instead of two to avoid code duplication. My mistake.

Re: Vacuum statistics

From

jian he

Date:

19 August 2024, 12:32:48

in pg_stats_vacuum
    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
    {
        Oid                    relid = PG_GETARG_OID(1);

        /* Load table statistics for specified database. */
        if (OidIsValid(relid))
        {
            tabentry = fetch_dbstat_tabentry(dbid, relid);
            if (tabentry == NULL || tabentry->vacuum_ext.type != type)
                /* Table don't exists or isn't an heap relation. */
                PG_RETURN_NULL();

            tuplestore_put_for_relation(relid, rsinfo, tabentry);
        }
        else
        {
       }


So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
it seems you didn't check "relid" 's relkind,
you may need to use get_rel_relkind.

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

19 August 2024, 19:28:00

Are you certain that all tables are included in `pg_stat_vacuum_tables`? 
I'm asking because of the following:


SELECT count(*) FROM pg_stat_all_tables ;
  count
-------
    108
(1 row)

SELECT count(*) FROM pg_stat_vacuum_tables ;
  count
-------
     20
(1 row)

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

Re: Vacuum statistics

From

Alena Rybakina

Date:

21 August 2024, 01:35:00

Hi! Thank you very much for your review! Sorry for my late response I was overwhelmed by tasks.

On 16.08.2024 14:12, jian he wrote:

On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

Hi!


I've applied all the v5 patches.
0002 and 0003 have white space errors.

+      <para>
+        Number of times blocks of this index were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

+        Number of times blocks of this table were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

"&project;"
represents a sgml file placeholder name as "project" and puts all the
content of "project.sgml" to system-views.sgml.
but you don't have "project.sgml". you may check
doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml
for usage of "&place_holder;".
so you can change it to "project", otherwise doc cannot build.


src/backend/commands/dbcommands.c
we have:    /*     * If built with appropriate switch, whine when regression-testing     * conventions for database names are violated.  But don't complain during     * initdb.     */
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS    if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)        elog(WARNING, "databases created by regression test cases
should have names including \"regression\"");
#endif
so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you
need to change dbname:
CREATE DATABASE statistic_vacuum_database;
CREATE DATABASE statistic_vacuum_database1;


+  <para>
+   The view <structname>pg_stat_vacuum_indexes</structname> will contain
+   one row for each index in the current database (including TOAST
+   table indexes), showing statistics about vacuuming that specific index.
+  </para>
TOAST should
<acronym>TOAST</acronym>



+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
maybe change to            ereport(ERROR,                    (errcode(ERRCODE_DATATYPE_MISMATCH),                     errmsg("return type must be a row type")));
Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all
the work. much of the code can be gotten rid of.
please check attached.

I agree with your suggestions for improving the code. I will add this in the next version of the patch.

#define EXTVACHEAPSTAT_COLUMNS    27
#define EXTVACIDXSTAT_COLUMNS    19
#define EXTVACDBSTAT_COLUMNS    15
#define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)

static Oid CurrentDatabaseId = InvalidOid;

we already defined MyDatabaseId in src/include/miscadmin.h,
Why do we need "static Oid CurrentDatabaseId = InvalidOid;"?
also src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h".

Hmm, Tom Lane added "misc admin.h", or I didn't notice something. Could you point this out, please?

We used the Current Database Id to output statistics on tables from another database, so we need to replace it with a different default value. But I want to rewrite this patch to display table statistics only for the current database, that is, this part will be removed in the future. In my opinion, it would be more correct.

the following code one function has 2 return statements?

------------------------------------------------------------------------
/* * Get the vacuum statistics for the heap tables. */
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/* * Get the vacuum statistics for the indexes. */
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/* * Get the vacuum statistics for the database. */
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{        return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);
    PG_RETURN_NULL();
}

You are right - the second return is superfluous. I'll fix it.

------------------------------------------------------------------------
in pg_stats_vacuum:    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)    {        Oid                    relid = PG_GETARG_OID(1);
        /* Load table statistics for specified database. */        if (OidIsValid(relid))        {            tabentry = fetch_dbstat_tabentry(dbid, relid);            if (tabentry == NULL || tabentry->vacuum_ext.type != type)                /* Table don't exists or isn't an heap relation. */                PG_RETURN_NULL();
            tuplestore_put_for_relation(relid, tupstore, tupdesc,
tabentry, ncolumns);        }        else        {         ...        }
}
I don't understand the ELSE branch. the IF branch means the input
dboid, heap/index oid is correct.
the ELSE branch means table reloid is invalid = 0.
I am not sure 100% what the ELSE Branch means.
Also there are no comments explaining why.
experiments seem to show that  when reloid is 0, it will print out all
the vacuum statistics
for all the tables in the current database. If so, then only super
users can call pg_stats_vacuum?
but the table owner should be able to call pg_stats_vacuum for that
specific table.

If any reloid has not been set by the user, we output statistics for all objects - tables or indexes.In this part of the code, we find all the suitable objects from the snapshot, if they belong to the index or table type of objects.

/* Type of ExtVacReport */
typedef enum ExtVacReportType
{    PGSTAT_EXTVAC_INVALID = 0,    PGSTAT_EXTVAC_HEAP = 1,    PGSTAT_EXTVAC_INDEX = 2,    PGSTAT_EXTVAC_DB = 3,
} ExtVacReportType;
generally "HEAP" means table and index, maybe "PGSTAT_EXTVAC_HEAP" would be term

No, Heap means something like a table in a relationship database, or its alternative name is Heap.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Alena Rybakina

Date:

21 August 2024, 01:37:16

We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else?

On 19.08.2024 12:32, jian he wrote:

in pg_stats_vacuum    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)    {        Oid                    relid = PG_GETARG_OID(1);
        /* Load table statistics for specified database. */        if (OidIsValid(relid))        {            tabentry = fetch_dbstat_tabentry(dbid, relid);            if (tabentry == NULL || tabentry->vacuum_ext.type != type)                /* Table don't exists or isn't an heap relation. */                PG_RETURN_NULL();
            tuplestore_put_for_relation(relid, rsinfo, tabentry);        }        else        {       }


So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
it seems you didn't check "relid" 's relkind,
you may need to use get_rel_relkind.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Alena Rybakina

Date:

21 August 2024, 01:39:15

I think you've counted the above system tables from the database, but 
I'll double-check it. Thank you for your review!

On 19.08.2024 19:28, Ilia Evdokimov wrote:
> Are you certain that all tables are included in 
> `pg_stat_vacuum_tables`? I'm asking because of the following:
>
>
> SELECT count(*) FROM pg_stat_all_tables ;
>  count
> -------
>    108
> (1 row)
>
> SELECT count(*) FROM pg_stat_vacuum_tables ;
>  count
> -------
>     20
> (1 row)
>
-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Kirill Reshke

Date:

22 August 2024, 07:29:58

On Thu, 22 Aug 2024 at 07:48, jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 6:37 AM Alena Rybakina
> <a.rybakina@postgrespro.ru> wrote:
> >
> > We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else?
> >
> > On 19.08.2024 12:32, jian he wrote:
> >
> > in pg_stats_vacuum
> >     if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
> >     {
> >         Oid                    relid = PG_GETARG_OID(1);
> >
> >         /* Load table statistics for specified database. */
> >         if (OidIsValid(relid))
> >         {
> >             tabentry = fetch_dbstat_tabentry(dbid, relid);
> >             if (tabentry == NULL || tabentry->vacuum_ext.type != type)
> >                 /* Table don't exists or isn't an heap relation. */
> >                 PG_RETURN_NULL();
> >
> >             tuplestore_put_for_relation(relid, rsinfo, tabentry);
> >         }
> >         else
> >         {
> >        }
> >
> >
> > So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
> > it seems you didn't check "relid" 's relkind,
> > you may need to use get_rel_relkind.
> >
> > --
>
> hi.
> I mentioned some points at [1],
> Please check the attached patchset to address these issues.
>
> there are four occurrences of "CurrentDatabaseId", i am still confused
> with usage of CurrentDatabaseId.
>
> also please don't  top-post, otherwise the archive, like [2] is not
> easier to read for future readers.
> generally you quote first, then reply.
>
> [1] https://postgr.es/m/CACJufxHb_YGCp=pVH6DZcpk9yML+SueffPeaRbX2LzXZVahd_w@mail.gmail.com
> [2] https://postgr.es/m/78394e29-a900-4af4-b5ce-d6eb2d263fad@postgrespro.ru

Hi, your points are valid.
Regarding 0003, I also wanted to object database naming in a
regression test during my review but for some reason didn't.Now, as
soon as we already need to change it, I suggest we also change
regression_statistic_vacuum_db1 to something less generic. Maybe
regression_statistic_vacuum_db_unaffected.



--
Best regards,
Kirill Reshke

Re: Vacuum statistics

From

Alexander Korotkov

Date:

23 August 2024, 04:07:37

On Wed, Aug 21, 2024 at 1:39 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> I think you've counted the above system tables from the database, but
> I'll double-check it. Thank you for your review!
>
> On 19.08.2024 19:28, Ilia Evdokimov wrote:
> > Are you certain that all tables are included in
> > `pg_stat_vacuum_tables`? I'm asking because of the following:
> >
> >
> > SELECT count(*) FROM pg_stat_all_tables ;
> > count
> > -------
> > 108
> > (1 row)
> >
> > SELECT count(*) FROM pg_stat_vacuum_tables ;
> > count
> > -------
> > 20
> > (1 row)
> >

I'd like to do some review a well.

+ MyDatabaseId = dbid;
+
+ PG_TRY();
+ {
+ tabentry = pgstat_fetch_stat_tabentry(relid);
+ MyDatabaseId = storedMyDatabaseId;
+ }
+ PG_CATCH();
+ {
+ MyDatabaseId = storedMyDatabaseId;
+ }
+ PG_END_TRY();

I think this is generally wrong to change MyDatabaseId, especially if you have to wrap it with PG_TRY()/PG_CATCH(). I think, instead we need proper API changes, i.e. make pgstat_fetch_stat_tabentry() and others take dboid as an argument.

+/*
+ * Get the vacuum statistics for the heap tables.
+ */
+Datum
+pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
+{
+ return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);
+
+ PG_RETURN_NULL();
+}

The PG_RETURN_NULL() is unneeded after another return statement. However, does pg_stats_vacuum() need to return anything? What about making its return type void?

@@ -874,4 +874,38 @@ pgstat_get_custom_snapshot_data(PgStat_Kind kind)
return pgStatLocal.snapshot.custom_data[idx];
}

+/* hash table for statistics snapshots entry */
+typedef struct PgStat_SnapshotEntry
+{
+ PgStat_HashKey key;
+ char status; /* for simplehash use */
+ void *data; /* the stats data itself */
+} PgStat_SnapshotEntry;

It would be nice to preserve encapsulation and don't expose pgstat_snapshot hash in the headers. I see there is only one usage of it outside of pgstat.c: pg_stats_vacuum().

+ Oid storedMyDatabaseId = MyDatabaseId;
+
+ pgstat_update_snapshot(PGSTAT_KIND_RELATION);
+ MyDatabaseId = storedMyDatabaseId;

This manipulation with storedMyDatabaseId looks pretty useless. It seems to be intended to change MyDatabaseId, while I'm not fan of this as I mentioned above.

+static PgStat_StatTabEntry *
+fetch_dbstat_tabentry(Oid dbid, Oid relid)
+{
+ Oid storedMyDatabaseId = MyDatabaseId;
+ PgStat_StatTabEntry *tabentry = NULL;
+
+ if (OidIsValid(CurrentDatabaseId) && CurrentDatabaseId == dbid)
+ /* Quick path when we read data from the same database */
+ return pgstat_fetch_stat_tabentry(relid);
+
+ pgstat_clear_snapshot();

It looks scary to reset the whole snapshot each time we access another database. Need to also mention that the CurrentDatabaseId machinery isn't implemented.

New functions pg_stat_vacuum_tables(), pg_stat_vacuum_indexes(), pg_stat_vacuum_database() are SRFs. When zero Oid is passed they report all the objects. However, it seems they aren't intended to be used directly. Instead, there are views with the same names. These views always call them with particular Oids, therefore SRFs always return one row. Then why bother with SRF? They could return plain records instead.

Also, as I mentioned above patchset makes a lot of trouble accessing statistics of relations of another database. But that seems to be useless given corresponding views allow to see only relations of the current database. Even if you call functions directly, what is the value of this information given that you don't know the relation oids in another database? So, I think if we will give up and limit access to the relations of the current database patch will become simpler and clearer.

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Alena Rybakina

Date:

25 August 2024, 19:06:51

On 22.08.2024 05:47, jian he wrote:

On Wed, Aug 21, 2024 at 6:37 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else?

On 19.08.2024 12:32, jian he wrote:

in pg_stats_vacuum    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)    {        Oid                    relid = PG_GETARG_OID(1);
        /* Load table statistics for specified database. */        if (OidIsValid(relid))        {            tabentry = fetch_dbstat_tabentry(dbid, relid);            if (tabentry == NULL || tabentry->vacuum_ext.type != type)                /* Table don't exists or isn't an heap relation. */                PG_RETURN_NULL();
            tuplestore_put_for_relation(relid, rsinfo, tabentry);        }        else        {       }


So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
it seems you didn't check "relid" 's relkind,
you may need to use get_rel_relkind.

--

hi.
I mentioned some points at [1],
Please check the attached patchset to address these issues.

Thank you for your work! I checked the patches and added your suggested changes to the new version of the patch here [0]. In my opinion, nothing was missing, but please take a look.

[0] https://www.postgresql.org/message-id/c4e4e305-7119-4183-b49a-d7092f4efba3%40postgrespro.ru


there are four occurrences of "CurrentDatabaseId", i am still confused
with usage of CurrentDatabaseId.

It needed to be used because of scanning objects from the other database, so we change the id of dbid temporary to achieve it.

You should snow that every part of this code was deleted. Now we can check information about tables and indexes from the current database.

also please don't  top-post, otherwise the archive, like [2] is not
easier to read for future readers.
generally you quote first, then reply.

[1] https://postgr.es/m/CACJufxHb_YGCp=pVH6DZcpk9yML+SueffPeaRbX2LzXZVahd_w@mail.gmail.com
[2] https://postgr.es/m/78394e29-a900-4af4-b5ce-d6eb2d263fad@postgrespro.ru

Ok, no problem.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

Alena Rybakina

Date:

25 August 2024, 19:12:40

On 22.08.2024 07:29, Kirill Reshke wrote:

On Thu, 22 Aug 2024 at 07:48, jian he <jian.universality@gmail.com> wrote:

On Wed, Aug 21, 2024 at 6:37 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else?

On 19.08.2024 12:32, jian he wrote:

in pg_stats_vacuum    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)    {        Oid                    relid = PG_GETARG_OID(1);
        /* Load table statistics for specified database. */        if (OidIsValid(relid))        {            tabentry = fetch_dbstat_tabentry(dbid, relid);            if (tabentry == NULL || tabentry->vacuum_ext.type != type)                /* Table don't exists or isn't an heap relation. */                PG_RETURN_NULL();
            tuplestore_put_for_relation(relid, rsinfo, tabentry);        }        else        {       }


So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables,
it seems you didn't check "relid" 's relkind,
you may need to use get_rel_relkind.

--

hi.
I mentioned some points at [1],
Please check the attached patchset to address these issues.

there are four occurrences of "CurrentDatabaseId", i am still confused
with usage of CurrentDatabaseId.

also please don't  top-post, otherwise the archive, like [2] is not
easier to read for future readers.
generally you quote first, then reply.

[1] https://postgr.es/m/CACJufxHb_YGCp=pVH6DZcpk9yML+SueffPeaRbX2LzXZVahd_w@mail.gmail.com
[2] https://postgr.es/m/78394e29-a900-4af4-b5ce-d6eb2d263fad@postgrespro.ru

Hi, your points are valid.
Regarding 0003, I also wanted to object database naming in a
regression test during my review but for some reason didn't.Now, as
soon as we already need to change it, I suggest we also change
regression_statistic_vacuum_db1 to something less generic. Maybe
regression_statistic_vacuum_db_unaffected.

Hi! I fixed it in the new version of the patch [0]. Feel free to review it!

To be honest, I still doubt that the current database names (regression_statistic_vacuum_db and regression_statistic_vacuum_db1) can be easily generated, but if you insist on renaming, I will do it.

[0] https://www.postgresql.org/message-id/c4e4e305-7119-4183-b49a-d7092f4efba3%40postgrespro.ru

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Vacuum statistics

From

jian he

Date:

05 September 2024, 15:47:00

On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi, all!
>
> I have attached the new version of the code and the diff files
> (minor-vacuum.no-cbot).
>

hi.

still have white space issue when using "git apply",
you may need to use "git diff --check" to find out where.


 /* ----------
diff --git a/src/test/regress/expected/opr_sanity.out
b/src/test/regress/expected/opr_sanity.out
index 5d72b970b03..7026de157e4 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -32,11 +32,12 @@ WHERE p1.prolang = 0 OR p1.prorettype = 0 OR
        prokind NOT IN ('f', 'a', 'w', 'p') OR
        provolatile NOT IN ('i', 's', 'v') OR
        proparallel NOT IN ('s', 'r', 'u');
- oid  |        proname
-------+------------------------
+ oid  |         proname
+------+-------------------------
  8001 | pg_stat_vacuum_tables
  8002 | pg_stat_vacuum_indexes
-(2 rows)
+ 8003 | pg_stat_vacuum_database
+(3 rows)


looking at src/test/regress/sql/opr_sanity.sql:

-- **************** pg_proc ****************
-- Look for illegal values in pg_proc fields.

SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE p1.prolang = 0 OR p1.prorettype = 0 OR
       p1.pronargs < 0 OR
       p1.pronargdefaults < 0 OR
       p1.pronargdefaults > p1.pronargs OR
       array_lower(p1.proargtypes, 1) != 0 OR
       array_upper(p1.proargtypes, 1) != p1.pronargs-1 OR
       0::oid = ANY (p1.proargtypes) OR
       procost <= 0 OR
       CASE WHEN proretset THEN prorows <= 0 ELSE prorows != 0 END OR
       prokind NOT IN ('f', 'a', 'w', 'p') OR
       provolatile NOT IN ('i', 's', 'v') OR
       proparallel NOT IN ('s', 'r', 'u');

that means
 oid  |         proname
------+-------------------------
 8001 | pg_stat_vacuum_tables
 8002 | pg_stat_vacuum_indexes
 8003 | pg_stat_vacuum_database


These above functions, pg_proc.prorows should > 0 when
pg_proc.proretset is true.
I think that's the opr_sanity test's intention.
so you may need to change pg_proc.dat.

BTW the doc says:
prorows float4, Estimated number of result rows (zero if not proretset)



segmentation fault cases:
select * from pg_stat_vacuum_indexes(0);
select * from pg_stat_vacuum_tables(0);


+ else if (type == PGSTAT_EXTVAC_DB)
+ {
+ PgStatShared_Database   *dbentry;
+ PgStat_EntryRef   *entry_ref;
+ Oid dbid = PG_GETARG_OID(0);
+
+ if (OidIsValid(dbid))
+ {
+ entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
+ dbid, InvalidOid, false);
+ dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
+
+ if (dbentry == NULL)
+ /* Table doesn't exist or isn't a heap relation */
+ return;
+
+ tuplestore_put_for_database(dbid, rsinfo, dbentry);
+ pgstat_unlock_entry(entry_ref);
+ }
+ }
didn't error out when dbid is invalid?



pg_stat_vacuum_tables
pg_stat_vacuum_indexes
pg_stat_vacuum_database
these functions didn't verify the only input argument oid's kind.
for example:

create table s(a int primary key) with (autovacuum_enabled = off);
create view sv as select * from s;
vacuum s;
select * from pg_stat_vacuum_tables('sv'::regclass::oid);
select * from pg_stat_vacuum_indexes('sv'::regclass::oid);
select * from pg_stat_vacuum_database('sv'::regclass::oid);

above all these 3 examples should error out? because  sv is view.

in src/backend/catalog/system_views.sql
for view creation of pg_stat_vacuum_indexes
you can change to

WHERE
  db.datname = current_database() AND
  rel.oid = stats.relid AND
  ns.oid = rel.relnamespace
AND rel.relkind = 'i':



pg_stat_vacuum_tables  in in src/backend/catalog/system_views.sql
you can change to

WHERE
  db.datname = current_database() AND
  rel.oid = stats.relid AND
  ns.oid = rel.relnamespace
AND rel.relkind = 'r':

Re: Vacuum statistics

From

Masahiko Sawada

Date:

27 September 2024, 21:15:39

Hi,

On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi! Thank you for your review!
>
> On 05.09.2024 15:47, jian he wrote:
>
> On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi, all!
>
> I have attached the new version of the code and the diff files
> (minor-vacuum.no-cbot).

Thank you for updating the patches. I've reviewed the 0001 patch and
have two comments.

I think we can split the 0001 patch into two parts: adding
pg_stat_vacuum_tables system views that shows the vacuum statistics
that we are currently collecting such as scanned_pages and
removed_pages, and another one is to add new statistics to collect
such as vacrel->set_all_visible_pages and visibility map updates.

I'm concerned that a pg_stat_vacuum_tables view has some duplicated
statistics that we already collect in different ways. For instance,
total_blks_{read,hit,dirtied,written} are already tracked at
system-level by pg_stat_io, and per-relation block I/O statistics can
be collected using pg_stat_statements. Having duplicated statistics
consumes more memory for pgstat and could confuse users if these
statistics are not consistent. I think it would be better to avoid
collecting duplicated statistics in different places.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Vacuum statistics

From

Melanie Plageman

Date:

27 September 2024, 22:19:31

On Fri, Sep 27, 2024 at 2:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> >
> > Hi! Thank you for your review!
> >
> > On 05.09.2024 15:47, jian he wrote:
> >
> > On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> >
> > Hi, all!
> >
> > I have attached the new version of the code and the diff files
> > (minor-vacuum.no-cbot).
>
> Thank you for updating the patches. I've reviewed the 0001 patch and
> have two comments.

I took a very brief look at this and was wondering if it was worth
having a way to make the per-table vacuum statistics opt-in (like a
table storage parameter) in order to decrease the shared memory
footprint of storing the stats.

- Melanie

Re: Vacuum statistics

From

Andrei Zubkov

Date:

27 September 2024, 22:25:14

Hi,

On Fri, 2024-09-27 at 11:15 -0700, Masahiko Sawada wrote:
> I'm concerned that a pg_stat_vacuum_tables view has some duplicated
> statistics that we already collect in different ways. For instance,
> total_blks_{read,hit,dirtied,written} are already tracked at
> system-level by pg_stat_io,

pg_stat_vacuum_tables.total_blks_{read,hit,dirtied,written} tracks
blocks used by vacuum in different ways while vacuuming this particular
table while pg_stat_io tracks blocks used by vacuum on the cluster
level.

> and per-relation block I/O statistics can
> be collected using pg_stat_statements.

This is impossible. pg_stat_statements tracks block statistics on a
statement level. One statement could touch many tables and many
indexes, and all used database blocks will be counted by the
pg_stat_statements counters on a statement-level. Autovacuum statistics
won't be accounted by the pg_stat_statements. After all,
pg_stat_statements won't hold the statements statistics forever. Under
pressure of new statements the statement eviction can happen and
statistics will be lost.

All of the above is addressed by relation-level vacuum statistics held
in the Cumulative Statistics System proposed by this patch.
--
regards, Andrei Zubkov
Postgres Professional

Re: Vacuum statistics

From

Masahiko Sawada

Date:

27 September 2024, 23:13:15

On Fri, Sep 27, 2024 at 12:19 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 2:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Sep 5, 2024 at 2:01 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> > >
> > > Hi! Thank you for your review!
> > >
> > > On 05.09.2024 15:47, jian he wrote:
> > >
> > > On Thu, Sep 5, 2024 at 1:23 AM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> > >
> > > Hi, all!
> > >
> > > I have attached the new version of the code and the diff files
> > > (minor-vacuum.no-cbot).
> >
> > Thank you for updating the patches. I've reviewed the 0001 patch and
> > have two comments.
>
> I took a very brief look at this and was wondering if it was worth
> having a way to make the per-table vacuum statistics opt-in (like a
> table storage parameter) in order to decrease the shared memory
> footprint of storing the stats.

I'm not sure how users can select tables that enable vacuum statistics
as I think they basically want to have statistics for all tables, but
I see your point. Since the size of PgStat_TableCounts approximately
tripled by this patch (112 bytes to 320 bytes), it might be worth
considering ways to reduce the number of entries or reducing the size
of vacuum statistics.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Vacuum statistics

From

Alena Rybakina

Date:

29 September 2024, 00:22:28

Hi! Thank you for your interesting for this patch!

I took a very brief look at this and was wondering if it was worth
having a way to make the per-table vacuum statistics opt-in (like a
table storage parameter) in order to decrease the shared memory
footprint of storing the stats.

I'm not sure how users can select tables that enable vacuum statistics
as I think they basically want to have statistics for all tables, but
I see your point. Since the size of PgStat_TableCounts approximately
tripled by this patch (112 bytes to 320 bytes), it might be worth
considering ways to reduce the number of entries or reducing the size
of vacuum statistics.

The main purpose of these statistics is to see abnormal behavior of vacuum in relation to a table or the database as a whole.

For example, there may be a situation where vacuum has started to run more often and spends a lot of resources on processing a certain index, but the size of the index does not change significantly. Moreover, the table in which this index is located can be much smaller in size. This may be because the index is bloated and needs to be reindexed.

This is exactly what vacuum statistics can show - we will see that compared to other objects, vacuum processed more blocks and spent more time on this index.

Perhaps the vacuum parameters for the index should be set more aggressively to avoid this in the future.

I suppose that if we turn off statistics collection for a certain object, we can miss it. In addition, the user may not enable the parameter for the object in time, because he will forget about it.

As for the second option, now I cannot say which statistics can be removed, to be honest. So far, they all seem necessary.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

16 October 2024, 13:31:43

On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.

-- 
Regards,
Alena Rybakina
Postgres Professional

Thank you for rebasing.

I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM.

Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,1000000) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
....
(0 rows)
CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries after running VACUUM or executing autovacuum. or when autovacuum is executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum. Interestingly, this issue does not occur with pg_stat_vacuum_database:

CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
dboid | dbname | ...
... | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes

--
Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Vacuum statistics

From

Alena Rybakina

Date:

16 October 2024, 14:01:55

Hi!

On 16.10.2024 13:31, Ilia Evdokimov wrote:

On 08.10.2024 19:18, Alena Rybakina wrote:
Made a rebase on a fresh master branch.
-- 
Regards,
Alena Rybakina
Postgres Professional
Thank you for rebasing.
I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM.
Example:
CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,1000000) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
....
(0 rows)
CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)
I can see the entries after running VACUUM or executing autovacuum. or when autovacuum is executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum. Interestingly, this issue does not occur with pg_stat_vacuum_database:
CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
dboid | dbname | ...
... | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes

Thanks for the review. I'm investigating this. I agree with the renaming, I will do it in the next version of the patch.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Andrei Zubkov

Date:

16 October 2024, 14:17:38

Hi Ilia,

On Wed, 2024-10-16 at 13:31 +0300, Ilia Evdokimov wrote:
> BTW, I recommend renaming the view pg_stat_vacuum_database to
> pg_stat_vacuum_databaseS  for consistency with pg_stat_vacuum_tables
> and pg_stat_vacuum_indexes

Such renaming doesn't seems correct to me because
pg_stat_vacuum_database is consistent with pg_stat_database view, while
pg_stat_vacuum_tables is consistent with pg_stat_all_tables.

This inconsistency is in Postgres views, so it should be changed
synchronously.
--
regards, Andrei Zubkov

Re: Vacuum statistics

From

Alexander Korotkov

Date:

28 October 2024, 16:40:04

On Sun, Aug 25, 2024 at 6:59 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> I didn't understand correctly - did you mean that we don't need SRF if
> we need to display statistics for a specific object?
>
> Otherwise, we need this when we display information on all database
> objects (tables or indexes):
>
> while ((entry = ScanStatSnapshot(pgStatLocal.snapshot.stats, &hashiter))
> != NULL)
> {
>      CHECK_FOR_INTERRUPTS();
>
>      tabentry = (PgStat_StatTabEntry *) entry->data;
>
>      if (tabentry != NULL && tabentry->vacuum_ext.type == type)
>          tuplestore_put_for_relation(relid, rsinfo, tabentry);
> }
>
> I know we can construct a HeapTuple object containing a TupleDesc,
> values, and nulls for a particular object, but I'm not sure we can
> augment it while looping through multiple objects.
>
> /* Initialise attributes information in the tuple descriptor */
>
>   tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_SUBSCRIPTION_STATS_COLS);
>
> ...
>
> PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
>
>
> If I missed something or misunderstood, can you explain in more detail?

Actually, I mean why do we need a possibility to return statistics for
all tables/indexes in one function call?  User anyway is supposed to
use pg_stat_vacuum_indexes/pg_stat_vacuum_tables view, which do
function calls one per relation.  I suppose we can get rid of
possibility to get all the objects in one function call and just
return a tuple from the functions like other pgstatfuncs.c functions
do.

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Alexander Korotkov

Date:

28 October 2024, 22:07:02

On Sun, Sep 29, 2024 at 12:22 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> Hi! Thank you for your interesting for this patch!
>
> I took a very brief look at this and was wondering if it was worth
> having a way to make the per-table vacuum statistics opt-in (like a
> table storage parameter) in order to decrease the shared memory
> footprint of storing the stats.
>
> I'm not sure how users can select tables that enable vacuum statistics
> as I think they basically want to have statistics for all tables, but
> I see your point. Since the size of PgStat_TableCounts approximately
> tripled by this patch (112 bytes to 320 bytes), it might be worth
> considering ways to reduce the number of entries or reducing the size
> of vacuum statistics.
>
> The main purpose of these statistics is to see abnormal behavior of vacuum in relation to a table or the database as
awhole. 
>
> For example, there may be a situation where vacuum has started to run more often and spends a lot of resources on
processinga certain index, but the size of the index does not change significantly. Moreover, the table in which this
indexis located can be much smaller in size. This may be because the index is bloated and needs to be reindexed. 
>
> This is exactly what vacuum statistics can show - we will see that compared to other objects, vacuum processed more
blocksand spent more time on this index. 
>
> Perhaps the vacuum parameters for the index should be set more aggressively to avoid this in the future.
>
> I suppose that if we turn off statistics collection for a certain object, we can miss it. In addition, the user may
notenable the parameter for the object in time, because he will forget about it. 

I agree with this point.  Additionally, in order to benefit from
gatherting vacuum statistics only for some relations in terms of
space, we need to handle variable-size stat entries.  That would
greatly increase the complexity.

> As for the second option, now I cannot say which statistics can be removed, to be honest. So far, they all seem
necessary.

Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
tripled in space.  That a huge change from having no statistics on
vacuum to have it in much more detail than everything else we
currently have.  I think the feasible way might be to introduce some
most demanded statistics first then see how it goes.

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Jim Nasby

Date:

29 October 2024, 00:03:30

On Oct 28, 2024, at 2:07 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

I suppose that if we turn off statistics collection for a certain object, we can miss it. In addition, the user may not enable the parameter for the object in time, because he will forget about it.

I agree with this point. Additionally, in order to benefit from
gatherting vacuum statistics only for some relations in terms of
space, we need to handle variable-size stat entries. That would
greatly increase the complexity.

Could vacuum stats be treated as a separate category instead of adding it to PgStat_TableCounts?

As for the second option, now I cannot say which statistics can be removed, to be honest. So far, they all seem necessary.

Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
tripled in space. That a huge change from having no statistics on
vacuum to have it in much more detail than everything else we
currently have. I think the feasible way might be to introduce some
most demanded statistics first then see how it goes.

Looking at the stats I do think the WAL stats are probably not helpful. First, there’s nothing users can do to tune how much WAL is generated by vacuum. Second, this introduces the risk of users saying “Wow, vacuum is creating a lot of WAL! I’m going to turn it down!”, which is most likely to make matters worse. There’s already a lot of stuff that goes into WAL without any detailed logging; if we ever wanted to provide a comprehensive view of what data is in WAL that should be handled separately.

The rest of the stats all look important. In fact, I think there’s even more stats that could be included (such as all frozen/visible pages skipped) - even more reason to look at having separate controls for tracking vacuum stats. There’s also an argument to be made for tracking autovac separately from manual vacuum. So long-term we might want to look at other ways to handle these stats, not only because of the large number of stats, but because they would be updated very infrequently compared to other stats counters. Ironically, the old stats system would probably have been more than sufficient for these stats. Tracking them in a real table might also be an option.

Is there a reason some fields are omitted from pg_stat_vacuum_database? While some stats are certainly more interesting at the per-relation level, I can’t really think of any that don’t make sense at the database level as well.

Looking at the per table/index stats, I strongly dislike the use of the term “delete” - it is a recipe for confusion with row deletion.. A much better term is “remove” or “removed”. I realize the term “delete” is used in places in vacuum logging, but IMO we should fix that as well instead of doubling-down on it.

I think “interrupts” is also a very confusing name - those fields should just be called “errors”.

I realize “relname” is being used for consistency with pg_stat_all_(tables|indexes), but I’m not sure it makes sense to double-down on that. Especially in pg_stat_vacuum_indexes, where it’s not completely clear whether relname is referring to the table or the index. I’m also inclined to say that the name of the table should be included in pg_stat_vacuum_indexes.

For all the views the docs should clarify that total_blks_written means blocks written by vacuum, as opposed to the background writer. Similarly they should clarify the difference between rel_blks_(read|hit) and total_blks_(read|hit). In the case of pg_stat_vacuum_indexes it’d be better if rel_blks_(read|hit) were called index_blks_(read|hit). Although… if total_blks_* is actually the count across the table and all the indexes I don’t know that we even need that counter. I realize that not ever vacuum even looks at the indexes, but if we’re going to go into that level of detail then we would (at minimum) need to count the number of times a vacuum completely skipped scanning the indexes.

Having rev_all_(frozen|visible)_pages in the same view as vacuum stats will confuse users into thinking that vacuum is clearing the bits. Those fields really belong in pg_stat_all_tables.

Sadly index_vacuum_count is may not useful at all at present. At minimum you’d need to know the number of times vacuum had run in total. I realize that’s in pg_stat_all_tables, but that doesn’t help if vacuum stats are tracked or reset separately. At minimum the docs should mention them. They also need to clarify if index_vacuum_count is incremented per-index or per-pass (hopefully the later). Assuming it’s per-pass, a better name for the field would be index_vacuum_passes, index_passes, index_pass_count, or similar. But even with that we still need a counter for the number of vacuums where index processing was skipped.

Other items

First, thanks to everyone that’s put work into this patch - it’s a big step forward. I certainly don’t want the perfect to be the enemy of the good, but since the size of these stats entries has already come up as a concern I want to consider use cases that would still not be covered by this patch. I’m not suggesting these need to be added now, but IMHO they’re logical next steps (that would also mean more counters). The cases below would probably mean at least doubling the number of vacuum-related counters, at least at the table level.

First, there’s still gaps in trying to track HOT; most notably a counter for how many updates would never be HOT eligible because they modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really limited without that info.

There should also be stats about unused line pointers - in degenerate cases the lp array can consume a significant portion of heap storage.

Monitoring bloat would be a lot more accurate if vacuum reported total tuple length for each run along with the total number of tuples it looked at. Having that info would make it trivial to calculate average tuple size, which could then be applied to reltuples and relpages to calculate how much space would being lost to bloat.

Autovacuum will self-terminate if it would block another process (unless it’s an aggressive vacuum) - that’s definitely something that should be tracked. Not just the number of times that happens, but also stats about how much work was lost because of this.

Shrinking a relation (what vacuum calls truncation, which is very confusing with the truncate command) is a rather complex process that currently has no visibility.

Tuning vacuum_freeze_min_age (and the MXID variant) is rather complicated. We maybe have enough stats on whether it could be set lower, but there’s no visibility on how the settings affect how often vacuum decides to be aggressive. At minimum, we should have stats on when vacuum is aggressive, especially since it significantly changes the behavior of autovac.

I saw someone else already mentioned tuning vacuum memory usage, but I’ll mention it again. Even if the issues with index_vacuum_count are fixed that still only tells you if you have a problem; it doesn’t give you a great idea of how much more memory you need. The best you can do is assuming you need (number of passes - 1) * current memory.

Speaking of which… there should be stats on any time vacuum decided on it’s own to skip index processing due to wraparound proximity.

I’m sure there’s some other use cases that I’m not thinking of.

Re: Vacuum statistics

From

Alena Rybakina

Date:

29 October 2024, 14:02:13

On 28.10.2024 16:40, Alexander Korotkov wrote:

On Sun, Aug 25, 2024 at 6:59 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

I didn't understand correctly - did you mean that we don't need SRF if
we need to display statistics for a specific object?

Otherwise, we need this when we display information on all database
objects (tables or indexes):

while ((entry = ScanStatSnapshot(pgStatLocal.snapshot.stats, &hashiter))
!= NULL)
{     CHECK_FOR_INTERRUPTS();
     tabentry = (PgStat_StatTabEntry *) entry->data;
     if (tabentry != NULL && tabentry->vacuum_ext.type == type)         tuplestore_put_for_relation(relid, rsinfo, tabentry);
}

I know we can construct a HeapTuple object containing a TupleDesc,
values, and nulls for a particular object, but I'm not sure we can
augment it while looping through multiple objects.

/* Initialise attributes information in the tuple descriptor */
  tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_SUBSCRIPTION_STATS_COLS);

...

PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));


If I missed something or misunderstood, can you explain in more detail?

Actually, I mean why do we need a possibility to return statistics for
all tables/indexes in one function call?  User anyway is supposed to
use pg_stat_vacuum_indexes/pg_stat_vacuum_tables view, which do
function calls one per relation.  I suppose we can get rid of
possibility to get all the objects in one function call and just
return a tuple from the functions like other pgstatfuncs.c functions
do.

I haven’t thought about this before and agree with you. Thanks for the clarification! I'll fix the patch this evening and release the updated version.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Andrei Zubkov

Date:

29 October 2024, 15:40:09

Hi,

Thanks for your attention to our patch!

On Mon, 2024-10-28 at 16:03 -0500, Jim Nasby wrote:
> > Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
> > tripled in space.  That a huge change from having no statistics on
> > vacuum to have it in much more detail than everything else we
> > currently have.  I think the feasible way might be to introduce
> > some
> > most demanded statistics first then see how it goes.
>
> Looking at the stats I do think the WAL stats are probably not
> helpful. First, there’s nothing users can do to tune how much WAL is
> generated by vacuum. Second, this introduces the risk of users saying
> “Wow, vacuum is creating a lot of WAL! I’m going to turn it down!”,
> which is most likely to make matters worse. There’s already a lot of
> stuff that goes into WAL without any detailed logging; if we ever
> wanted to provide a comprehensive view of what data is in WAL that
> should be handled separately.

Yes, there is nothing we can directly do with WAL generated by vacuum,
but WAL generation is the part of vacuum work, and it will indirectly
affected by the changes of vacuum settings. So, WAL statistics is one
more dimension of vacuum workload. Also WAL stat is universal metric
which is measured cluster-wide and on the statement-level with
pg_stat_statements. Vacuum WAL counters will explain the part of
difference between those metrics. Besides vacuum WAL counters can be
used to locate abnormal vacuum behavior caused by a bug or the data
corruption. I think if the DBA is smart enough to look at vacuum WAL
generated stats and to understand what it means, the decision to
disable the autovacuum due to its WAL generation is unlikely.

Anyway I think some stats can be excluded to save some memory. The
first candidates are the system_time and user_time fields. Those are
very valuable, but are measured by the rusage stats, which won't be
available on all platforms. I think total_time and delay_time would be
sufficient.
The second is the interrupts field. It is needed for monitoring to know
do we have them or not, so tracking them on the database level will do
the trick. Interrupt is quite rare event, so once the monitoring system
will catch one the DBA can go to the server log for the details.

It seems there is another way. If the vacuum stats doesn't seems to be
mandatory in all systems, maybe we should add some hooks to the vacuum
so that vacuum statistics tracking can be done in an extension. I don't
think it is a good idea, because vacuum stats seems to me as mandatory
as the vacuum process itself.

> Is there a reason some fields are omitted
> from pg_stat_vacuum_database? While some stats are certainly more
> interesting at the per-relation level, I can’t really think of any
> that don’t make sense at the database level as well.

Some of the metrics are table-specific, some index-specific, so we
moved to the database level metrics more or less specific to the whole
database. Can you tell what stats you want to see at the database
level?

> Looking at the per table/index stats, I strongly dislike the use of
> the term “delete” - it is a recipe for confusion with row deletion..
> A much better term is “remove” or “removed”. I realize the term
> “delete” is used in places in vacuum logging, but IMO we should fix
> that as well instead of doubling-down on it.

Yes, this point was discussed in our team, and it seems confusing to me
too. We decided to name it as it is named in the code and to get
feedback from the community. Now we get one. Thank you. Now we should
discuss it and choose the best one. My personal choice is "removed".

> I realize “relname” is being used for consistency with
> pg_stat_all_(tables|indexes), but I’m not sure it makes sense to
> double-down on that. Especially in pg_stat_vacuum_indexes, where it’s
> not completely clear whether relname is referring to the table or the
> index. I’m also inclined to say that the name of the table should be
> included in pg_stat_vacuum_indexes.

Agreed. Table name is needed in the index view.

> For all the views the docs should clarify that total_blks_written
> means blocks written by vacuum, as opposed to the background Ywriter.

We have the "Number of database blocks written by vacuum operations
performed on this table" in the docs now. Do you mean we should
specifically note the vacuum process here?

> Similarly they should clarify the difference between
> rel_blks_(read|hit) and total_blks_(read|hit). In the case of
> pg_stat_vacuum_indexes it’d be better if rel_blks_(read|hit) were
> called index_blks_(read|hit). Although… if total_blks_* is actually
> the count across the table and all the indexes I don’t know that we
> even need that counter. I realize that not ever vacuum even looks at
> the indexes, but if we’re going to go into that level of detail then
> we would (at minimum) need to count the number of times a vacuum
> completely skipped scanning the indexes.

It is not clear to me enough. The stats described just as it is -
rel_blocks_* tracks blocks of the current heap, and total_* is for the
whole database blocks - not just tables and indexes, vacuum do some
work (quite a little) in the catalog and this work is counted here too.
Usually this stat won't be helpful, but maybe we can catch unusual
vacuum behavior using this stat.

> Having rev_all_(frozen|visible)_pages in the same view as vacuum
> stats will confuse users into thinking that vacuum is clearing the
> bits. Those fields really belong in pg_stat_all_tables.

Agreed.

> Sadly index_vacuum_count is may not useful at all at present. At
> minimum you’d need to know the number of times vacuum had run in
> total. I realize that’s in pg_stat_all_tables, but that doesn’t help
> if vacuum stats are tracked or reset separately.

I'm in doubt - is it really possible to reset the vacuum stats
independent of pg_stat_all_tables?

> At minimum the docs should mention them. They also need to clarify
> if index_vacuum_count is incremented per-index or per-pass (hopefully
> the later). Assuming it’s per-pass, a better name for the field would
> be index_vacuum_passes, index_passes, index_pass_count, or similar.
> But even with that we still need a counter for the number of vacuums
> where index processing was skipped.

Agreed, the "index_passes" looks good to me, and index processing skip
counter looks good.

> First, there’s still gaps in trying to track HOT; most notably a
> counter for how many updates would never be HOT eligible because they
> modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really
> limited without that info.

Nice catch, I'll think about it. Those are not directly connected to
the vacuum workload but those are important.

> There should also be stats about unused line pointers - in degenerate
> cases the lp array can consume a significant portion of heap storage.
>
> Monitoring bloat would be a lot more accurate if vacuum reported
> total tuple length for each run along with the total number of tuples
> it looked at. Having that info would make it trivial to calculate
> average tuple size, which could then be applied to reltuples and
> relpages to calculate how much space would being lost to bloat.

Yes, bloat tracking is in our plans. Right now it is not clear enough
how to do it in the most reliable and convenient way.

> Autovacuum will self-terminate if it would block another process
> (unless it’s an aggressive vacuum) - that’s definitely something that
> should be tracked. Not just the number of times that happens, but
> also stats about how much work was lost because of this.

Agreed.

> Shrinking a relation (what vacuum calls truncation, which is very
> confusing with the truncate command) is a rather complex process that
> currently has no visibility.

In this patch table truncation can be seen in the "pages_removed" field
of "pg_stat_vacuum_tables" at least as the cumulative number of removed
pages. It is not clear enough, but it is visible.

> Tuning vacuum_freeze_min_age (and the MXID variant) is rather
> complicated. We maybe have enough stats on whether it could be set
> lower, but there’s no visibility on how the settings affect how often
> vacuum decides to be aggressive. At minimum, we should have stats on
> when vacuum is aggressive, especially since it significantly changes
> the behavior of autovac.

When you say "agressive" do you mean the number of times when the
vacuum was processing the table with the FREEZE intention? I think this
is needed too.

> I saw someone else already mentioned tuning vacuum memory usage, but
> I’ll mention it again. Even if the issues with index_vacuum_count are
> fixed that still only tells you if you have a problem; it doesn’t
> give you a great idea of how much more memory you need. The best you
> can do is assuming you need (number of passes - 1) * current memory.

Do you think such approach is insufficient? It seems we do not need
byte-to-byte accuracy here.

> Speaking of which… there should be stats on any time vacuum decided
> on it’s own to skip index processing due to wraparound proximity.

Maybe we should just count the number of times when the vacuum was
started to prevent wraparound?

Jim, thank you for such detailed review of our patch!

--
regards, Andrei Zubkov
Postgres Professional

Re: Vacuum statistics

From

Jim Nasby

Date:

30 October 2024, 01:23:55

On Oct 29, 2024, at 7:40 AM, Andrei Zubkov <zubkov@moonset.ru> wrote:
>
> Hi,
>
> Thanks for your attention to our patch!
>
> On Mon, 2024-10-28 at 16:03 -0500, Jim Nasby wrote:
>>> Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
>>> tripled in space.  That a huge change from having no statistics on
>>> vacuum to have it in much more detail than everything else we
>>> currently have.  I think the feasible way might be to introduce
>>> some
>>> most demanded statistics first then see how it goes.
>>
>> Looking at the stats I do think the WAL stats are probably not
>> helpful. First, there’s nothing users can do to tune how much WAL is
>> generated by vacuum. Second, this introduces the risk of users saying
>> “Wow, vacuum is creating a lot of WAL! I’m going to turn it down!”,
>> which is most likely to make matters worse. There’s already a lot of
>> stuff that goes into WAL without any detailed logging; if we ever
>> wanted to provide a comprehensive view of what data is in WAL that
>> should be handled separately.
>
> Yes, there is nothing we can directly do with WAL generated by vacuum,
> but WAL generation is the part of vacuum work, and it will indirectly
> affected by the changes of vacuum settings. So, WAL statistics is one
> more dimension of vacuum workload. Also WAL stat is universal metric
> which is measured cluster-wide and on the statement-level with
> pg_stat_statements. Vacuum WAL counters will explain the part of
> difference between those metrics. Besides vacuum WAL counters can be
> used to locate abnormal vacuum behavior caused by a bug or the data
> corruption. I think if the DBA is smart enough to look at vacuum WAL
> generated stats and to understand what it means, the decision to
> disable the autovacuum due to its WAL generation is unlikely.

I’m generally for more stats rather than less - really just a question of how much we’re worried about stats overhead.

> Anyway I think some stats can be excluded to save some memory. The
> first candidates are the system_time and user_time fields. Those are
> very valuable, but are measured by the rusage stats, which won't be
> available on all platforms. I think total_time and delay_time would be
> sufficient.

Yeah, I considered throwing those under the bus. I agree they’re only marginally useful.

> The second is the interrupts field. It is needed for monitoring to know
> do we have them or not, so tracking them on the database level will do
> the trick. Interrupt is quite rare event, so once the monitoring system
> will catch one the DBA can go to the server log for the details.
Just to confirm… by “interrupt” you mean vacuum encountered an error?

> It seems there is another way. If the vacuum stats doesn't seems to be
> mandatory in all systems, maybe we should add some hooks to the vacuum
> so that vacuum statistics tracking can be done in an extension. I don't
> think it is a good idea, because vacuum stats seems to me as mandatory
> as the vacuum process itself.
I’d actually like hooks for all stats, so people can develop different ways of storing/aggregating them. But I agree
that’sa separate discussion. 

>> Is there a reason some fields are omitted
>> from pg_stat_vacuum_database? While some stats are certainly more
>> interesting at the per-relation level, I can’t really think of any
>> that don’t make sense at the database level as well.
>
> Some of the metrics are table-specific, some index-specific, so we
> moved to the database level metrics more or less specific to the whole
> database. Can you tell what stats you want to see at the database
> level?

Here’s the thing with pg_stat_vacuum_database; it’s the only way to see everything in the whole cluster. So I think the
betterquestion is what metrics simply don’t make sense at that level? And I don’t really see any that don’t. 

>> For all the views the docs should clarify that total_blks_written
>> means blocks written by vacuum, as opposed to the background Ywriter.
>
> We have the "Number of database blocks written by vacuum operations
> performed on this table" in the docs now. Do you mean we should
> specifically note the vacuum process here?

The reason the stat is confusing is because it doesn’t have the meaning that the name implies. Most people that see
thiswill think it’s actually measuring blocks dirtied, or at least something closer to that. It definitely hides the
factthat many of the dirtied blocks could actually be written by the bgwriter. So an improvement to the docs would be
“Numberof blocks written directly by vacuum or auto vacuum. Blocks that are dirtied by a vacuum process can be written
outby another process.” 

Which makes me realize… I think vacuum only counts a block as dirtied if it was previously clean? If so the docs for
thatmetric need to clarify that vacuum might modify a block but not count it as having been dirtied. 

>> Similarly they should clarify the difference between
>> rel_blks_(read|hit) and total_blks_(read|hit). In the case of
>> pg_stat_vacuum_indexes it’d be better if rel_blks_(read|hit) were
>> called index_blks_(read|hit). Although… if total_blks_* is actually
>> the count across the table and all the indexes I don’t know that we
>> even need that counter. I realize that not ever vacuum even looks at
>> the indexes, but if we’re going to go into that level of detail then
>> we would (at minimum) need to count the number of times a vacuum
>> completely skipped scanning the indexes.
>
> It is not clear to me enough. The stats described just as it is -
> rel_blocks_* tracks blocks of the current heap, and total_* is for the
> whole database blocks - not just tables and indexes, vacuum do some
> work (quite a little) in the catalog and this work is counted here too.
> Usually this stat won't be helpful, but maybe we can catch unusual
> vacuum behavior using this stat.

Ok, so this just needs to be clarified in the docs by explicitly stating what is and isn’t part of the metric. It would
alsobe better not to use the term “rel” since most people don’t immediately know what that means. So,
table_blks_(read|hit)or index_blks_(read|hit). 

Also, “total” is still not clear to me, at least in the context of pg_stat_vacuum_indexes. Is that different from
pg_stat_vacuum_tables.total_blks_*?If so, how? If it’s the same then IMO it should just be removed from
pg_stat_vacuum_indexes.

>> Sadly index_vacuum_count is may not useful at all at present. At
>> minimum you’d need to know the number of times vacuum had run in
>> total. I realize that’s in pg_stat_all_tables, but that doesn’t help
>> if vacuum stats are tracked or reset separately.
>
> I'm in doubt - is it really possible to reset the vacuum stats
> independent of pg_stat_all_tables?

Most stats can be independently reset, so I was thinking these wouldn’t be an exception. If that’s not the case then I
thinkthe docs need to mention pg_stat_all_tables.(auto)vacuum_count, since it’s in a completely different view. Or
betteryet, include the vacuum/analyze related stats that are in pg_stat_all_tables in pg_stat_vacuum_tables. 

BTW, have you thought about what stats should be added for ANALYZE? That’s obviously not as critical as vacuum, but
maybeworth considering as part of this... 

>> First, there’s still gaps in trying to track HOT; most notably a
>> counter for how many updates would never be HOT eligible because they
>> modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really
>> limited without that info.
>
> Nice catch, I'll think about it. Those are not directly connected to
> the vacuum workload but those are important.

Just to re-iterate: I don’t think this patch has to boil the ocean and try to handle all these extra use cases.

>> There should also be stats about unused line pointers - in degenerate
>> cases the lp array can consume a significant portion of heap storage.
>>
>> Monitoring bloat would be a lot more accurate if vacuum reported
>> total tuple length for each run along with the total number of tuples
>> it looked at. Having that info would make it trivial to calculate
>> average tuple size, which could then be applied to reltuples and
>> relpages to calculate how much space would being lost to bloat.
>
> Yes, bloat tracking is in our plans. Right now it is not clear enough
> how to do it in the most reliable and convenient way.
>
>> Autovacuum will self-terminate if it would block another process
>> (unless it’s an aggressive vacuum) - that’s definitely something that
>> should be tracked. Not just the number of times that happens, but
>> also stats about how much work was lost because of this.
>
> Agreed.
>
>> Shrinking a relation (what vacuum calls truncation, which is very
>> confusing with the truncate command) is a rather complex process that
>> currently has no visibility.
>
> In this patch table truncation can be seen in the "pages_removed" field
> of "pg_stat_vacuum_tables" at least as the cumulative number of removed
> pages. It is not clear enough, but it is visible.

Ahh, good point. I think it’s probably worth adding a counter (to this patch) for how many times vacuum actually
decidedto do page removal, because it’s (presumably) a pretty rare event. Without that counter it’s very hard to make
anysense of the number of pages removed (other than being able to see some were removed, at least once). 

>> Tuning vacuum_freeze_min_age (and the MXID variant) is rather
>> complicated. We maybe have enough stats on whether it could be set
>> lower, but there’s no visibility on how the settings affect how often
>> vacuum decides to be aggressive. At minimum, we should have stats on
>> when vacuum is aggressive, especially since it significantly changes
>> the behavior of autovac.
>
> When you say "agressive" do you mean the number of times when the
> vacuum was processing the table with the FREEZE intention? I think this
> is needed too.

Yes. I intentionally use the term “aggressive” (as the code does) to avoid confusion with the FREEZE option (which as
I’msure you know simply forces some GUCs to 0). Further complicating this is that auto vac will report this as “to
preventwraparound”… 

In any case… I’m actually leaning towards there should be a complete second set of counters for aggressive vacuums,
becauseof how differently they work. :( 

>> I saw someone else already mentioned tuning vacuum memory usage, but
>> I’ll mention it again. Even if the issues with index_vacuum_count are
>> fixed that still only tells you if you have a problem; it doesn’t
>> give you a great idea of how much more memory you need. The best you
>> can do is assuming you need (number of passes - 1) * current memory.
>
> Do you think such approach is insufficient? It seems we do not need
> byte-to-byte accuracy here.

Byte-for-byte, no. But I do wonder if there’s any way to do better than some multiple of what *_work_mem was set to.

And setting that aside, another significant problem is that you can’t actually do anything here without actually
knowingwhat memory setting was used, which is definitely not a given. Off-hand I don’t see anyway this can actually be
tuned(at all) with nothing but counters. :( 

Definitely out of scope for this patch though :)

>> Speaking of which… there should be stats on any time vacuum decided
>> on it’s own to skip index processing due to wraparound proximity.
>
> Maybe we should just count the number of times when the vacuum was
> started to prevent wraparound?

Unfortunately even that isn’t simple… auto vac and manual vac have different GUCs, and of course there’s the FREEZE
option.And then there’s the issue that MXIDs are handled completely separately. 

Even ignoring all of that… by default an aggressive vacuum won’t skip indexes. That only happens when you hit
vacuum_(multixact_)failsafe_age.

BTW, something I’ve been mulling over is what stats related to cleanup might be tracked at a system level. I’m thinking
alongthe lines of how often heap_prune_page or the index marking code come across a dead tuple they can’t do anything
aboutyet because it’s still visible. While you could track that per-relation, I’m not sure how helpful that actually is
sinceit’s really a long-running transaction problem. 

Similarly, it’d be nice if we had stats about how often all of the auto vac workers were occupied; something that’s
alsoglobal in nature.

Re: Vacuum statistics

From

Alena Rybakina

Date:

02 November 2024, 15:24:31

Hi!

On 29.10.2024 14:02, Alena Rybakina wrote:

On 28.10.2024 16:40, Alexander Korotkov wrote:

On Sun, Aug 25, 2024 at 6:59 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

I didn't understand correctly - did you mean that we don't need SRF if
we need to display statistics for a specific object?

Otherwise, we need this when we display information on all database
objects (tables or indexes):

while ((entry = ScanStatSnapshot(pgStatLocal.snapshot.stats, &hashiter))
!= NULL)
{     CHECK_FOR_INTERRUPTS();
     tabentry = (PgStat_StatTabEntry *) entry->data;
     if (tabentry != NULL && tabentry->vacuum_ext.type == type)         tuplestore_put_for_relation(relid, rsinfo, tabentry);
}

I know we can construct a HeapTuple object containing a TupleDesc,
values, and nulls for a particular object, but I'm not sure we can
augment it while looping through multiple objects.

/* Initialise attributes information in the tuple descriptor */
  tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_SUBSCRIPTION_STATS_COLS);

...

PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));


If I missed something or misunderstood, can you explain in more detail?

Actually, I mean why do we need a possibility to return statistics for
all tables/indexes in one function call?  User anyway is supposed to
use pg_stat_vacuum_indexes/pg_stat_vacuum_tables view, which do
function calls one per relation.  I suppose we can get rid of
possibility to get all the objects in one function call and just
return a tuple from the functions like other pgstatfuncs.c functions
do.

I haven’t thought about this before and agree with you. Thanks for the clarification! I'll fix the patch this evening and release the updated version.

I updated the patches as per your suggestion. You can see it here [0].

[0] https://www.postgresql.org/message-id/85b963fe-5977-43aa-9241-75b862abcc69%40postgrespro.ru

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

07 November 2024, 17:49:23

On 22.10.2024 22:30, Alena Rybakina wrote:

Hi!
On 16.10.2024 14:01, Alena Rybakina wrote:
Thank you for rebasing.
I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM.
Example:
CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,1000000) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
....
(0 rows)
CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)
I can see the entries after running VACUUM or executing autovacuum. or when autovacuum is executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum. Interestingly, this issue does not occur with pg_stat_vacuum_database:
CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
dboid | dbname | ...
... | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes
Thanks for the review. I'm investigating this. I agree with the renaming, I will do it in the next version of the patch.
I fixed it. I added the left outer join to the vacuum views and for converting the coalesce function from NULL to null values.
I also fixed the code in getting database statistics - we can get it through the existing pgstat_fetch_stat_dbentry function and fixed couple of comments.
I attached a diff file, as well as new versions of patches.
-- 
Regards,
Alena Rybakina
Postgres Professional

Thank you for fixing it.

1) I have found some typos in the test output files (out-files) when running 'make check' and 'make check-world'. These typos might cause minor discrepancies in test results. You may already be aware of them, but I wanted to bring them to your attention in case they haven't been noticed. I believe these can be fixed quickly.

2) Additionally, I observed that when we create a table and insert some rows, executing the VACUUM FULL command does not update the information in the 'pg_stat_get_vacuum_tables' However, running the VACUUM command does update this information as expected. This seems inconsistent, and it might be a bug.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Vacuum statistics

From

Alena Rybakina

Date:

08 November 2024, 22:23:42

Hi! Thank you for review!

On 07.11.2024 17:49, Ilia Evdokimov wrote:
>
> Thank you for fixing it.
>
> 1) I have found some typos in the test output files (out-files) when 
> running 'make check' and 'make check-world'. These typos might cause 
> minor discrepancies in test results. You may already be aware of them, 
> but I wanted to bring them to your attention in case they haven't been 
> noticed. I believe these can be fixed quickly.
>
Yes, I'll fix it)
>
> 2) Additionally, I observed that when we create a table and insert 
> some rows, executing the VACUUM FULL command does not update the 
> information in the 'pg_stat_get_vacuum_tables' However, running the 
> VACUUM command does update this information as expected. This seems 
> inconsistent, and it might be a bug.
>
> Example:
> CREATE TABLE t (i INT, j INT) WITH (autovacuum_enabled = false);
> INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,1000000) i;
> SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
> schema | relname |    relid | total_blks_read | .........
> -----------+------------+---------+----------------------+---------
>    public | t            | 21416 |                       0 | ......
> (1 row)
>
> VACUUM FULL;
> SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
> schema | relname |    relid | total_blks_read | .........
> -----------+------------+---------+----------------------+---------
>    public | t            | 21416 |                       0 | ......
> (1 row)
>
> VACUUM;
> SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
> schema | relname |    relid | total_blks_read | .........
> -----------+------------+---------+----------------------+---------
>    public | t            | 21416 |                 4425 | ......
> (1 row)
>
>
vacuum full operation doesn't call a vacuum operation, so we can't 
collect statistics for it. Furthermore, this is a different operation 
than vacuum because it completely rebuilds the table and indexes, so it 
looks like your previous table and its indexes were completely removed. 
To sum up, I think it isn't a bug that the statistics aren't showing here.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Jim Nasby

Date:

08 November 2024, 22:34:56

On Nov 2, 2024, at 7:22 AM, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

The second is the interrupts field. It is needed for monitoring to know
do we have them or not, so tracking them on the database level will do
the trick. Interrupt is quite rare event, so once the monitoring system
will catch one the DBA can go to the server log for the details.

Just to confirm… by “interrupt” you mean vacuum encountered an error?

Yes it is.

In that case I feel rather strongly that we should label that as “errors”. “Interrupt” could mean a few different things, but “error” is very clear.

I updated patches. I excluded system and user time statistics and save number of interrupts only for database. I removed the ability to get statistics for all tables, now they can only be obtained for an oid table [0], as suggested here. I also renamed the statistics from pg_stat_vacuum_tables to pg_stat_get_vacuum_tables and similarly for indexes and databases. I noticed that that’s what they’re mostly called. Ready for discussion.

I think it’s better that the views follow the existing naming conventions (which don’t include “_get_”; only the functions have that in their names). Assuming that, the only question becomes pg_stat_vacuum_* vs pg_stat_*_vacuum. Given the existing precedent of pg_statio_*, I’m inclined to go with pg_stat_vacuum_*.

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

10 November 2024, 19:51:09

On 08.11.2024 22:23, Alena Rybakina wrote:

Hi! Thank you for review!

On 07.11.2024 17:49, Ilia Evdokimov wrote:

Thank you for fixing it.

1) I have found some typos in the test output files (out-files) when running 'make check' and 'make check-world'. These typos might cause minor discrepancies in test results. You may already be aware of them, but I wanted to bring them to your attention in case they haven't been noticed. I believe these can be fixed quickly.

Yes, I'll fix it)

2) Additionally, I observed that when we create a table and insert some rows, executing the VACUUM FULL command does not update the information in the 'pg_stat_get_vacuum_tables' However, running the VACUUM command does update this information as expected. This seems inconsistent, and it might be a bug.

Example:
CREATE TABLE t (i INT, j INT) WITH (autovacuum_enabled = false);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,1000000) i;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
schema | relname |    relid | total_blks_read | .........
-----------+------------+---------+----------------------+---------
   public | t        | 21416 |                 0 | ......
(1 row)

VACUUM FULL;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
schema | relname |    relid | total_blks_read | .........
-----------+------------+---------+----------------------+---------
   public | t        | 21416 |                 0 | ......
(1 row)

VACUUM;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
schema | relname |    relid | total_blks_read | .........
-----------+------------+---------+----------------------+---------
   public | t        | 21416 |             4425 | ......
(1 row)

vacuum full operation doesn't call a vacuum operation, so we can't collect statistics for it. Furthermore, this is a different operation than vacuum because it completely rebuilds the table and indexes, so it looks like your previous table and its indexes were completely removed. To sum up, I think it isn't a bug that the statistics aren't showing here.

Ah, you're right. This table does contain the statistics for it. Everything is okay then. Sorry for the confusion.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Vacuum statistics

From

Kirill Reshke

Date:

30 November 2024, 07:48:22

On Wed, 13 Nov 2024 at 21:21, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi! Thank you for your contribution to this thread!
>
> On 13.11.2024 03:24, Jim Nasby wrote:
>
> On Nov 10, 2024, at 2:09 PM, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
>
> On 08.11.2024 22:34, Jim Nasby wrote:
>
>
> On Nov 2, 2024, at 7:22 AM, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> The second is the interrupts field. It is needed for monitoring to know
> do we have them or not, so tracking them on the database level will do
> the trick. Interrupt is quite rare event, so once the monitoring system
> will catch one the DBA can go to the server log for the details.
>
> Just to confirm… by “interrupt” you mean vacuum encountered an error?
>
> Yes it is.
>
> In that case I feel rather strongly that we should label that as “errors”. “Interrupt” could mean a few different
things,but “error” is very clear. 
>
> I updated patches. I excluded system and user time statistics and save number of interrupts only for database. I
removedthe ability to get statistics for all tables, now they can only be obtained for an oid table [0], as suggested
here.I also renamed the statistics from pg_stat_vacuum_tables to pg_stat_get_vacuum_tables and similarly for indexes
anddatabases. I noticed that that’s what they’re mostly called. Ready for discussion. 
>
> I think it’s better that the views follow the existing naming conventions (which don’t include “_get_”; only the
functionshave that in their names). Assuming that, the only question becomes pg_stat_vacuum_* vs pg_stat_*_vacuum.
Giventhe existing precedent of pg_statio_*, I’m inclined to go with pg_stat_vacuum_*. 
>
> I have fixed it.
>
>
> I’ve reviewed and made some cosmetic changes to patch 1, though of note it looks like an effort has been made to keep
stat_reset_timestampat the end of PgStat_StatDBEntry, so I re-arranged that. I also removed some obviously dead code.
Itappears that pgstat_update_snapshot(), InitSnapshotIterator() and ScanStatSnapshot() are also dead, but I’ve left it
inincase I’m missing something. The tests are also failing for me because a number of psql variables aren’t set. 
>
> Thank you! Yes, I have deleted them.
>
>
> I do think we should separate out the counts for deleted but still visible tuples vs tuples where we couldn’t get a
cleanuplock (in other words, recently_dead_tuples and missed_dead_tuples from LVRelState). I realize that’s a departure
fromhow some of the existing reporting works, but IMO combining them together isn’t a pattern we should be repeating
sincethey mean completely different things. Towards that end I did remove missed_dead_tuples from the reporting, and
renamedExtVacReport.dead_tuples to recently_dead_tuples, but I stopped short of creating a separate entry for
missed_dead_tuples.Note that while recently_dead_tuples is really a global thing (so only needs to be reported at a
global(or at most per-database) level, but missed_dead_tuples should really be at a per-table level. 
>
> I am willing to agree with your idea. But we need to think about how clearly describe them in the documentation.
>
>
> Updated 0001-v13 attached, as well as the diff between v12 and v13.
>
> Thank you)
>
> And I agree with your changes. And included them in patches.
>
> ---
> Regards,
> Alena Rybakina
> Postgres Professional

Hello!
After a brief glance, I think this patch set is good.
But there isn't any more time in the current CF to commit this :(.
So I moved to the next CF.

I also like the 0001 commit message. This commit message is quite
large and easy to understand. Actually, it might be too big. Perhaps
rather of being a commit message, the final paragraph (pages_frozen -
number of pages that..) need to be a part of the document. Perhaps
delete the explanation on pages_frozen that we have in 0004?


--
Best regards,
Kirill Reshke

Re: Vacuum statistics

From

Alexander Korotkov

Date:

02 December 2024, 11:27:18

Hi, Alena!

On Wed, Nov 13, 2024 at 6:21 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> Updated 0001-v13 attached, as well as the diff between v12 and v13.
>
> Thank you)
>
> And I agree with your changes. And included them in patches.

Thank you for the updated patchset.  Some points from me.

* I've read the previous discussion on how important to keep all these
fields regarding vacuum statistics including points by Andrei and Jim.
It still worrying me that statistics volume is going to burst in about
3 times, but I don't have a particular proposal on how to make more
granular approach.  I wonder if you could propose something.
* Previously PGSTAT_FILE_FORMAT_ID got increased by 1. Your 0001 patch
increases it by 2.  It's minor note, but I'd like to keep the
tradition.
* Commit message for 0001 looks nice, but commit messages of 0002,
0003, and 0004 look messy.  Could you please, rearrange them.
* The distinction between 0001 and 0002 is not clear. The first line
of 0001 is "Machinery for grabbing an extended vacuum statistics on
heap relations", the first line of 0002 is "Machinery for grabbing an
extended vacuum statistics on heap and index relations."  I guess 0001
should be about heap relations while 0002 should be about just index
relations.  Is this correct?
* I guess this statistics should work for any table AM, based on what
has been done in relation_vacuum() interface method.  If that's
correct, we need to get rid of "heap" terminology and use "table"
instead.
* 0004 should be pure documentation patch, but it seems containing
changes to isolation tests.  Please, move them into a more appropriate
place.

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

02 December 2024, 17:46:14

In my opinion, the patches are semantically correct. However, not all 
dead code has been removed - I'm referring to pgstat_update_snapshot(). 
Also, the tests need to be fixed.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Vacuum statistics

From

Alena Rybakina

Date:

02 December 2024, 23:12:05

On 02.12.2024 11:27, Alexander Korotkov wrote:

Hi, Alena!

On Wed, Nov 13, 2024 at 6:21 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

Updated 0001-v13 attached, as well as the diff between v12 and v13.

Thank you)

And I agree with your changes. And included them in patches.

Thank you for the updated patchset.  Some points from me.

* I've read the previous discussion on how important to keep all these
fields regarding vacuum statistics including points by Andrei and Jim.
It still worrying me that statistics volume is going to burst in about
3 times, but I don't have a particular proposal on how to make more
granular approach.  I wonder if you could propose something.
* Previously PGSTAT_FILE_FORMAT_ID got increased by 1. Your 0001 patch
increases it by 2.  It's minor note, but I'd like to keep the
tradition.
* Commit message for 0001 looks nice, but commit messages of 0002,
0003, and 0004 look messy.  Could you please, rearrange them.
* The distinction between 0001 and 0002 is not clear. The first line
of 0001 is "Machinery for grabbing an extended vacuum statistics on
heap relations", the first line of 0002 is "Machinery for grabbing an
extended vacuum statistics on heap and index relations."  I guess 0001
should be about heap relations while 0002 should be about just index
relations.  Is this correct?
* I guess this statistics should work for any table AM, based on what
has been done in relation_vacuum() interface method.  If that's
correct, we need to get rid of "heap" terminology and use "table"
instead.
* 0004 should be pure documentation patch, but it seems containing
changes to isolation tests.  Please, move them into a more appropriate
place.

Thank you for your valuable feedback, I am already carefully processing your comments and will update the patches soon.

I will think about what can be done to address the problem of increasing the volume of statistics; perhaps it will be possible to implement a guc that, when enabled, will accumulate additional information on vacuum statistics. For example, this way you can group statistics by buffers and vacuum statistics.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

02 December 2024, 23:59:58

Hi! Thank you for your review!

On 30.11.2024 07:48, Kirill Reshke wrote:

Hello!
After a brief glance, I think this patch set is good.
But there isn't any more time in the current CF to commit this :(.
So I moved to the next CF.

I agree with you. Thank you!)

I also like the 0001 commit message. This commit message is quite
large and easy to understand. Actually, it might be too big. Perhaps
rather of being a commit message, the final paragraph (pages_frozen -
number of pages that..) need to be a part of the document. Perhaps
delete the explanation on pages_frozen that we have in 0004?

To be honest, I don't quite understand what you're suggesting. Are you suggesting moving the explanation about the pages_frozen from the commit message to the documentation or fixing something in the documentation about the pages_frozen? Can you please explain?

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

03 December 2024, 00:00:37

On 02.12.2024 17:46, Ilia Evdokimov wrote:
> In my opinion, the patches are semantically correct. However, not all 
> dead code has been removed - I'm referring to 
> pgstat_update_snapshot(). Also, the tests need to be fixed.
>
Thank you, I'll fix it

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

19 December 2024, 13:40:54

Hi!

On 02.12.2024 17:46, Ilia Evdokimov wrote:
> In my opinion, the patches are semantically correct. However, not all 
> dead code has been removed - I'm referring to 
> pgstat_update_snapshot(). Also, the tests need to be fixed.
>
>
I fixed it [0]. Thank you!

[0] 
https://www.postgresql.org/message-id/86f76aa5-1ab5-4e2e-9b15-405051852a2a%40postgrespro.ru

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Sami Imseih

Date:

02 January, 23:12:08

Hi,

Thanks for the work you have done here. Exposing cumulative
metrics at this level of detail for vacuum is surely useful to find
vacuum bottlenecks and to determine the effectiveness of
vacuum tuning.

I am yet to look very closely, but I think some additional columns that
will be useful is the number of failsafe autovacuums occurred. Also
the counter for number of index_cleanup skipped, truncate phase
skipped and toast vacuuming skipped ( the latter will only be relevant
for the main relation ).

I also wonder if if makes sense to break down timing by phase. I surely
would like to know how much of my vacuum time was spent in index
cleanup vs heap scan, etc.

A nit: I noticed in v14, the column is "schema". It should be "schemaname"
for consistency.

Also, instead of pg_stat_vacuum_tables, what about pg_stat_vacuum?

Now, I became aware of this discussion after starting a new thread
to track total time spent in vacuum/analyze in pg_stat_all_tables [1].
But this begs the question of what should be done with the current
counters in pg_stat_all_tables? I see it mentioned above that (auto)vacuum_count
should be added to this new view, but it's also already in pg_stat_all_tables.
I don't think we should be duplicating the same columns across views.

I think total_time should be removed from your current patch and added
as is being suggested in [1]. This way high level metrics such as counts
and total time spent remain in pg_stat_all_tables, while the new view
you are proposing will contain more details. I don't think we will have
consistency issues between the views because a reset using pg_stat_reset()
will act on all the stats and pg_stat_reset_single_table_counters() will act on
all the stats related to that table. There should be no way to reset the vacuum
stats independently, AFAICT.

Alternatively, we can remove the vacuum related stats from pg_stat_all_tables,
but that will break monitoring tools and will leave us with the (auto)analyze
metrics alone in pg_stat_all_tables. This sounds very ugly.

What do you think?

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://commitfest.postgresql.org/52/5485/

Re: Vacuum statistics

From

Jim Nasby

Date:

03 January, 00:14:13

On Jan 2, 2025, at 2:12 PM, Sami Imseih <samimseih@gmail.com> wrote:

Alternatively, we can remove the vacuum related stats from pg_stat_all_tables,
but that will break monitoring tools and will leave us with the (auto)analyze
metrics alone in pg_stat_all_tables. This sounds very ugly.

While backwards compatibility is important, there’s definitely precedent for changing what shows up in the catalog. IMHO it’s better to bite the bullet and move those fields instead of having vacuum stats spread across two different views.

Re: Vacuum statistics

From

Sami Imseih

Date:

03 January, 01:33:33

> While backwards compatibility is important, there’s definitely precedent for changing
> what shows up in the catalog. IMHO it’s better to bite the bullet and move those fields
> instead of having vacuum stats spread across two different views.

Correct, the most recent one that I could think of is pg_stat_checkpointer,
which pulled the checkpoint related columns from pg_stat_bgwriter.
In that case though, these are distinct background processes and
it's a clear distinction.

In this case, I am not so sure about this, particularly because
we will then have the autoanalyze and autovacuum fields in different
views, which could be more confusing to users than saying pg_stat_all_tables
has high level metrics about vacuum and analyze and for more details on
vacuum, refer to pg_stat_vacuum_tables ( or whatever name we settle on ).

Regards,

Sami

Re: Vacuum statistics

From

Greg Sabino Mullane

Date:

03 January, 17:15:30

While backwards compatibility is important, there’s definitely precedent for changing what shows up in the catalog. IMHO it’s better to bite the bullet and move those fields instead of having vacuum stats spread across two different views.

-1. That's a huge change, and pg_stat_all_tables is used way, way more than things like pg_stat_bgwriter.

Cheers,

Greg

Re: Vacuum statistics

From

Jim Nasby

Date:

03 January, 22:08:02

On Jan 2, 2025, at 4:33 PM, Sami Imseih <samimseih@gmail.com> wrote:
>
>> While backwards compatibility is important, there’s definitely precedent for changing
>> what shows up in the catalog. IMHO it’s better to bite the bullet and move those fields
>> instead of having vacuum stats spread across two different views.
>
> Correct, the most recent one that I could think of is pg_stat_checkpointer,
> which pulled the checkpoint related columns from pg_stat_bgwriter.
> In that case though, these are distinct background processes and
> it's a clear distinction.
>
> In this case, I am not so sure about this, particularly because
> we will then have the autoanalyze and autovacuum fields in different
> views, which could be more confusing to users than saying pg_stat_all_tables
> has high level metrics about vacuum and analyze and for more details on
> vacuum, refer to pg_stat_vacuum_tables ( or whatever name we settle on ).

I guess one question is how realistic it is to try and put everything about (auto)vacuum in a single view. Given the
complexity,the answer to that might just be “no”. In that case leaving existing fields in pg_stat_all_tables is a lot
morereasonable. 

Related to this… it’d be nice if we had a view that gave insight to users about auto vacuum scheduling. I know there’s
onefloating around the internet, but given the number of systems I’ve seen where autovac can’t keep up it’d be good to
raiseuser awareness.

Re: Vacuum statistics

From

Sami Imseih

Date:

03 January, 22:46:25

> I guess one question is how realistic it is to try and put everything about (auto)vacuum in a single view.
> Given the complexity, the answer to that might just be “no”. In that case leaving existing fields in
pg_stat_all_tables
> is a lot more reasonable.

Agree. I also think the total_time should be in pg_stat_all_tables.
total_time is a high level metric that along with vacuum_count
can calculate average run time of vacuums on a specific table.

Everything else in the new view are more granular details.

Regards,

Sami

Re: Vacuum statistics

From

Alena Rybakina

Date:

04 January, 23:37:51

Hi, thank you for your attention to this patch.

On 02.01.2025 23:12, Sami Imseih wrote:

Hi,

Thanks for the work you have done here. Exposing cumulative
metrics at this level of detail for vacuum is surely useful to find
vacuum bottlenecks and to determine the effectiveness of
vacuum tuning.

Yes, we hope that this will help provide more detailed information about the current efficiency of the vacuum and also suggest how to best configure it for the relationship.

I am yet to look very closely, but I think some additional columns that
will be useful is the number of failsafe autovacuums occurred.

Do you mean when the autovacuum started to prevent workaround?

Also
the counter for number of index_cleanup skipped, truncate phase
skipped and toast vacuuming skipped ( the latter will only be relevant
for the main relation ).

I can add, but concerns have already been expressed about the large amount of vacuum statistics and, as a consequence, this leads to the allocation of additional memory (3 times).

Of course, now we are saved by the guc I added for statistics... I understand that this information can better show the efficiency of the vacuum, but how does it help in setting it up for heap relations?

regarding the skipped truncate phase, the statistics are already collected in vacrel->nonempty_pages, it's easy to put them outside. I think the current statistics only show the number of deleted tuples and pages (both deleted and those visited by vacuum during tuple deletion), so the opposite view won't hurt.

index_cleanup skipped can be obtained based on information from a small number of vacuum buffer statistics and the number of pages of indexes that belong to heap relations. I think you can notice the behavior through current statistics: if the index's buffer values have increased very slightly, then the vacuum does not go there probably because of the impossibility of taking a clean-up lock on the index. The same information can be obtained based on the number of missed_tuples in heap relations. I wrote earlier how these values are related.

toast vacuuming skipped to be honest I haven't found a place where vacuum skips it in the code yet, so I can't say anything about them yet.

I also wonder if if makes sense to break down timing by phase. I surely
would like to know how much of my vacuum time was spent in index
cleanup vs heap scan, etc.

At the moment, this information has already been added to the statistics as a total time for heap relations and their indexes.


A nit: I noticed in v14, the column is "schema". It should be "schemaname"
for consistency.

Thank you, I'll fix it in the next version of the patch.

Also, instead of pg_stat_vacuum_tables, what about pg_stat_vacuum?

Now, I became aware of this discussion after starting a new thread
to track total time spent in vacuum/analyze in pg_stat_all_tables [1].
But this begs the question of what should be done with the current
counters in pg_stat_all_tables? I see it mentioned above that (auto)vacuum_count
should be added to this new view, but it's also already in pg_stat_all_tables.
I don't think we should be duplicating the same columns across views.

Alternatively, we can remove the vacuum related stats from pg_stat_all_tables,
but that will break monitoring tools and will leave us with the (auto)analyze
metrics alone in pg_stat_all_tables. This sounds very ugly.

What do you think?

Regards,

Sami Imseih
Amazon Web Services (AWS)

[1] https://commitfest.postgresql.org/52/5485/

I don't think they interfere with my more detailed views of how the vacuum works. I don't think there's anything worth removing.


I think total_time should be removed from your current patch and added
as is being suggested in [1]. This way high level metrics such as counts
and total time spent remain in pg_stat_all_tables, while the new view
you are proposing will contain more details. I don't think we will have
consistency issues between the views because a reset using pg_stat_reset()
will act on all the stats and pg_stat_reset_single_table_counters() will act on
all the stats related to that table. There should be no way to reset the vacuum
stats independently, AFAICT.

I think it is not quite correct to do so.

Firstly, the total time of vacuum operation does not give you a complete idea of when vacuum did not work delay time. I have seen many reports where vacuum spends very little time on cleaning relations and most of the time just sleeping.

Secondly, where to put the total time of vacuum for indexes and databases? It would be incorrect not to take them into account at all. What if we remove the total time from the heap statistics and add it to pg_stat_tables and only leave the vacuum statistics total time of vacuum operation of indexes and databases? It seems strange to me that they will have to be viewed from different views.

I think it is necessary to look at the total time for tables into perspective of how much time vacuum spent in total on processing indexes, since indexes can be bloated, for example. I think it is better to leave these statistics here.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Sami Imseih

Date:

06 January, 05:00:16

> I am yet to look very closely, but I think some additional columns that
> will be useful is the number of failsafe autovacuums occurred.
>
> Do you mean when the autovacuum started to prevent workaround?
>

Specifically vacuum_failsafe_age [1] when autovacuum automatically
performs a vacuum without index cleanup, without truncate, bypassing
the vacuum ring buffer and disabling the cost limits. The purpose of this
is a last ditch effort to avoid wraparound and is triggered at 1.6 billion
transactions by default. When this state occurs, there is a single log
written for every table that is vacuumed with these options [2], and
my thoughts is to also track in the view as the use of these options
will overtime make the indexes bloat over time and less space is
given back to the OS due to skipped truncations. For most workloads,
this should not be common, but I am thinking of the extreme cases
or if someone potentially misconfigured the vacuum_failsafe_age.

As I thought about this more, failsafe autovacuum could be tracked on
the database level, pg_stat_database, since this guc can't be set
on a relation level.

> Also
> the counter for number of index_cleanup skipped, truncate phase
> skipped and toast vacuuming skipped ( the latter will only be relevant
> for the main relation ).
>
> I can add, but concerns have already been expressed about the large amount of
> vacuum statistics and, as a consequence, this leads
> to the allocation of additional memory (3 times).
> Of course, now we are saved by the guc I added for statistics...
> I understand that this information can better show the efficiency of the vacuum,
> but how does it help in setting it up for heap relations?

An administrator will find this information to be useful especially
if for some reason most vacuums are being run with these
options being off either via a manual vacuum or someone
turning off index_cleanup in the tables storage parameter.

postgres=# alter table t set (vacuum_index_cleanup = off,
vacuum_truncate = off );
ALTER TABLE

> regarding the skipped truncate phase, the statistics are already collected in vacrel->nonempty_pages,
> it's easy to put them outside. I think the current statistics only show the number of deleted tuples and pages
> (both deleted and those visited by vacuum during tuple deletion), so the opposite view won't hurt.

Can you clarify what you mean by "so the opposite view won't hurt." ?

> index_cleanup skipped can be obtained based on information from a small number of
> vacuum buffer statistics and the number of pages of indexes that belong to heap relations.
> I think you can notice the behavior through current statistics:

I don't think there is a view that provides cumulative vacuum buffer
stats currently.
pg_stat_io could be helpful for this purpose, but that is a cluster
wide view. As it
stands now, I think it's quite difficult for a user to determine for a
fact if indexes or
truncate is being skipped


> Secondly, where to put the total time of vacuum for indexes and databases?
> It would be incorrect not to take them into account at all. What if we remove the total time from
> the heap statistics and add it to pg_stat_tables and only leave the vacuum statistics total time of
> vacuum operation of indexes and databases?
> It seems strange to me that they will have to be viewed from different views.
>
> I think it is necessary to look at the total time for tables into perspective of how much
> time vacuum spent in total on processing indexes, since indexes can be bloated, for example.
> I think it is better to leave these statistics here.

You make valid points. I now think because track_vacuum_statistics is
optional, we should track total_time in 2 places. First place in the new
view being proposed here and the second place is in pg_stat_all_tables
as being proposed here [3]. This way if track_vacuum_statistics is off, the
total_time of vacuum could still be tracked by pg_stat_all_tables.

By the way, the current patch does not track materialized view,
but it should as materialized views can also be vacuumed.

Regards,

Sami

[1] https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-VACUUM-FAILSAFE-AGE
[2] https://github.com/postgres/postgres/blob/master/src/backend/access/heap/vacuumlazy.c#L2437-L2444
[3] https://commitfest.postgresql.org/52/5485/

Re: Vacuum statistics

From

Alexander Korotkov

Date:

02 February, 23:43:19

On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> I noticed that the cfbot is bad, the reason seems to be related to the lack of a parameter in
src/backend/utils/misc/postgresql.conf.sample.I added it, it should help. 

The patch doesn't apply cleanly.  Please rebase.

I see you introduced new GUC variable pgstat_track_vacuum_statistics,
which should address the increased size of statistics.  However, I
don't see how it could affect the size of PgStat_StatTabEntry struct.
It seems that when pgstat_track_vacuum_statistics == 0, extended
vacuum statistics is not collected but the size of hash table entries
is the same.  Also, should pgstat_track_vacuum_statistics also affect
per database statistics?

The name of 0001 is "... on heap relations".  Should we say "on table
relations", because new machinery should work with alternative table
AMs as well.

There are deletions of empty lines in
src/include/utils/pgstat_internal.h and src/include/pgstat.h.  Please,
remote them as it's not purpose of this patchset.

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Alexander Korotkov

Date:

05 February, 09:59:40

On Tue, Feb 4, 2025 at 5:22 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi! Thank you for your review!
>
> On 02.02.2025 23:43, Alexander Korotkov wrote:
> > On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
> > <a.rybakina@postgrespro.ru> wrote:
> >> I noticed that the cfbot is bad, the reason seems to be related to the lack of a parameter in
src/backend/utils/misc/postgresql.conf.sample.I added it, it should help. 
> > The patch doesn't apply cleanly.  Please rebase.
> I rebased them.
> >
> > I see you introduced new GUC variable pgstat_track_vacuum_statistics,
> > which should address the increased size of statistics.  However, I
> > don't see how it could affect the size of PgStat_StatTabEntry struct.
> > It seems that when pgstat_track_vacuum_statistics == 0, extended
> > vacuum statistics is not collected but the size of hash table entries
> > is the same.
>
> Yes, hash table entries will be the same but vacuum_ext structure stored
> in PgStat_StatTabEntry will not be filled with statistics, although
> vacuum_ext structure stored in PgStat_StatDBEntry will be fill be.

What is the point for disabling pgstat_track_vacuum_statistics then?
I don't see it saves any valuable resources.  The original point by
Masahiko Sawada was growth of data structures in times [1] (and
corresponding memory consumption especially with large number of
tables).  Now, disabling pgstat_track_vacuum_statistics only saves
some cycles of pgstat_accumulate_extvac_stats(), and that seems
insignificant.

I see that we use hash tables with static element size.  So, we can't
save space by dynamically changing entries size on the base of GUC.
But could we move vacuum statistics to separate hash tables?  When GUC
is disabled, new hash tables could be just empty.

Links
1. https://www.postgresql.org/message-id/CAD21AoD66b3u28n%3D73kudgMp5wiGiyYUN9LuC9z2ka6YTru5Gw%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Alena Rybakina

Date:

05 February, 15:03:24

On 05.02.2025 09:59, Alexander Korotkov wrote:

What is the point for disabling pgstat_track_vacuum_statistics then?
I don't see it saves any valuable resources.  The original point by
Masahiko Sawada was growth of data structures in times [1] (and
corresponding memory consumption especially with large number of
tables).  Now, disabling pgstat_track_vacuum_statistics only saves
some cycles of pgstat_accumulate_extvac_stats(), and that seems
insignificant.

I see that we use hash tables with static element size.  So, we can't
save space by dynamically changing entries size on the base of GUC.
But could we move vacuum statistics to separate hash tables?  When GUC
is disabled, new hash tables could be just empty.

Links
1. https://www.postgresql.org/message-id/CAD21AoD66b3u28n%3D73kudgMp5wiGiyYUN9LuC9z2ka6YTru5Gw%40mail.gmail.com

I understand what you're talking about. I'm looking at the pgstat_assoc_relation function and I think that's where I need to decide whether we need to allocate memory in the hash table for vacuum statistics for them or not.
The same thing happens there depending on the installed pgstat_track_counts guc and pgstat_enabled value consequently. Like here:

Specifically, there is an example that for partitions, for example, statistics are not accumulated and the condition used like that, like here:

if (!pgstat_track_counts)
 {
 if (rel->pgstat_info)
 pgstat_unlink_relation(rel);
 /* We're not counting at all */
 rel->pgstat_enabled = false;
 rel->pgstat_info = NULL;
 return;
 }
I think I can try yo add an external parameter in the relation like ext_vacuum_pgstat_info and determine its values depending on the guc's pgstat_track_vacuum_statistics value.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

27 February, 21:00:29

Hi!
On 17.02.2025 17:46, Alena Rybakina wrote:
> On 04.02.2025 18:22, Alena Rybakina wrote:
>> Hi! Thank you for your review!
>>
>> On 02.02.2025 23:43, Alexander Korotkov wrote:
>>> On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
>>> <a.rybakina@postgrespro.ru> wrote:
>>>> I noticed that the cfbot is bad, the reason seems to be related to 
>>>> the lack of a parameter in 
>>>> src/backend/utils/misc/postgresql.conf.sample. I added it, it 
>>>> should help.
>>> The patch doesn't apply cleanly.  Please rebase.
>> I rebased them.
> The patch needed a rebase again. There is nothing new since version 
> 18, only a rebase.

The patch needed a new rebase.

Sorry, but due to feeling unwell I picked up a patch from another thread 
and squashed the patch for vacuum statistics for indexes and heaps in 
the previous version. Now I fixed everything together with the rebase.

-- 
Regards,
Alena Rybakina
Postgres Professional

Attachment

Re: Vacuum statistics

From

Ilia Evdokimov

Date:

10 March, 12:13:08

Hi,

After commit eaf5027 we should add information about wal_buffers_full.

Any thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Vacuum statistics

From

Kirill Reshke

Date:

10 March, 16:33:15

On Thu, 27 Feb 2025 at 23:00, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi!
> On 17.02.2025 17:46, Alena Rybakina wrote:
> > On 04.02.2025 18:22, Alena Rybakina wrote:
> >> Hi! Thank you for your review!
> >>
> >> On 02.02.2025 23:43, Alexander Korotkov wrote:
> >>> On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
> >>> <a.rybakina@postgrespro.ru> wrote:
> >>>> I noticed that the cfbot is bad, the reason seems to be related to
> >>>> the lack of a parameter in
> >>>> src/backend/utils/misc/postgresql.conf.sample. I added it, it
> >>>> should help.
> >>> The patch doesn't apply cleanly.  Please rebase.
> >> I rebased them.
> > The patch needed a rebase again. There is nothing new since version
> > 18, only a rebase.
>
> The patch needed a new rebase.
>
> Sorry, but due to feeling unwell I picked up a patch from another thread
> and squashed the patch for vacuum statistics for indexes and heaps in
> the previous version. Now I fixed everything together with the rebase.
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional

Hi!
CI fails on this one[0]

Is it a flap or a real problem caused by v20?

```

 SELECT relpages AS irp
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
/tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
2025-03-10 09:36:10.274799176 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
2025-03-10 09:49:35.796596462 +0000
@@ -65,7 +65,7 @@
 WHERE vt.relname = 'vestat' AND vt.relid = c.oid;
  relname | vm_new_frozen_pages | tuples_deleted | relpages |
pages_scanned | pages_removed
 ---------+---------------------+----------------+----------+---------------+---------------
- vestat  |                   0 |              0 |      455 |
   0 |             0
+ vestat  |                   0 |              0 |      417 |
   0 |             0
 (1 row)

 SELECT relpages AS rp
=== EOF ===


```

[0]
https://api.cirrus-ci.com/v1/artifact/task/5336493629112320/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress

--
Best regards,
Kirill Reshke

Re: Vacuum statistics

From

Alena Rybakina

Date:

12 March, 22:36:39

Hi!

On 10.03.2025 16:33, Kirill Reshke wrote:

On Thu, 27 Feb 2025 at 23:00, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

Hi!
On 17.02.2025 17:46, Alena Rybakina wrote:

On 04.02.2025 18:22, Alena Rybakina wrote:

Hi! Thank you for your review!

On 02.02.2025 23:43, Alexander Korotkov wrote:

On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

I noticed that the cfbot is bad, the reason seems to be related to
the lack of a parameter in
src/backend/utils/misc/postgresql.conf.sample. I added it, it
should help.

The patch doesn't apply cleanly.  Please rebase.

I rebased them.

The patch needed a rebase again. There is nothing new since version
18, only a rebase.

The patch needed a new rebase.

Sorry, but due to feeling unwell I picked up a patch from another thread
and squashed the patch for vacuum statistics for indexes and heaps in
the previous version. Now I fixed everything together with the rebase.

--
Regards,
Alena Rybakina
Postgres Professional

Hi!
CI fails on this one[0]

Is it a flap or a real problem caused by v20?

```
 SELECT relpages AS irp
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
/tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
2025-03-10 09:36:10.274799176 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
2025-03-10 09:49:35.796596462 +0000
@@ -65,7 +65,7 @@ WHERE vt.relname = 'vestat' AND vt.relid = c.oid;  relname | vm_new_frozen_pages | tuples_deleted | relpages |
pages_scanned | pages_removed ---------+---------------------+----------------+----------+---------------+---------------
- vestat  |                   0 |              0 |      455 |   0 |             0
+ vestat  |                   0 |              0 |      417 |   0 |             0 (1 row)
 SELECT relpages AS rp
=== EOF ===


```

[0] https://api.cirrus-ci.com/v1/artifact/task/5336493629112320/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress

Thank you for your help, I'll fix it soon.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

12 March, 22:41:33

Hi!

On 10.03.2025 12:13, Ilia Evdokimov wrote:
> Hi,
>
> After commit eaf5027 we should add information about wal_buffers_full.
>
> Any thoughts?
>
> -- 
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.
>
I think I can add it. To be honest, I haven't gotten to know this 
statistic yet, haven't had time to get to know this commit yet. How will 
this statistic help us analyze the work of the vacuum for relations?

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Jim Nasby

Date:

13 March, 01:15:53

On Wed, Mar 12, 2025 at 2:41 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

Hi!

On 10.03.2025 12:13, Ilia Evdokimov wrote:
> Hi,
>
> After commit eaf5027 we should add information about wal_buffers_full.
>
> Any thoughts?
>
> --
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.
>
I think I can add it. To be honest, I haven't gotten to know this
statistic yet, haven't had time to get to know this commit yet. How will
this statistic help us analyze the work of the vacuum for relations?

The usecase I can see here is that we don't want autovac creating so much WAL traffic that it starts forcing other backends to have to write WAL out. But tracking how many times autovac writes WAL buffers won't help with that (though we also don't want any WAL buffers written by autovac to be counted in the system-wide wal_buffers_full: autovac is a BG process and it's fine if it's spending time writing WAL out). I think the same is true of a manual vacuum as well.

What would be helpful would be a way to determine if autovac was causing enough traffic to force other backends to write WAL. Offhand I'm not sure how practical that actually is though.

BTW, there's also an argument to be made that autovac should throttle itself if we're close to running out of available WAL buffers...

Re: Vacuum statistics

From

Bertrand Drouvot

Date:

13 March, 09:42:57

Hi,

On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote:
> The usecase I can see here is that we don't want autovac creating so much
> WAL traffic that it starts forcing other backends to have to write WAL out.
> But tracking how many times autovac writes WAL buffers won't help with that

Right, because the one that increments the wal_buffers_full metric could "just"
be a victim (i.e the one that happens to trigger the WAL buffers disk flush,
even though other backends contributed most of the buffer usage).

> (though we also don't want any WAL buffers written by autovac to be counted
> in the system-wide wal_buffers_full:

why? Or do you mean that it would be good to have 2 kinds of metrics: one
generated by "maintenance" activity and one by "regular" backends?

> What would be helpful would be a way to determine if autovac was causing
> enough traffic to force other backends to write WAL. Offhand I'm not sure
> how practical that actually is though.

a051e71e28a could help to see how much WAL has by written by the autovac workers.

> BTW, there's also an argument to be made that autovac should throttle
> itself if we're close to running out of available WAL buffers...

hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate"
the WAL buffers flush to another backend.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Vacuum statistics

From

vignesh C

Date:

17 March, 09:42:48

On Thu, 27 Feb 2025 at 23:30, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Hi!
> On 17.02.2025 17:46, Alena Rybakina wrote:
> > On 04.02.2025 18:22, Alena Rybakina wrote:
> >> Hi! Thank you for your review!
> >>
> >> On 02.02.2025 23:43, Alexander Korotkov wrote:
> >>> On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
> >>> <a.rybakina@postgrespro.ru> wrote:
> >>>> I noticed that the cfbot is bad, the reason seems to be related to
> >>>> the lack of a parameter in
> >>>> src/backend/utils/misc/postgresql.conf.sample. I added it, it
> >>>> should help.
> >>> The patch doesn't apply cleanly.  Please rebase.
> >> I rebased them.
> > The patch needed a rebase again. There is nothing new since version
> > 18, only a rebase.
>
> The patch needed a new rebase.

I noticed that the CI failure reported at [1], Ilia's comment from
[2], changed the status to Waiting on Author, please address them and
update it to Needs review.
[1] - https://www.postgresql.org/message-id/CALdSSPiw_-0_L3YV%3DQn7oopPqY2XVrXwDSGLdSXS69QvMdXisQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/47a7b784-5218-43f2-96e3-65f9a729c5a5%40tantorlabs.com

Regards,
Vignesh

Re: Vacuum statistics

From

Alena Rybakina

Date:

18 March, 08:57:40

On 17.03.2025 09:42, vignesh C wrote:

On Thu, 27 Feb 2025 at 23:30, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

Hi!
On 17.02.2025 17:46, Alena Rybakina wrote:

On 04.02.2025 18:22, Alena Rybakina wrote:

Hi! Thank you for your review!

On 02.02.2025 23:43, Alexander Korotkov wrote:

On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

I noticed that the cfbot is bad, the reason seems to be related to
the lack of a parameter in
src/backend/utils/misc/postgresql.conf.sample. I added it, it
should help.

The patch doesn't apply cleanly.  Please rebase.

I rebased them.

The patch needed a rebase again. There is nothing new since version
18, only a rebase.

The patch needed a new rebase.

I noticed that the CI failure reported at [1], Ilia's comment from
[2], changed the status to Waiting on Author, please address them and
update it to Needs review.
[1] - https://www.postgresql.org/message-id/CALdSSPiw_-0_L3YV%3DQn7oopPqY2XVrXwDSGLdSXS69QvMdXisQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/47a7b784-5218-43f2-96e3-65f9a729c5a5%40tantorlabs.com

Okay, thank you!

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

21 March, 22:42:17

On 13.03.2025 09:42, Bertrand Drouvot wrote:

Hi,

On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote:

The usecase I can see here is that we don't want autovac creating so much
WAL traffic that it starts forcing other backends to have to write WAL out.
But tracking how many times autovac writes WAL buffers won't help with that

Right, because the one that increments the wal_buffers_full metric could "just"
be a victim (i.e the one that happens to trigger the WAL buffers disk flush,
even though other backends contributed most of the buffer usage).

(though we also don't want any WAL buffers written by autovac to be counted
in the system-wide wal_buffers_full:

why? Or do you mean that it would be good to have 2 kinds of metrics: one
generated by "maintenance" activity and one by "regular" backends?

What would be helpful would be a way to determine if autovac was causing
enough traffic to force other backends to write WAL. Offhand I'm not sure
how practical that actually is though.

a051e71e28a could help to see how much WAL has by written by the autovac workers.

BTW, there's also an argument to be made that autovac should throttle
itself if we're close to running out of available WAL buffers...

hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate"
the WAL buffers flush to another backend.

Regards,

I will add it and fix the tests but later and I'll explain why.

I'm working on this issue [0] and try have already created new statistics in Statistics Collector to store database and relation vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION.

Vacuum statistics are saved there instead of relation's and database's statistic structure, but for some reason it is not possible to find them in the hash table when building a snapshot and display them accordingly.
I have not yet figured out where the error is.

Without solving this problem, committing vacuum statistics is not yet possible. An alternative way for us was to refuse some statistics for now for relations,
but we could not agree on which statistics should not be displayed yet and for now we are only adding them :).

I understand why this is important to display more vacuum information about vacuum statistics - it will allow us to better understand the problems of incorrect vacuum settings or, for example, notice a bug in its operation.

In order to reduce the memory consumption for storing them for those who are not going to use them, I just realized that we need to create a separate space for storing the statistics
I mentioned above (PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I am still trying to complete this functionality.

I doubt that I will have time for this by code freeze date and even if I do, I will hardly have time for a normal review. There's really a lot more to learn related to the stat collector, so
I'm postponing it to the next commitfest.

Sorry. I'll fix the tests as soon as I finish this part, since they'll most likely either break the same way or in some new way.

Tomorrow or the day after tomorrow I will send a diff patch with what I have already managed to demonstrate the problem, since I need to bring the code to a normal form.
Maybe someone who worked with the stat collector will suddenly tell me where and what I have implemented incorrectly.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

21 March, 22:46:41

Sorry, I forgot to provide a link to the problem [0], actually. So I provided it below.

[0] https://www.postgresql.org/message-id/CAPpHfduoJEuoixPTTg2tjhnXqrdobuMaQGxriqxJ9TjN1uxOuA%40mail.gmail.com

On 21.03.2025 22:42, Alena Rybakina wrote:

On 13.03.2025 09:42, Bertrand Drouvot wrote:
Hi,

On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote:
The usecase I can see here is that we don't want autovac creating so much
WAL traffic that it starts forcing other backends to have to write WAL out.
But tracking how many times autovac writes WAL buffers won't help with that
Right, because the one that increments the wal_buffers_full metric could "just"
be a victim (i.e the one that happens to trigger the WAL buffers disk flush,
even though other backends contributed most of the buffer usage).
(though we also don't want any WAL buffers written by autovac to be counted
in the system-wide wal_buffers_full:
why? Or do you mean that it would be good to have 2 kinds of metrics: one
generated by "maintenance" activity and one by "regular" backends?
What would be helpful would be a way to determine if autovac was causing
enough traffic to force other backends to write WAL. Offhand I'm not sure
how practical that actually is though.
a051e71e28a could help to see how much WAL has by written by the autovac workers.
BTW, there's also an argument to be made that autovac should throttle
itself if we're close to running out of available WAL buffers...
hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate"
the WAL buffers flush to another backend.

Regards,
I will add it and fix the tests but later and I'll explain why.
I'm working on this issue [0] and try have already created new statistics in Statistics Collector to store database and relation vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION.
Vacuum statistics are saved there instead of relation's and database's statistic structure, but for some reason it is not possible to find them in the hash table when building a snapshot and display them accordingly.
I have not yet figured out where the error is.
Without solving this problem, committing vacuum statistics is not yet possible. An alternative way for us was to refuse some statistics for now for relations,
but we could not agree on which statistics should not be displayed yet and for now we are only adding them :).
I understand why this is important to display more vacuum information about vacuum statistics - it will allow us to better understand the problems of incorrect vacuum settings or, for example, notice a bug in its operation.
In order to reduce the memory consumption for storing them for those who are not going to use them, I just realized that we need to create a separate space for storing the statistics
I mentioned above (PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I am still trying to complete this functionality.
I doubt that I will have time for this by code freeze date and even if I do, I will hardly have time for a normal review. There's really a lot more to learn related to the stat collector, so
I'm postponing it to the next commitfest.
Sorry. I'll fix the tests as soon as I finish this part, since they'll most likely either break the same way or in some new way.
Tomorrow or the day after tomorrow I will send a diff patch with what I have already managed to demonstrate the problem, since I need to bring the code to a normal form.
Maybe someone who worked with the stat collector will suddenly tell me where and what I have implemented incorrectly.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Jim Nasby

Date:

25 March, 02:02:31

On Fri, Mar 21, 2025 at 2:42 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:

On 13.03.2025 09:42, Bertrand Drouvot wrote:

Hi,

On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote:

The usecase I can see here is that we don't want autovac creating so much
WAL traffic that it starts forcing other backends to have to write WAL out.
But tracking how many times autovac writes WAL buffers won't help with that

Right, because the one that increments the wal_buffers_full metric could "just"
be a victim (i.e the one that happens to trigger the WAL buffers disk flush,
even though other backends contributed most of the buffer usage).

(though we also don't want any WAL buffers written by autovac to be counted
in the system-wide wal_buffers_full:

why? Or do you mean that it would be good to have 2 kinds of metrics: one
generated by "maintenance" activity and one by "regular" backends?

See below...

What would be helpful would be a way to determine if autovac was causing
enough traffic to force other backends to write WAL. Offhand I'm not sure
how practical that actually is though.

a051e71e28a could help to see how much WAL has by written by the autovac workers.

I still don't think that helps (see below)

BTW, there's also an argument to be made that autovac should throttle
itself if we're close to running out of available WAL buffers...

hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate"
the WAL buffers flush to another backend.

Maybe it does, maybe it doesn't... but now I think you're getting why I'm complaining about the proposed WAL flush metrics: who *flushes* WAL tells you absolutely nothing about who generated the WAL. Not only that, but flushing WAL isn't necessarily even bad: a user backend can't COMMIT without flushing some amount of WAL (ignoring async-commit of course). That really casts the whole idea of having stats on who's flushing how much WAL in a new light: you can NOT use any such metric without a bunch of other context; including who else was flushing how much WAL, whether WAL had to absolutely be flushed anyway (ie, at bare minimum a COMMIT must flush enough WAL to cover the commit record), and even where all the WAL is coming from in the first place.

Though now that I think about it... if we're reporting how much WAL is being generated by vacuum, then *maybe* it's helpful to also report how much WAL is being flushed by vacuum. My emphasis on *maybe* is because it's fine if autovac is writing more than it flushes, so long as the remainder is being flushed by the checkpointer and not user backends... but you could also determine that just by looking at how much WAL backends are flushing.

Basically, I'm leaning towards it would be best to rethink the whole purpose of reporting WAL flush metrics before we further muddy the waters by adding vacuum stats about it. At minimum we should have a metric that shows how much WAL backends flushed because they *had* to due to synchronous commit settings (which does affect more than just COMMIT).

I will add it and fix the tests but later and I'll explain why.
I'm working on this issue [0] and try have already created new statistics in Statistics Collector to store database and relation vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION.
Vacuum statistics are saved there instead of relation's and database's statistic structure, but for some reason it is not possible to find them in the hash table when building a snapshot and display them accordingly.
I have not yet figured out where the error is.
Without solving this problem, committing vacuum statistics is not yet possible. An alternative way for us was to refuse some statistics for now for relations,
but we could not agree on which statistics should not be displayed yet and for now we are only adding them :).
I understand why this is important to display more vacuum information about vacuum statistics - it will allow us to better understand the problems of incorrect vacuum settings or, for example, notice a bug in its operation.
In order to reduce the memory consumption for storing them for those who are not going to use them, I just realized that we need to create a separate space for storing the statistics
I mentioned above (PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I am still trying to complete this functionality.
I doubt that I will have time for this by code freeze date and even if I do, I will hardly have time for a normal review. There's really a lot more to learn related to the stat collector, so
I'm postponing it to the next commitfest.
Sorry. I'll fix the tests as soon as I finish this part, since they'll most likely either break the same way or in some new way.
Tomorrow or the day after tomorrow I will send a diff patch with what I have already managed to demonstrate the problem, since I need to bring the code to a normal form.
Maybe someone who worked with the stat collector will suddenly tell me where and what I have implemented incorrectly.
-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alena Rybakina

Date:

25 March, 09:12:57

Hi! I rebased the patches again - PGSTAT_FILE_FORMAT_ID needed to be fixed.

On 21.03.2025 22:42, Alena Rybakina wrote:

I will add it and fix the tests but later and I'll explain why.
I'm working on this issue [0] and try have already created new statistics in Statistics Collector to store database and relation vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION.
Vacuum statistics are saved there instead of relation's and database's statistic structure, but for some reason it is not possible to find them in the hash table when building a snapshot and display them accordingly.
I have not yet figured out where the error is.
Without solving this problem, committing vacuum statistics is not yet possible. An alternative way for us was to refuse some statistics for now for relations,
but we could not agree on which statistics should not be displayed yet and for now we are only adding them :).
I understand why this is important to display more vacuum information about vacuum statistics - it will allow us to better understand the problems of incorrect vacuum settings or, for example, notice a bug in its operation.
In order to reduce the memory consumption for storing them for those who are not going to use them, I just realized that we need to create a separate space for storing the statistics
I mentioned above (PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I am still trying to complete this functionality.
I doubt that I will have time for this by code freeze date and even if I do, I will hardly have time for a normal review. There's really a lot more to learn related to the stat collector, so
I'm postponing it to the next commitfest.
Sorry. I'll fix the tests as soon as I finish this part, since they'll most likely either break the same way or in some new way.
Tomorrow or the day after tomorrow I will send a diff patch with what I have already managed to demonstrate the problem, since I need to bring the code to a normal form.
Maybe someone who worked with the stat collector will suddenly tell me where and what I have implemented incorrectly.

I attached also diff version that contains what I was talking about. The test case:

create table t (x int);

insert into t select id from generate_series(1,1000) id;

delete from t where id > 900;

vacuum;

select * from pg_stat_vacuum_tables where relname = 't';

-- 
Regards,
Alena Rybakina
Postgres Professional

Attachment

Re: Vacuum statistics

From

Andrei Lepikhov

Date:

22 April, 21:23:57

On 10/28/24 14:40, Alexander Korotkov wrote:
> On Sun, Aug 25, 2024 at 6:59 PM Alena Rybakina
>> If I missed something or misunderstood, can you explain in more detail?
> 
> Actually, I mean why do we need a possibility to return statistics for
> all tables/indexes in one function call?  User anyway is supposed to
> use pg_stat_vacuum_indexes/pg_stat_vacuum_tables view, which do
> function calls one per relation.  I suppose we can get rid of
> possibility to get all the objects in one function call and just
> return a tuple from the functions like other pgstatfuncs.c functions
> do.
I suppose it was designed this way because databases may contain 
thousands of tables and indexes - remember, at least, partitions. But it 
may be okay to use the SRF_FIRSTCALL_INIT / SRF_RETURN_NEXT API. I think 
by registering a prosupport routine predicting cost and rows of these 
calls, we may let the planner build adequate plans for queries involving 
those stats - people will definitely join it with something else in the 
database.

-- 
regards, Andrei Lepikhov

Re: Vacuum statistics

From

Alena Rybakina

Date:

13 May, 12:49:02

Hi!

On 12.05.2025 08:30, Amit Kapila wrote:
> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>> I did a rebase and finished the part with storing statistics separately from the relation statistics - now it is
possibleto disable the collection of statistics for relationsh using gucs and
 
>> this allows us to solve the problem with the memory consumed.
>>
> I think this patch is trying to collect data similar to what we do for
> pg_stat_statements for SQL statements. So, can't we follow a similar
> idea such that these additional statistics will be collected once some
> external module like pg_stat_statements is enabled? That module should
> be responsible for accumulating and resetting the data, so we won't
> have this memory consumption issue.
The idea is good, it will require one hook for the pgstat_report_vacuum 
function, the extvac_stats_start and extvac_stats_end functions can be 
run if the extension is loaded, so as not to add more hooks.
But I see a problem here with tracking deleted objects for which 
statistics are no longer needed. There are two solutions to this and I 
don't like both of them, to be honest.
The first way is to add a background process that will go through the 
table with saved statistics and check whether the relation or the 
database are relevant now or not and if not, then
delete the vacuum statistics information for it. This may be 
resource-intensive. The second way is to add hooks for deleting the 
database and relationships (functions dropdb, index_drop, 
heap_drop_with_catalog).
> BTW, how will these new statistics be used to autotune a vacuum?
yes, but they are collected on demand - by guc.
> And
> do we need all the statistics proposed by this patch?
>
Regarding this issue, it was discussed here and so far we have come to 
the conclusion that statistics are needed for a deep understanding of 
the work of vacuum statistics [0] [1] [2].

[0] 
https://www.postgresql.org/message-id/0B6CBF4C-CC2A-4200-9126-CE3A390D938B%40upgrade.com

[1] 
https://www.postgresql.org/message-id/6732acf8ce0f31025b535ae1a64568750924a887.camel%40moonset.ru

[2] 
https://www.postgresql.org/message-id/5AA8FFD5-6DE2-4A31-8E00-AE98F738F5D1%40upgrade.com


-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Vacuum statistics

From

Alexander Korotkov

Date:

02 June, 19:25:28

On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
> On 12.05.2025 08:30, Amit Kapila wrote:
> > On Fri, May 9, 2025 at 5:34 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> >> I did a rebase and finished the part with storing statistics separately from the relation statistics - now it is
possibleto disable the collection of statistics for relationsh using gucs and 
> >> this allows us to solve the problem with the memory consumed.
> >>
> > I think this patch is trying to collect data similar to what we do for
> > pg_stat_statements for SQL statements. So, can't we follow a similar
> > idea such that these additional statistics will be collected once some
> > external module like pg_stat_statements is enabled? That module should
> > be responsible for accumulating and resetting the data, so we won't
> > have this memory consumption issue.
> The idea is good, it will require one hook for the pgstat_report_vacuum
> function, the extvac_stats_start and extvac_stats_end functions can be
> run if the extension is loaded, so as not to add more hooks.

+1
Nice idea of a hook.  Given the volume of the patch, it might be a
good idea to keep this as an extension.

> But I see a problem here with tracking deleted objects for which
> statistics are no longer needed. There are two solutions to this and I
> don't like both of them, to be honest.
> The first way is to add a background process that will go through the
> table with saved statistics and check whether the relation or the
> database are relevant now or not and if not, then
> delete the vacuum statistics information for it. This may be
> resource-intensive. The second way is to add hooks for deleting the
> database and relationships (functions dropdb, index_drop,
> heap_drop_with_catalog).

Can we workaround this with object_access_hook?

------
Regards,
Alexander Korotkov
Supabase

Re: Vacuum statistics

From

Alena Rybakina

Date:

02 June, 22:56:41

On 02.06.2025 19:25, Alexander Korotkov wrote:
> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
> <a.rybakina@postgrespro.ru> wrote:
>> On 12.05.2025 08:30, Amit Kapila wrote:
>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>>>> I did a rebase and finished the part with storing statistics separately from the relation statistics - now it is
possibleto disable the collection of statistics for relationsh using gucs and
 
>>>> this allows us to solve the problem with the memory consumed.
>>>>
>>> I think this patch is trying to collect data similar to what we do for
>>> pg_stat_statements for SQL statements. So, can't we follow a similar
>>> idea such that these additional statistics will be collected once some
>>> external module like pg_stat_statements is enabled? That module should
>>> be responsible for accumulating and resetting the data, so we won't
>>> have this memory consumption issue.
>> The idea is good, it will require one hook for the pgstat_report_vacuum
>> function, the extvac_stats_start and extvac_stats_end functions can be
>> run if the extension is loaded, so as not to add more hooks.
> +1
> Nice idea of a hook.  Given the volume of the patch, it might be a
> good idea to keep this as an extension.

Today, I finalized the vacuum statistics separation approach and 
refactored the vacuum statistics structures (patch 4).

I also reworked the table statistics to avoid mixing index statistics in 
parallel vacuum mode (patch 2).

The new approach excludes buffer usage and WAL statistics for indexes 
from the table’s statistics.
For timing, if vacuuming is sequential, the total time spent on all 
indexes is subtracted from the table’s total vacuum time by adding up 
the individual index vacuum times. If vacuuming is parallel, the total 
index vacuum time is subtracted as a whole.

static void
accumulate_idxs_vacuum_statistics(LVRelState *vacrel, ExtVacReport 
*extVacIdxStats)
{
     if (!pgstat_track_vacuum_statistics)
         return;

     /* Fill heap-specific extended stats fields */
     vacrel->extVacReportIdx.blk_read_time += extVacIdxStats->blk_read_time;
     vacrel->extVacReportIdx.blk_write_time += 
extVacIdxStats->blk_write_time;
     vacrel->extVacReportIdx.total_blks_dirtied += 
extVacIdxStats->total_blks_dirtied;
     vacrel->extVacReportIdx.total_blks_hit += 
extVacIdxStats->total_blks_hit;
     vacrel->extVacReportIdx.total_blks_read += 
extVacIdxStats->total_blks_read;
     vacrel->extVacReportIdx.total_blks_written += 
extVacIdxStats->total_blks_written;
     vacrel->extVacReportIdx.wal_bytes += extVacIdxStats->wal_bytes;
     vacrel->extVacReportIdx.wal_fpi += extVacIdxStats->wal_fpi;
     vacrel->extVacReportIdx.wal_records += extVacIdxStats->wal_records;
     vacrel->extVacReportIdx.delay_time += extVacIdxStats->delay_time;

     vacrel->extVacReportIdx.total_time += extVacIdxStats->total_time;

}

if (ParallelVacuumIsActive(vacrel))
{
     LVExtStatCounters counters;
     ExtVacReport extVacReport;

     memset(&extVacReport, 0, sizeof(ExtVacReport));

     extvac_stats_start(vacrel->rel, &counters);

     /* Outsource everything to parallel variant */
     parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
vacrel->num_index_scans);

     extvac_stats_end(vacrel->rel, &counters, &extVacReport);
     accumulate_idxs_vacuum_statistics(vacrel, &extVacReport);
}

Currently, database statistics work incorrectly — I'm investigating the 
issue.


In parallel, I'm starting work on the extension.

-- 
Regards,
Alena Rybakina
Postgres Professional