Re: Vacuum statistics - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Vacuum statistics
Date
Msg-id 9706bb4d-8966-49d8-a836-b11107160327@yandex.ru
Whole thread Raw
In response to Re: Vacuum statistics  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Vacuum statistics
List pgsql-hackers

Hi! Thank you for your interest in this topic!

On 07.06.2024 09:46, Dilip Kumar wrote:
On Thu, May 30, 2024 at 11:57 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:
On 30.05.2024 10:33, Alena Rybakina wrote:
I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this
information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this
problem.

We are open to feedback.
I was reading through the patch here are some initial comments.

--
+typedef struct LVExtStatCounters
+{
+ TimestampTz time;
+ PGRUsage ru;
+ WalUsage walusage;
+ BufferUsage bufusage;
+ int64 VacuumPageMiss;
+ int64 VacuumPageHit;
+ int64 VacuumPageDirty;
+ double VacuumDelayTime;
+ PgStat_Counter blocks_fetched;
+ PgStat_Counter blocks_hit;
+} LVExtStatCounters;


I noticed that you are storing both pgBufferUsage and
VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same?
It seems they both exist in the system because some code, like
heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
relies on the old counters. And there is already a patch to remove
those old counters.
I agree with you and I have fixed it.

--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{

I don't think you need this last parameter (ncolumns) we can anyway
fetch that from tupledesc, so adding an additional parameter
just for checking doesn't look good to me.
To be honest, I'm not sure if ncolumns should be deleted at all, because the pg_stats_vacuum function is used to display three different types of statistics: for tables, indexes, and databases. We use this parameter to pass information about the number of parameters (or how many statistics we expect) depending on the type of statistics. For example, table vacuum statistics contain 27 parameters, while indexes and databases contain 19 and 15 parameters, respectively. You can see that the pg_stats_vacuum function contains an Assert that checks that the expected number of tupledesc parameters matches the actual number.

Assert(tupdesc->natts == ncolumns);


Perhaps I can convert it to a local parameter and determine its value already in the function, for example:

pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
{

int columns = 0;

switch (type)

{

case PGSTAT_EXTVAC_HEAP:

ncolumns = EXTVACHEAPSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_INDEX:

ncolumns = EXTVACINDEXSTAT_COLUMNS;

break;

case PGSTAT_EXTVAC_DB:

ncolumns = EXTVACDBSTAT_COLUMNS;

break; }

...

}

What do you think?

--
+ /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+ MyDatabaseId = dbid;

typo
/think that our database us dbid/think that our database has dbid

Also, remove the blank line between the comment and the next code
block that is related to that comment.


--  VacuumPageDirty = 0;
+ VacuumDelayTime = 0.;

There is an extra "." after 0


Thank you, I fixed it.


In addition to these changes, I fixed the problem with displaying vacuum statistics for databases: I found an error in defining the pg_stats_vacuum_database system view. In addition, I rewrote the tests and converted them into a regression test. In addition, I have divided the test to test the functionality of the output of vacuum statistics into two tests: one of them checks the functionality of tables and databases, and the other - indexes. This is caused by a problem with the vacuum functionality when the table contains an index. You can find more information about this here: [0] and [1].

I attached the diff to this letter.

[0] https://www.postgresql.org/message-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd%40yandex.ru

[1] https://www.postgresql.org/message-id/CAH2-Wznv94Q_Td8OS8bAN7fYLpfU6CGgjn6Xau5eJ_sDxEGeBA%40mail.gmail.com


I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain code for collecting vacuum statistics on tables, the second on indexes and the last on databases. I also write the documentation.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: New function normal_rand_array function to contrib/tablefunc.
Next
From: Amit Kapila
Date:
Subject: Re: Conflict Detection and Resolution