Re: [PROPOSAL] VACUUM Progress Checker. - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [PROPOSAL] VACUUM Progress Checker.
Date
Msg-id CAFj8pRAYFqY5UBye94cwvsca9Z-rmn9LnaNAQ4mjK4mXgTD2jg@mail.gmail.com
Whole thread Raw
In response to [PROPOSAL] VACUUM Progress Checker.  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: [PROPOSAL] VACUUM Progress Checker.  (Simon Riggs <simon@2ndQuadrant.com>)
Re: [PROPOSAL] VACUUM Progress Checker.  (Rahila Syed <rahilasyed90@gmail.com>)
List pgsql-hackers
Hi

2015-06-30 9:37 GMT+02:00 Rahila Syed <rahilasyed90@gmail.com>:
Hello Hackers,

Following is a proposal for feature to calculate VACUUM progress.

interesting idea - I like to see it integrated to core.
 

Use Case : Measuring progress of long running VACUUMs to help DBAs make informed decision 
whether to continue running VACUUM or abort it.

Design:

A shared preload library to store progress information from different backends running VACUUM, calculate remaining time for each and display progress in the
in the form a view.

probably similar idea can be used for REINDEX, CREATE INDEX, COPY TO statements

I though about the possibilities of progress visualization - and one possibility is one or two special column in pg_stat_activity table - this info can be interesting for VACUUM started by autovacuum too.

Regards

Pavel
 


VACUUM  needs to be instrumented with a hook to collect progress information (pages vacuumed/scanned) periodically.

The patch attached implements a new hook to store vacuumed_pages and scanned_pages count at the end of each page scanned by VACUUM.

This information is stored in a shared memory structure.

In addition to measuring progress this function using hook also calculates remaining time for VACUUM. 



The frequency of collecting progress information can be reduced by appending delays in between hook function calls.

Also, a GUC parameter 

log_vacuum_min_duration can be used. 

This will cause VACUUM progress to be calculated only if VACUUM runs more than specified milliseconds. 

A value of zero calculates VACUUM progress for each page processed. -1 disables logging.


Progress calculation :


percent_complete = scanned_pages * 100 / total_pages_to_be_scanned;

remaining_time = elapsed_time * (total_pages_to_be_scanned - scanned_pages) / scanned_pages;


Shared memory struct:

typedef struct PgStat_VacuumStats

{

  Oid         databaseoid;

  Oid         tableoid;

  Int32       vacuumed_pages;

  Int32       total_pages;

  Int32       scanned_pages;

  double    elapsed_time;

  double    remaining_time;

 } PgStat_VacuumStats[max_connections];

 

Reporting :

 A view named 'pg_maintenance_progress' can be created using the values in the struct above.

pg_stat_maintenance can be called from any other backend and will display progress of 

each running VACUUM.


Other uses of hook in VACUUM:


Cost of VACUUM in terms of pages hit , missed and dirtied same as autovacuum can be collected using this hook.

Autovacuum does it at the end of VACUUM for each table. It can be done while VACUUM on a table is in progress.
This can be helpful to track manual VACUUMs also not just autovacuum.

Read/Write(I/O) rates can be computed on the lines of autovacuum.
Read rate patterns can be used to help tuning future vacuum on the table(like shared buffers tuning)
Other resource usages can also be collected using progress checker hook.


Attached patch is POC patch of progress calculation for a single backend. 

Also attached is a brief snapshot of the output log.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: [PROPOSAL] VACUUM Progress Checker.
Next
From: Thomas Munro
Date:
Subject: Re: [PROPOSAL] VACUUM Progress Checker.