Thread: [Proposal] More Vacuum Statistics

[Proposal] More Vacuum Statistics

From
Naoya Anzai
Date:
Hello, hackers!

I'm a technical support engineer of PostgreSQL.

In my much experience up until now,I have an idea that we can add
2 new vacuum statistics into pg_stat_xxx_tables.
Features I hope is following.

MORE VACUUM STATISTICS
==========================

Design & Motivation
---
1. Show about how long did vacuum spend for one table
This is a very important information for tuning vacuum & redesigning table,
but to see this information, we should check pg_log files and it cannot
show if log_autovacuum_min_duration=-1.We can already show vacuum end
time in current statistics, so we only have to add vacuum start time.
Vacuum execution time will be able to estimate by time between start
and end ( it is no longer need to check pg_log file ).
Furthermore, vacuum execution interval time will be also able to estimate.

To implement this feature, at least we need to modify pgstat_report_vacuum.

2. Page visibility rate of each table
There is no way to know how many page-bits are them of each tables stored
in their visibility maps. If we can show this information, then we will be
able to guess vacuum overhead for the table. For example, if this table is a
very big table but page visibility rate is high, then we can advise pg-users
that vacuum for this table will execute faster than they think by low I/O overhead.
Furthermore, this information can also be used in order to inform pg-users
about "real" index-only-scan usability.

To implement this feature, at least we need to count either number of
skipping or setting visible blocks at lazy_scan_heap.

I/F
---
Pg-users can show this information by select pg_stat_xxx_tables.

pg_stat_xxx_tables
-----------------------
relid
schemaname
relname
seq_scan
seq_tup_read
idx_scan
idx_tup_fetch
n_tup_ins
n_tup_upd
n_tup_del
n_tup_hot_upd
n_live_tup
n_mod_since_analyze
page_visibility----------------# add
last_vacuum_start--------------# add
last_vacuum_end----------------# rename
last_autovacuum_start----------# add
last_autovacuum_end------------# rename
last_analyze
last_autoanalyze
vacuum_count
autovacuum_count
analyze_count
autoanalyze_count

If hackers agree with my point,
I'd like to make a patch for these features.

Any comments are welcome.

Best Regards,

Naoya Anzai
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-anzai@xc.jp.nec.com       bench.coffee@gmail.com
---



Re: [Proposal] More Vacuum Statistics

From
Tom Lane
Date:
Naoya Anzai <nao-anzai@xc.jp.nec.com> writes:
> In my much experience up until now,I have an idea that we can add 
> 2 new vacuum statistics into pg_stat_xxx_tables.

Adding new stats in that way requires adding per-table counters, which
bloat the statistics files (especially in database with very many tables).
I do not think you've made a case for these stats being valuable enough
to justify such overhead for everybody.

As far as the first one goes, I don't even think it's especially useful.
There might be value in tracking the times of the last few vacuums on a
table, but knowing the time for only the latest one doesn't sound like it
would prove much.  So I'd be inclined to think more along the lines of
scanning the postmaster log for autovacuum runtimes, instead of squeezing
it into the pg_stats views.

A possible alternative so far as the second one goes is to add a function
(perhaps in contrib/pg_freespacemap) that simply runs through a table's
VM and counts the number of set bits.  This would be more accurate (no
risk of lost counter updates) and very possibly cheaper overall: it would
take longer to find out the number when you wanted it, but you wouldn't be
paying the distributed overhead of tracking it when you didn't want it.
        regards, tom lane



Re: [Proposal] More Vacuum Statistics

From
Jeff Janes
Date:
On Thu, May 28, 2015 at 4:08 AM, Naoya Anzai <nao-anzai@xc.jp.nec.com> wrote:

2. Page visibility rate of each table
There is no way to know how many page-bits are them of each tables stored
in their visibility maps. If we can show this information, then we will be
able to guess vacuum overhead for the table. For example, if this table is a
very big table but page visibility rate is high, then we can advise pg-users
that vacuum for this table will execute faster than they think by low I/O overhead.
Furthermore, this information can also be used in order to inform pg-users
about "real" index-only-scan usability.

Isn't this already pg_class.relallvisible?

 
Cheers,

Jeff

Re: [Proposal] More Vacuum Statistics

From
Jim Nasby
Date:
On 5/28/15 9:14 AM, Tom Lane wrote:
> Naoya Anzai <nao-anzai@xc.jp.nec.com> writes:
>> In my much experience up until now,I have an idea that we can add
>> 2 new vacuum statistics into pg_stat_xxx_tables.
>
> Adding new stats in that way requires adding per-table counters, which
> bloat the statistics files (especially in database with very many tables).
> I do not think you've made a case for these stats being valuable enough
> to justify such overhead for everybody.

It occurs to me that there's no good reason for vacuum-derived stats to 
be in the stats file; it's not like users run vacuum anywhere near as 
often as other commands. It's stats could be kept in pg_class; we're 
already keeping things like relallvisible there.

> As far as the first one goes, I don't even think it's especially useful.
> There might be value in tracking the times of the last few vacuums on a
> table, but knowing the time for only the latest one doesn't sound like it
> would prove much.  So I'd be inclined to think more along the lines of
> scanning the postmaster log for autovacuum runtimes, instead of squeezing
> it into the pg_stats views.

You'd also want to know how many pages were scanned, since any decent 
estimation would need to take table size into account.

As for history, that's a problem that exists for *all* our statistics, 
so anyone that cares about that is going to setup some system to 
periodically capture the contents of pg_stat_*.

> A possible alternative so far as the second one goes is to add a function
> (perhaps in contrib/pg_freespacemap) that simply runs through a table's
> VM and counts the number of set bits.  This would be more accurate (no
> risk of lost counter updates) and very possibly cheaper overall: it would
> take longer to find out the number when you wanted it, but you wouldn't be
> paying the distributed overhead of tracking it when you didn't want it.

Seems like a reasonable addition to that contrib module regardless. As 
Jeff Janes mentioned this info is available in pg_class, but it requires 
an ANALYZE to update it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [Proposal] More Vacuum Statistics

From
Andres Freund
Date:
On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
> It occurs to me that there's no good reason for vacuum-derived stats to be
> in the stats file; it's not like users run vacuum anywhere near as often as
> other commands. It's stats could be kept in pg_class; we're already keeping
> things like relallvisible there.

While it might be viable to store them somewhere but the stat files, I
don't think pg_class is a good place. Its size is not any less critical
than the stats files. I.e. reading it sits in several rather hot paths,
and we keep tuples from it in memory in a lot of places.

Greetings,

Andres



Re: [Proposal] More Vacuum Statistics

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
> > It occurs to me that there's no good reason for vacuum-derived stats to be
> > in the stats file; it's not like users run vacuum anywhere near as often as
> > other commands. It's stats could be kept in pg_class; we're already keeping
> > things like relallvisible there.
> 
> While it might be viable to store them somewhere but the stat files, I
> don't think pg_class is a good place. Its size is not any less critical
> than the stats files. I.e. reading it sits in several rather hot paths,
> and we keep tuples from it in memory in a lot of places.

Greg Smith had this idea about "timing events",
https://www.postgresql.org/message-id/50A4BC4E.4030007%402ndQuadrant.com
Sounds like this thread is related.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] More Vacuum Statistics

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
>> It occurs to me that there's no good reason for vacuum-derived stats to be
>> in the stats file; it's not like users run vacuum anywhere near as often as
>> other commands. It's stats could be kept in pg_class; we're already keeping
>> things like relallvisible there.

> While it might be viable to store them somewhere but the stat files, I
> don't think pg_class is a good place. Its size is not any less critical
> than the stats files. I.e. reading it sits in several rather hot paths,
> and we keep tuples from it in memory in a lot of places.

Another reason why that would be a bad place is that a pg_class update
forces relcache invalidation and thereby cached-plan invalidation.
You don't want that for anything except (1) DDL affecting the table or
(2) change in statistics that affect plan choices.  It's not random
chance that relallvisible is in pg_class while some other stats are in
the stats collector's stuff --- the planner looks at relallvisible
but not the other stuff.
        regards, tom lane



Re: [Proposal] More Vacuum Statistics

From
Tomas Vondra
Date:
Hi,

On 05/30/15 04:41, Andres Freund wrote:
> On 2015-05-29 21:30:57 -0500, Jim Nasby wrote:
>> It occurs to me that there's no good reason for vacuum-derived stats to be
>> in the stats file; it's not like users run vacuum anywhere near as often as
>> other commands. It's stats could be kept in pg_class; we're already keeping
>> things like relallvisible there.
>
> While it might be viable to store them somewhere but the stat files,
> I don't think pg_class is a good place. Its size is not any less
> critical than the stats files. I.e. reading it sits in several rather
> hot paths, and we keep tuples from it in memory in a lot of places.

IMHO stat files is exactly the right place for data like this - I can't 
really think about other place with less overhead / impact. That of 
course assumes the new fields really are useful, and I do have my doubts 
about usefulness of this data.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] More Vacuum Statistics

From
Tomas Vondra
Date:

On 05/30/15 16:47, Tom Lane wrote:
>
>
> Another reason why that would be a bad place is that a pg_class update
> forces relcache invalidation and thereby cached-plan invalidation.
> You don't want that for anything except (1) DDL affecting the table or
> (2) change in statistics that affect plan choices.  It's not random
> chance that relallvisible is in pg_class while some other stats are in
> the stats collector's stuff --- the planner looks at relallvisible
> but not the other stuff.

We already update pg_class from autovacuum - see vac_update_relstats(). 
Presumably we could update the new fields in the same way, without 
introducing any additional cache invalidations or bloat.

But I do agree pg_class really is not the right place for this for the 
other reasons.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] More Vacuum Statistics

From
Naoya Anzai
Date:
Thank you for quick feedback, and I'm sorry for slow response.
All of your opinions were very helpful for me.

I have confirmed Greg's Idea "Timing events".
http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com

Greg said at first,
"Parsing log files for commonly needed performance data is no fun."
Yes, I completely agree with him.

That looks a nice idea but I don't know why this idea has
not been commited yet. Anybody knows?

I have reworked my idea since I heard dear hacker's opinions.

====================
pg_stat_vacuum view
====================

I understand it is not good to simply add more counters in
pg_stat_*_tables. For now, I'd like to suggest an extension
which can confirm vacuum statistics like pg_stat_statements.

VACUUM is a most important feature in PostgreSQL, but a
special view for vacuum does not exist. Don't you think
the fact is inconvenience? At least, I am disgruntled with
that we need to parse pg_log for tune VACUUM.

My first design of pg_stat_vacuum view is following.
(There are two views.)

pg_stat_vacuum_table
---------------
dbid
schemaname
relid
relname
elapsed
page_removed
page_remain
page_skipped
tuple_removed
tuple_remain
tuple_notremovable
buffer_hit
buffer_miss
buffer_dirty
avg_read
avg_write
vm_count
vac_start
vac_end
is_autovacuum

pg_stat_vacuum_index
---------------
dbid
shemaname
relid
indexrelid
indexname
elapsed
num_index_tuples
num_pages
tuples_removed
pages_deleted
pages_free
is_autovacuum

At present, I think memory design of pg_stat_statements can
divert into this feature.And I think this module needs to
prepare following parameters like pg_stat_statements.

pg_stat_vacuum.max(integer)
pg_stat_vacuum.save(boolean)
pg_stat_vacuum.excluded_dbnames(text)
pg_stat_vacuum.excluded_schemas(text)
pg_stat_vacuum.min_duration(integer)
... and so on.

To implement this feature, I have to collect each vacuum-stats
every lazy_vacuum_* and I need to embed a hook function point
where needed. (probably last point of lazy_vacuum_rel).
Do you hesitate to add the hook only for this function?

Similar feature has been already provided by pg_statsinfo package.
But it is a full-stack package for PG-stats and it needs to
redesign pg_log and design a repository database for introduce.
And it is not a core-extension for PostgreSQL.
(I don't intend to hate pg_statsinfo, I think this package is a very convinient tool)

Everyone will be able to do more easily tuning of VACUUM.
That's all I want.

Any comments are welcome!

Best Regards,

Naoya Anzai

---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-anzai@xc.jp.nec.com
---



Re: [Proposal] More Vacuum Statistics

From
Tomas Vondra
Date:
Hi,

On 06/05/15 14:10, Naoya Anzai wrote:
> Thank you for quick feedback, and I'm sorry for slow response.
> All of your opinions were very helpful for me.
>
> I have confirmed Greg's Idea "Timing events".
> http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com
>
> Greg said at first,
> "Parsing log files for commonly needed performance data is no fun."
> Yes, I completely agree with him.>
> That looks a nice idea but I don't know why this idea has
> not been commited yet. Anybody knows?

Most likely lack of time, I guess.

>
> I have reworked my idea since I heard dear hacker's opinions.
>
> ====================
> pg_stat_vacuum view
> ====================
>
> I understand it is not good to simply add more counters in
> pg_stat_*_tables. For now, I'd like to suggest an extension
> which can confirm vacuum statistics like pg_stat_statements.

I don't see how you want to collect the necessary information from an
extension? pg_stat_statements get most of the stats from BufferUsage 
structure, but vacuum keeps all this internal, AFAIK.

So it'd be necessary to make this somehow public - either by creating 
something like BufferUsage with all the vacuum stats, or perhaps a set 
of callbacks (either a single log_relation_vacuum or different callbacks 
for tables and indexs).

IMHO the callbacks are a better idea - for example because it naturally 
handles database-wide vacuum. The global structure makes this difficult, 
because you'll only see data for all the vacuumed objects (or it'd have 
to track per-object stats internally, somehow).

> VACUUM is a most important feature in PostgreSQL, but a
> special view for vacuum does not exist. Don't you think
> the fact is inconvenience? At least, I am disgruntled with
> that we need to parse pg_log for tune VACUUM.

+1

> My first design of pg_stat_vacuum view is following.
> (There are two views.)
>
> pg_stat_vacuum_table
> ---------------
> dbid
> schemaname
> relid
> relname
> elapsed
> page_removed
> page_remain
> page_skipped
> tuple_removed
> tuple_remain
> tuple_notremovable
> buffer_hit
> buffer_miss
> buffer_dirty
> avg_read
> avg_write
> vm_count
> vac_start
> vac_end
> is_autovacuum
>
> pg_stat_vacuum_index
> ---------------
> dbid
> shemaname
> relid
> indexrelid
> indexname
> elapsed
> num_index_tuples
> num_pages
> tuples_removed
> pages_deleted
> pages_free
> is_autovacuum
>
> At present, I think memory design of pg_stat_statements can
> divert into this feature.And I think this module needs to
> prepare following parameters like pg_stat_statements.

I'm not really sure about this.

Firstly, the very fist response from TL in this thread was that adding 
per-table counters is not a particularly good idea, as it'll bloat the 
statistics files. It's true you're not adding the data into the main 
stats files, but you effectively establish a new 'vertical partition' 
with one record per table/index. It might be worth the overhead, if it 
really brings useful functionality (especially if it's opt-in feature, 
like pg_stat_statements).

Secondly, the main issue of this design IMHO is that it only tracks the 
very last vacuum run (or do I understand it wrong?). That means even if 
you snapshot the pg_stat_vacuum views, you'll not know how many vacuums 
executed in between (and the more frequently you snapshot that, the 
greater the overhead). The other stats counters have the same issue, but 
the snapshotting works a bit better because the counters are cumulative 
(so you can easily do deltas etc.). But that's not the case here - 
certainly not with the timestamps, for example.

I don't think the vacuum start/end timestamps are particularly 
interesting, TBH - we already have them in pg_stat_all_tables anyway, 
including the vacuum_count etc. So I'd propose dropping the timestamps, 
possibly replacing them with a single 'elapsed time', and making all the 
counters cumulative (so that you can do snapshots and deltas).

I'm also wondering whether this should track the vacuum costs (because 
that determines how aggressive the vacuum is, and how much work will be 
done in a particular time), if it was anti-wraparound vacuum, if there 
was also ANALYZE performed, if the autovacuum was interrupted because of 
user activity, etc.

> pg_stat_vacuum.max(integer)
> pg_stat_vacuum.save(boolean)
> pg_stat_vacuum.excluded_dbnames(text)
> pg_stat_vacuum.excluded_schemas(text)
> pg_stat_vacuum.min_duration(integer)
> ... and so on.
>
> To implement this feature, I have to collect each vacuum-stats
> every lazy_vacuum_* and I need to embed a hook function point
> where needed. (probably last point of lazy_vacuum_rel).
> Do you hesitate to add the hook only for this function?

Aha! So you plan to use the callbacks.

>
> Similar feature has been already provided by pg_statsinfo package.
> But it is a full-stack package for PG-stats and it needs to
> redesign pg_log and design a repository database for introduce.
> And it is not a core-extension for PostgreSQL.
> (I don't intend to hate pg_statsinfo,
>   I think this package is a very convinient tool)
>
> Everyone will be able to do more easily tuning of VACUUM.
> That's all I want.

I'm still wondering whether these stats will really make the tuning any 
easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup 
and if it exceeds some threshold, it's a sign that vacuum may need a bit 
of tuning. Sometimes it really requires tuning vacuum itself, but more 
often than not it's due to something else (a large bulk delete, 
autovacuum getting stuck on another table, ...). I don't see how the new 
stats would make this any easier.

Can you give some examples on how the new stats might be used (and where 
the current stats are insufficient)? What use cases do you imagine for 
those stats?

It might help differentiate the autovacuum activity from the rest of the 
system (e.g. there's a lot of I/O going on - how much of that is coming 
from autovacuum workers?). This would however require a more 
fine-grained reporting, because often the vacuums run for a very long 
time, especially on very large tables (which is exactly the case when 
this might be handy) - I just had a VACUUM that ran for 12 hours. These 
jobs should report the stats incrementally, not just once at the very 
end, because that makes it rather useless IMNSHO.


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] More Vacuum Statistics

From
Naoya Anzai
Date:
Hi,

Thank you for comments. and Sorry for my late response.

>> ====================
>> pg_stat_vacuum view
>> ====================
>>
>> I understand it is not good to simply add more counters in
>> pg_stat_*_tables. For now, I'd like to suggest an extension
>> which can confirm vacuum statistics like pg_stat_statements.
>>
>> Similar feature has been already provided by pg_statsinfo package.
>> But it is a full-stack package for PG-stats and it needs to
>> redesign pg_log and design a repository database for introduce.
>> And it is not a core-extension for PostgreSQL.
>> (I don't intend to hate pg_statsinfo,
>>   I think this package is a very convinient tool)
>>
>> Everyone will be able to do more easily tuning of VACUUM.
>> That's all I want.
>
>I'm still wondering whether these stats will really make the tuning any
>easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup
>and if it exceeds some threshold, it's a sign that vacuum may need a bit
>of tuning. Sometimes it really requires tuning vacuum itself, but more
>often than not it's due to something else (a large bulk delete,
>autovacuum getting stuck on another table, ...). I don't see how the new
>stats would make this any easier.
>
>Can you give some examples on how the new stats might be used (and where
>the current stats are insufficient)? What use cases do you imagine for
>those stats?

pg_stat_vacuum can keep histories of vacuum statistics for each
tables/indices into shared memory.(They are not only last vacuum.
This is already able to confirm using pg_stat_all_tables.) It makes
easier analysis of vacuum histories because this view can sort or
aggregate or filter.

My use cases for those stats are following.

- examine TRANSITION of vacuum execution time on any table(you can predict the future vacuum execution time)
- examine EXECUTION INTERVAL of vacuum for each table(if too frequent, it should make vacuum-threshold tuning to up)
- examine REST of dead-tuples just after vacuum(if dead-tuples remain, it may be due to any idle in transaction
sessions)

>
>It might help differentiate the autovacuum activity from the rest of the
>system (e.g. there's a lot of I/O going on - how much of that is coming
>from autovacuum workers?). This would however require a more
>fine-grained reporting, because often the vacuums run for a very long
>time, especially on very large tables (which is exactly the case when
>this might be handy) - I just had a VACUUM that ran for 12 hours. These
>jobs should report the stats incrementally, not just once at the very
>end, because that makes it rather useless IMNSHO.

+1

Certainly, VACUUM have often much execution time, I just had too.
At present, we cannot predict when this vacuum finishes, what this vacuum
is doing now, and whether this vacuum have any problem or not.

Maybe, For DBAs,
It might be better to show vacuum progress in pg_stat_activity.
(if we'd do, add a free-style column like "progress" ?)
This column might also be able to use for other long time commands
like ANALYZE, CREATE/RE INDEX and COPY. To realize this feature,
we certainly need to properly change pgstat_report_activity,
use it more and add a new track-activity parameter.

Regards,

Anzai Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-anzai@xc.jp.nec.com
---




Re: [Proposal] More Vacuum Statistics

From
"Syed, Rahila"
Date:
Hello,

>Maybe, For DBAs,
>It might be better to show vacuum progress in pg_stat_activity.
>(if we'd do, add a free-style column like "progress" ?) This column might also be able to use for other long time
commandslike ANALYZE, CREATE/RE INDEX and COPY. To realize this feature, we certainly need to properly change
pgstat_report_activity,use it more and add a new track-activity parameterVery similar idea was proposed in the
following
http://www.postgresql.org/message-id/1284756643.25048.42.camel@vanquo.pezone.net

IIUC, problem with showing progress in pg_stat_activity is that it introduces compulsary progress calculation overhead
incore for every command. 
As work units of each command varies, common infrastructure might not be able to represent every command progress
effectively.
An architecture which will display progress only  on users demand  for each command separately will be more efficient.
So, suggestion was rather to have a detailed progress report including "remaining time" for a command on users demand.

FWIW, I am working on designing an approach to report VACUUM progress stats for which I will be posting a detailed
proposal.
The use case is reporting progress for long running VACUUMs. The approach involves using hooks to extract VACUUM
progressstatistics . 
The progress can be displayed using psql view (ex. pg_stat_maintenance).

Thank you,
Rahila Syed


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Naoya Anzai
Sent: Tuesday, June 16, 2015 8:41 AM
To: Tomas Vondra
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa; bench.coffee@gmail.com; Tom Lane; Jeff Janes; Jim Nasby; Andres Freund;
AlvaroHerrera 
Subject: Re: [HACKERS] [Proposal] More Vacuum Statistics

Hi,

Thank you for comments. and Sorry for my late response.

>> ====================
>> pg_stat_vacuum view
>> ====================
>>
>> I understand it is not good to simply add more counters in
>> pg_stat_*_tables. For now, I'd like to suggest an extension which can
>> confirm vacuum statistics like pg_stat_statements.
>>
>> Similar feature has been already provided by pg_statsinfo package.
>> But it is a full-stack package for PG-stats and it needs to redesign
>> pg_log and design a repository database for introduce.
>> And it is not a core-extension for PostgreSQL.
>> (I don't intend to hate pg_statsinfo,
>>   I think this package is a very convinient tool)
>>
>> Everyone will be able to do more easily tuning of VACUUM.
>> That's all I want.
>
>I'm still wondering whether these stats will really make the tuning any
>easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup
>and if it exceeds some threshold, it's a sign that vacuum may need a
>bit of tuning. Sometimes it really requires tuning vacuum itself, but
>more often than not it's due to something else (a large bulk delete,
>autovacuum getting stuck on another table, ...). I don't see how the
>new stats would make this any easier.
>
>Can you give some examples on how the new stats might be used (and
>where the current stats are insufficient)? What use cases do you
>imagine for those stats?

pg_stat_vacuum can keep histories of vacuum statistics for each tables/indices into shared memory.(They are not only
lastvacuum.  
This is already able to confirm using pg_stat_all_tables.) It makes easier analysis of vacuum histories because this
viewcan sort or aggregate or filter. 

My use cases for those stats are following.

- examine TRANSITION of vacuum execution time on any table  (you can predict the future vacuum execution time)
- examine EXECUTION INTERVAL of vacuum for each table  (if too frequent, it should make vacuum-threshold tuning to up)
- examine REST of dead-tuples just after vacuum  (if dead-tuples remain, it may be due to any idle in transaction
sessions)

>
>It might help differentiate the autovacuum activity from the rest of
>the system (e.g. there's a lot of I/O going on - how much of that is
>coming from autovacuum workers?). This would however require a more
>fine-grained reporting, because often the vacuums run for a very long
>time, especially on very large tables (which is exactly the case when
>this might be handy) - I just had a VACUUM that ran for 12 hours. These
>jobs should report the stats incrementally, not just once at the very
>end, because that makes it rather useless IMNSHO.

+1

Certainly, VACUUM have often much execution time, I just had too.
At present, we cannot predict when this vacuum finishes, what this vacuum is doing now, and whether this vacuum have
anyproblem or not. 

Maybe, For DBAs,
It might be better to show vacuum progress in pg_stat_activity.
(if we'd do, add a free-style column like "progress" ?) This column might also be able to use for other long time
commandslike ANALYZE, CREATE/RE INDEX and COPY. To realize this feature, we certainly need to properly change
pgstat_report_activity,use it more and add a new track-activity parameter. 

Regards,

Anzai Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-anzai@xc.jp.nec.com
---



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

______________________________________________________________________
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.