Re: [Proposal] More Vacuum Statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [Proposal] More Vacuum Statistics
Date
Msg-id 55744E16.7030705@2ndquadrant.com
Whole thread Raw
In response to Re: [Proposal] More Vacuum Statistics  (Naoya Anzai <nao-anzai@xc.jp.nec.com>)
Responses Re: [Proposal] More Vacuum Statistics  (Naoya Anzai <nao-anzai@xc.jp.nec.com>)
List pgsql-hackers
Hi,

On 06/05/15 14:10, Naoya Anzai wrote:
> Thank you for quick feedback, and I'm sorry for slow response.
> All of your opinions were very helpful for me.
>
> I have confirmed Greg's Idea "Timing events".
> http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com
>
> Greg said at first,
> "Parsing log files for commonly needed performance data is no fun."
> Yes, I completely agree with him.>
> That looks a nice idea but I don't know why this idea has
> not been commited yet. Anybody knows?

Most likely lack of time, I guess.

>
> I have reworked my idea since I heard dear hacker's opinions.
>
> ====================
> pg_stat_vacuum view
> ====================
>
> I understand it is not good to simply add more counters in
> pg_stat_*_tables. For now, I'd like to suggest an extension
> which can confirm vacuum statistics like pg_stat_statements.

I don't see how you want to collect the necessary information from an
extension? pg_stat_statements get most of the stats from BufferUsage 
structure, but vacuum keeps all this internal, AFAIK.

So it'd be necessary to make this somehow public - either by creating 
something like BufferUsage with all the vacuum stats, or perhaps a set 
of callbacks (either a single log_relation_vacuum or different callbacks 
for tables and indexs).

IMHO the callbacks are a better idea - for example because it naturally 
handles database-wide vacuum. The global structure makes this difficult, 
because you'll only see data for all the vacuumed objects (or it'd have 
to track per-object stats internally, somehow).

> VACUUM is a most important feature in PostgreSQL, but a
> special view for vacuum does not exist. Don't you think
> the fact is inconvenience? At least, I am disgruntled with
> that we need to parse pg_log for tune VACUUM.

+1

> My first design of pg_stat_vacuum view is following.
> (There are two views.)
>
> pg_stat_vacuum_table
> ---------------
> dbid
> schemaname
> relid
> relname
> elapsed
> page_removed
> page_remain
> page_skipped
> tuple_removed
> tuple_remain
> tuple_notremovable
> buffer_hit
> buffer_miss
> buffer_dirty
> avg_read
> avg_write
> vm_count
> vac_start
> vac_end
> is_autovacuum
>
> pg_stat_vacuum_index
> ---------------
> dbid
> shemaname
> relid
> indexrelid
> indexname
> elapsed
> num_index_tuples
> num_pages
> tuples_removed
> pages_deleted
> pages_free
> is_autovacuum
>
> At present, I think memory design of pg_stat_statements can
> divert into this feature.And I think this module needs to
> prepare following parameters like pg_stat_statements.

I'm not really sure about this.

Firstly, the very fist response from TL in this thread was that adding 
per-table counters is not a particularly good idea, as it'll bloat the 
statistics files. It's true you're not adding the data into the main 
stats files, but you effectively establish a new 'vertical partition' 
with one record per table/index. It might be worth the overhead, if it 
really brings useful functionality (especially if it's opt-in feature, 
like pg_stat_statements).

Secondly, the main issue of this design IMHO is that it only tracks the 
very last vacuum run (or do I understand it wrong?). That means even if 
you snapshot the pg_stat_vacuum views, you'll not know how many vacuums 
executed in between (and the more frequently you snapshot that, the 
greater the overhead). The other stats counters have the same issue, but 
the snapshotting works a bit better because the counters are cumulative 
(so you can easily do deltas etc.). But that's not the case here - 
certainly not with the timestamps, for example.

I don't think the vacuum start/end timestamps are particularly 
interesting, TBH - we already have them in pg_stat_all_tables anyway, 
including the vacuum_count etc. So I'd propose dropping the timestamps, 
possibly replacing them with a single 'elapsed time', and making all the 
counters cumulative (so that you can do snapshots and deltas).

I'm also wondering whether this should track the vacuum costs (because 
that determines how aggressive the vacuum is, and how much work will be 
done in a particular time), if it was anti-wraparound vacuum, if there 
was also ANALYZE performed, if the autovacuum was interrupted because of 
user activity, etc.

> pg_stat_vacuum.max(integer)
> pg_stat_vacuum.save(boolean)
> pg_stat_vacuum.excluded_dbnames(text)
> pg_stat_vacuum.excluded_schemas(text)
> pg_stat_vacuum.min_duration(integer)
> ... and so on.
>
> To implement this feature, I have to collect each vacuum-stats
> every lazy_vacuum_* and I need to embed a hook function point
> where needed. (probably last point of lazy_vacuum_rel).
> Do you hesitate to add the hook only for this function?

Aha! So you plan to use the callbacks.

>
> Similar feature has been already provided by pg_statsinfo package.
> But it is a full-stack package for PG-stats and it needs to
> redesign pg_log and design a repository database for introduce.
> And it is not a core-extension for PostgreSQL.
> (I don't intend to hate pg_statsinfo,
>   I think this package is a very convinient tool)
>
> Everyone will be able to do more easily tuning of VACUUM.
> That's all I want.

I'm still wondering whether these stats will really make the tuning any 
easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup 
and if it exceeds some threshold, it's a sign that vacuum may need a bit 
of tuning. Sometimes it really requires tuning vacuum itself, but more 
often than not it's due to something else (a large bulk delete, 
autovacuum getting stuck on another table, ...). I don't see how the new 
stats would make this any easier.

Can you give some examples on how the new stats might be used (and where 
the current stats are insufficient)? What use cases do you imagine for 
those stats?

It might help differentiate the autovacuum activity from the rest of the 
system (e.g. there's a lot of I/O going on - how much of that is coming 
from autovacuum workers?). This would however require a more 
fine-grained reporting, because often the vacuums run for a very long 
time, especially on very large tables (which is exactly the case when 
this might be handy) - I just had a VACUUM that ran for 12 hours. These 
jobs should report the stats incrementally, not just once at the very 
end, because that makes it rather useless IMNSHO.


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Reducing tuple overhead
Next
From: Kevin Grittner
Date:
Subject: Re: [CORE] Restore-reliability mode