Re: Vacuum statistics - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Vacuum statistics
Date
Msg-id 5AA8FFD5-6DE2-4A31-8E00-AE98F738F5D1@upgrade.com
Whole thread Raw
In response to Re: Vacuum statistics  (Andrei Zubkov <zubkov@moonset.ru>)
List pgsql-hackers
On Oct 29, 2024, at 7:40 AM, Andrei Zubkov <zubkov@moonset.ru> wrote:
>
> Hi,
>
> Thanks for your attention to our patch!
>
> On Mon, 2024-10-28 at 16:03 -0500, Jim Nasby wrote:
>>> Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
>>> tripled in space.  That a huge change from having no statistics on
>>> vacuum to have it in much more detail than everything else we
>>> currently have.  I think the feasible way might be to introduce
>>> some
>>> most demanded statistics first then see how it goes.
>>
>> Looking at the stats I do think the WAL stats are probably not
>> helpful. First, there’s nothing users can do to tune how much WAL is
>> generated by vacuum. Second, this introduces the risk of users saying
>> “Wow, vacuum is creating a lot of WAL! I’m going to turn it down!”,
>> which is most likely to make matters worse. There’s already a lot of
>> stuff that goes into WAL without any detailed logging; if we ever
>> wanted to provide a comprehensive view of what data is in WAL that
>> should be handled separately.
>
> Yes, there is nothing we can directly do with WAL generated by vacuum,
> but WAL generation is the part of vacuum work, and it will indirectly
> affected by the changes of vacuum settings. So, WAL statistics is one
> more dimension of vacuum workload. Also WAL stat is universal metric
> which is measured cluster-wide and on the statement-level with
> pg_stat_statements. Vacuum WAL counters will explain the part of
> difference between those metrics. Besides vacuum WAL counters can be
> used to locate abnormal vacuum behavior caused by a bug or the data
> corruption. I think if the DBA is smart enough to look at vacuum WAL
> generated stats and to understand what it means, the decision to
> disable the autovacuum due to its WAL generation is unlikely.

I’m generally for more stats rather than less - really just a question of how much we’re worried about stats overhead.

> Anyway I think some stats can be excluded to save some memory. The
> first candidates are the system_time and user_time fields. Those are
> very valuable, but are measured by the rusage stats, which won't be
> available on all platforms. I think total_time and delay_time would be
> sufficient.

Yeah, I considered throwing those under the bus. I agree they’re only marginally useful.

> The second is the interrupts field. It is needed for monitoring to know
> do we have them or not, so tracking them on the database level will do
> the trick. Interrupt is quite rare event, so once the monitoring system
> will catch one the DBA can go to the server log for the details.
Just to confirm… by “interrupt” you mean vacuum encountered an error?

> It seems there is another way. If the vacuum stats doesn't seems to be
> mandatory in all systems, maybe we should add some hooks to the vacuum
> so that vacuum statistics tracking can be done in an extension. I don't
> think it is a good idea, because vacuum stats seems to me as mandatory
> as the vacuum process itself.
I’d actually like hooks for all stats, so people can develop different ways of storing/aggregating them. But I agree
that’sa separate discussion. 

>> Is there a reason some fields are omitted
>> from pg_stat_vacuum_database? While some stats are certainly more
>> interesting at the per-relation level, I can’t really think of any
>> that don’t make sense at the database level as well.
>
> Some of the metrics are table-specific, some index-specific, so we
> moved to the database level metrics more or less specific to the whole
> database. Can you tell what stats you want to see at the database
> level?

Here’s the thing with pg_stat_vacuum_database; it’s the only way to see everything in the whole cluster. So I think the
betterquestion is what metrics simply don’t make sense at that level? And I don’t really see any that don’t. 

>> For all the views the docs should clarify that total_blks_written
>> means blocks written by vacuum, as opposed to the background Ywriter.
>
> We have the "Number of database blocks written by vacuum operations
> performed on this table" in the docs now. Do you mean we should
> specifically note the vacuum process here?

The reason the stat is confusing is because it doesn’t have the meaning that the name implies. Most people that see
thiswill think it’s actually measuring blocks dirtied, or at least something closer to that. It definitely hides the
factthat many of the dirtied blocks could actually be written by the bgwriter. So an improvement to the docs would be
“Numberof blocks written directly by vacuum or auto vacuum. Blocks that are dirtied by a vacuum process can be written
outby another process.” 

Which makes me realize… I think vacuum only counts a block as dirtied if it was previously clean? If so the docs for
thatmetric need to clarify that vacuum might modify a block but not count it as having been dirtied. 

>> Similarly they should clarify the difference between
>> rel_blks_(read|hit) and total_blks_(read|hit). In the case of
>> pg_stat_vacuum_indexes it’d be better if rel_blks_(read|hit) were
>> called index_blks_(read|hit). Although… if total_blks_* is actually
>> the count across the table and all the indexes I don’t know that we
>> even need that counter. I realize that not ever vacuum even looks at
>> the indexes, but if we’re going to go into that level of detail then
>> we would (at minimum) need to count the number of times a vacuum
>> completely skipped scanning the indexes.
>
> It is not clear to me enough. The stats described just as it is -
> rel_blocks_* tracks blocks of the current heap, and total_* is for the
> whole database blocks - not just tables and indexes, vacuum do some
> work (quite a little) in the catalog and this work is counted here too.
> Usually this stat won't be helpful, but maybe we can catch unusual
> vacuum behavior using this stat.

Ok, so this just needs to be clarified in the docs by explicitly stating what is and isn’t part of the metric. It would
alsobe better not to use the term “rel” since most people don’t immediately know what that means. So,
table_blks_(read|hit)or index_blks_(read|hit). 

Also, “total” is still not clear to me, at least in the context of pg_stat_vacuum_indexes. Is that different from
pg_stat_vacuum_tables.total_blks_*?If so, how? If it’s the same then IMO it should just be removed from
pg_stat_vacuum_indexes.

>> Sadly index_vacuum_count is may not useful at all at present. At
>> minimum you’d need to know the number of times vacuum had run in
>> total. I realize that’s in pg_stat_all_tables, but that doesn’t help
>> if vacuum stats are tracked or reset separately.
>
> I'm in doubt - is it really possible to reset the vacuum stats
> independent of pg_stat_all_tables?

Most stats can be independently reset, so I was thinking these wouldn’t be an exception. If that’s not the case then I
thinkthe docs need to mention pg_stat_all_tables.(auto)vacuum_count, since it’s in a completely different view. Or
betteryet, include the vacuum/analyze related stats that are in pg_stat_all_tables in pg_stat_vacuum_tables. 

BTW, have you thought about what stats should be added for ANALYZE? That’s obviously not as critical as vacuum, but
maybeworth considering as part of this... 

>> First, there’s still gaps in trying to track HOT; most notably a
>> counter for how many updates would never be HOT eligible because they
>> modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really
>> limited without that info.
>
> Nice catch, I'll think about it. Those are not directly connected to
> the vacuum workload but those are important.

Just to re-iterate: I don’t think this patch has to boil the ocean and try to handle all these extra use cases.

>> There should also be stats about unused line pointers - in degenerate
>> cases the lp array can consume a significant portion of heap storage.
>>
>> Monitoring bloat would be a lot more accurate if vacuum reported
>> total tuple length for each run along with the total number of tuples
>> it looked at. Having that info would make it trivial to calculate
>> average tuple size, which could then be applied to reltuples and
>> relpages to calculate how much space would being lost to bloat.
>
> Yes, bloat tracking is in our plans. Right now it is not clear enough
> how to do it in the most reliable and convenient way.
>
>> Autovacuum will self-terminate if it would block another process
>> (unless it’s an aggressive vacuum) - that’s definitely something that
>> should be tracked. Not just the number of times that happens, but
>> also stats about how much work was lost because of this.
>
> Agreed.
>
>> Shrinking a relation (what vacuum calls truncation, which is very
>> confusing with the truncate command) is a rather complex process that
>> currently has no visibility.
>
> In this patch table truncation can be seen in the "pages_removed" field
> of "pg_stat_vacuum_tables" at least as the cumulative number of removed
> pages. It is not clear enough, but it is visible.

Ahh, good point. I think it’s probably worth adding a counter (to this patch) for how many times vacuum actually
decidedto do page removal, because it’s (presumably) a pretty rare event. Without that counter it’s very hard to make
anysense of the number of pages removed (other than being able to see some were removed, at least once). 

>> Tuning vacuum_freeze_min_age (and the MXID variant) is rather
>> complicated. We maybe have enough stats on whether it could be set
>> lower, but there’s no visibility on how the settings affect how often
>> vacuum decides to be aggressive. At minimum, we should have stats on
>> when vacuum is aggressive, especially since it significantly changes
>> the behavior of autovac.
>
> When you say "agressive" do you mean the number of times when the
> vacuum was processing the table with the FREEZE intention? I think this
> is needed too.

Yes. I intentionally use the term “aggressive” (as the code does) to avoid confusion with the FREEZE option (which as
I’msure you know simply forces some GUCs to 0). Further complicating this is that auto vac will report this as “to
preventwraparound”… 

In any case… I’m actually leaning towards there should be a complete second set of counters for aggressive vacuums,
becauseof how differently they work. :( 

>> I saw someone else already mentioned tuning vacuum memory usage, but
>> I’ll mention it again. Even if the issues with index_vacuum_count are
>> fixed that still only tells you if you have a problem; it doesn’t
>> give you a great idea of how much more memory you need. The best you
>> can do is assuming you need (number of passes - 1) * current memory.
>
> Do you think such approach is insufficient? It seems we do not need
> byte-to-byte accuracy here.

Byte-for-byte, no. But I do wonder if there’s any way to do better than some multiple of what *_work_mem was set to.

And setting that aside, another significant problem is that you can’t actually do anything here without actually
knowingwhat memory setting was used, which is definitely not a given. Off-hand I don’t see anyway this can actually be
tuned(at all) with nothing but counters. :( 

Definitely out of scope for this patch though :)

>> Speaking of which… there should be stats on any time vacuum decided
>> on it’s own to skip index processing due to wraparound proximity.
>
> Maybe we should just count the number of times when the vacuum was
> started to prevent wraparound?

Unfortunately even that isn’t simple… auto vac and manual vac have different GUCs, and of course there’s the FREEZE
option.And then there’s the issue that MXIDs are handled completely separately. 

Even ignoring all of that… by default an aggressive vacuum won’t skip indexes. That only happens when you hit
vacuum_(multixact_)failsafe_age.

BTW, something I’ve been mulling over is what stats related to cleanup might be tracked at a system level. I’m thinking
alongthe lines of how often heap_prune_page or the index marking code come across a dead tuple they can’t do anything
aboutyet because it’s still visible. While you could track that per-relation, I’m not sure how helpful that actually is
sinceit’s really a long-running transaction problem. 

Similarly, it’d be nice if we had stats about how often all of the auto vac workers were occupied; something that’s
alsoglobal in nature. 




pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Separate memory contexts for relcache and catcache
Next
From: Andy Fan
Date:
Subject: Re: detoast datum into the given buffer as a optimization.