Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Date
Msg-id CAGjGUAJSg3tJGxjeyU6T98ZYCXcAiGpq0eitAUQVaOa+nULDuA@mail.gmail.com
Whole thread Raw
Responses Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
List pgsql-hackers
Hi Tomas 
     This is a great feature.  
+ /*
+ * Define (or redefine) custom GUC variables.
+ */
+ DefineCustomIntVariable("stats_history.size",
+ "Sets the amount of memory available for past events.",
+ NULL,
+ &statsHistorySizeMB,
+ 1,
+ 1,
+ 128,
+ PGC_POSTMASTER,
+ GUC_UNIT_MB,
+ NULL,
+ NULL,
+ NULL);
+
RAM is in terabytes now, the statsHistorySize is 128MB ,I think can increase to store more history record ?

Thanks 

On Sun, Dec 22, 2024 at 4:28 AM Tomas Vondra <tomas@vondra.me> wrote:
Hi,

Our runtime stats system is great, but it only keeps a snapshot of
cumulative stats / snapshot. And while that works for user workloads, it
may not be quite sufficient when analyzing maintenance operations like
vacuum/checkpoint, etc.

For those operations it's useful to have information about individual
runs, not just the cumulative counters (or even deltas between regular
snapshots). There's also the issue that we only keep a very limited
subset of available information - just look at the info included in
VACUUM VERBOSE or with log_checkpoints=on, and how little of that is
available in pg_stats_. For vacuum we have the vacuum/analyze counts,
and timestamp of the last operation, but that's it. VACUUM VERBOSE
provides way more information, but we can only guess based on the
regular pgstat counters.

Yes, we can get this info written to server log using log_checkpoints
and log_autovacuum_min_duration (AFAIK there's no way to ensure logging
for manual VACUUM). But processing this information is not particularly
convenient, as it requires parsing the log, the message format is
suitable more for humans, etc. And it'd be very convenient to be able to
query this information, just like the other pgstat catalogs.

I wonder if we might/should do two things, to improve this:

1) Introduce hooks that allow doing some custom stuff with info about
those actions, after logging it. The attached 0001 and 0002 patches do
this for vacuum and checkpoint.

2) Allow keeping information about events. The 0003 patch does that in
an extension, leveraging the new hooks, but it'd certainly possible to
do in core too.

I realize our current pgstat collector is keeping per-object stats, not
per-event. We might add this to per-object stats (e.g. each table would
have stats about vacuum runs), but that wouldn't work for checkpoints.
There's also the question of memory consumption - I'm sure we don't want
to keep infinite history of vacuum runs, for example.

So the extension simply maintains a fixed-size circular queue, i.e. when
it gets full it starts evicting oldest entries. 1MB is enough for
storing ~4k VACUUM runs - I'm sure it can be made more compact.

I don't think there's a better way to do this. I've considered if this
might be done using emit_log_hook, but (a) that only helps when we end
up logging the event (and I'd like to do this always), and (b) it'd
require parsing the server log. So it's not much better than just doing
that, I think ...


Opinions?

--
Tomas Vondra

pgsql-hackers by date:

Previous
From: "章晨曦@易景科技"
Date:
Subject: Re: Re: Re: transaction lost when delete clog file after normal shutdown
Next
From: Tom Lane
Date:
Subject: Re: transaction lost when delete clog file after normal shutdown