Thread: Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Hi Tomas
This is a great feature.
+ /*
+ * Define (or redefine) custom GUC variables.
+ */
+ DefineCustomIntVariable("stats_history.size",
+ "Sets the amount of memory available for past events.",
+ NULL,
+ &statsHistorySizeMB,
+ 1,
+ 1,
+ 128,
+ PGC_POSTMASTER,
+ GUC_UNIT_MB,
+ NULL,
+ NULL,
+ NULL);
+
RAM is in terabytes now, and statsHistorySize is capped at 128MB. I think it could be increased to store more history records?
Thanks
On Sun, Dec 22, 2024 at 4:28 AM Tomas Vondra <tomas@vondra.me> wrote:
Hi,
Our runtime stats system is great, but it only keeps a snapshot of
cumulative stats. And while that works for user workloads, it
may not be quite sufficient when analyzing maintenance operations like
vacuum/checkpoint, etc.
For those operations it's useful to have information about individual
runs, not just the cumulative counters (or even deltas between regular
snapshots). There's also the issue that we only keep a very limited
subset of available information - just look at the info included in
VACUUM VERBOSE or with log_checkpoints=on, and how little of that is
available in pg_stats_. For vacuum we have the vacuum/analyze counts,
and timestamp of the last operation, but that's it. VACUUM VERBOSE
provides way more information, but we can only guess based on the
regular pgstat counters.
Yes, we can get this info written to server log using log_checkpoints
and log_autovacuum_min_duration (AFAIK there's no way to ensure logging
for manual VACUUM). But processing this information is not particularly
convenient, as it requires parsing the log, the message format is
more suitable for humans, etc. And it'd be very convenient to be able to
query this information, just like the other pgstat catalogs.
I wonder if we might/should do two things, to improve this:
1) Introduce hooks that allow doing some custom stuff with info about
those actions, after logging it. The attached 0001 and 0002 patches do
this for vacuum and checkpoint.
2) Allow keeping information about events. The 0003 patch does that in
an extension, leveraging the new hooks, but it'd certainly be possible to
do in core too (a simplified sketch of how an extension attaches to such
a hook follows below).
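To illustrate how an extension is expected to consume these hooks, here's
a simplified sketch of the usual install-and-chain pattern. The parameter
list is heavily abbreviated for readability - the actual hooks in
0001/0002 pass more fields - so treat it as an illustration, not the
patch API:

#include "postgres.h"
#include "fmgr.h"
#include "datatype/timestamp.h"

PG_MODULE_MAGIC;

/* abbreviated hook signature, for illustration only */
typedef void (*checkpoint_log_hook_type) (TimestampTz start_time,
                                          TimestampTz end_time,
                                          bool is_shutdown,
                                          int buffers_written);

/* in reality this would be declared in a core header by the patch */
extern PGDLLIMPORT checkpoint_log_hook_type checkpoint_log_hook;

static checkpoint_log_hook_type prev_checkpoint_log_hook = NULL;

/* called by the checkpointer right after it logs the checkpoint */
static void
history_checkpoint_log(TimestampTz start_time, TimestampTz end_time,
                       bool is_shutdown, int buffers_written)
{
    /* keep any previously installed hook working */
    if (prev_checkpoint_log_hook)
        prev_checkpoint_log_hook(start_time, end_time,
                                 is_shutdown, buffers_written);

    /* record the event into the history (shared memory, file, ...) */
}

void
_PG_init(void)
{
    prev_checkpoint_log_hook = checkpoint_log_hook;
    checkpoint_log_hook = history_checkpoint_log;
}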
I realize our current pgstat collector is keeping per-object stats, not
per-event. We might add this to per-object stats (e.g. each table would
have stats about vacuum runs), but that wouldn't work for checkpoints.
There's also the question of memory consumption - I'm sure we don't want
to keep infinite history of vacuum runs, for example.
So the extension simply maintains a fixed-size circular queue, i.e. when
it gets full it starts evicting oldest entries. 1MB is enough for
storing ~4k VACUUM runs - I'm sure it can be made more compact.
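To make the eviction behavior concrete, the circular queue is essentially
just a fixed array plus a write position. A minimal sketch (struct and
field names here are illustrative, not necessarily what the 0003
extension uses):

#include "postgres.h"
#include "datatype/timestamp.h"

/* illustrative entry layout - the real one stores more counters */
typedef struct HistoryEntry
{
    TimestampTz start_time;
    TimestampTz end_time;
    Oid         relid;          /* InvalidOid for checkpoints */
    /* ... per-run counters ... */
} HistoryEntry;

typedef struct HistoryBuffer
{
    int         max_entries;    /* derived from stats_history.size */
    int         next;           /* slot to write the next entry into */
    int         count;          /* number of valid entries */
    HistoryEntry entries[FLEXIBLE_ARRAY_MEMBER];
} HistoryBuffer;

/* append a new event, overwriting the oldest one once the queue is full */
static void
history_add(HistoryBuffer *buf, const HistoryEntry *entry)
{
    buf->entries[buf->next] = *entry;
    buf->next = (buf->next + 1) % buf->max_entries;
    if (buf->count < buf->max_entries)
        buf->count++;
}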
I don't think there's a better way to do this. I've considered if this
might be done using emit_log_hook, but (a) that only helps when we end
up logging the event (and I'd like to do this always), and (b) it'd
require parsing the server log. So it's not much better than just doing
that, I think ...
Opinions?
--
Tomas Vondra
Hi,

On 12/23/24 07:35, wenhui qiu wrote:
> Hi Tomas
> This is a great feature.
> + /*
> + * Define (or redefine) custom GUC variables.
> + */
> + DefineCustomIntVariable("stats_history.size",
> + "Sets the amount of memory available for past events.",
> ...
> RAM is in terabytes now, and statsHistorySize is capped at 128MB. I think
> it could be increased to store more history records?
>

Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
for ~500k events, which seems good enough for most systems. Of course,
on systems with many relations might need more space, not sure.

I was thinking about specifying the space in more natural terms, either
as amount of time ("keep 1 day of history") or number of entries ("10k
entries"). That would probably mean the memory can't be allocated as
fixed size.

But maybe it'd be possible to just write the entries to a file. We don't
need random access to past entries (unlike e.g. pg_stat_statements), and
people won't query that very often either.

regards

--
Tomas Vondra
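For the record, expressing the limit in those more natural terms would
mostly be a matter of swapping the GUC - something along these lines
(hypothetical GUC and variable names, not part of the posted patches):

static int  statsHistoryMaxAge = 1440;      /* minutes */
static int  statsHistoryMaxEntries = 10000;

/* in _PG_init() */
DefineCustomIntVariable("stats_history.max_age",
                        "Minimum amount of time to keep past events.",
                        NULL,
                        &statsHistoryMaxAge,
                        1440,           /* 1 day */
                        1,
                        INT_MAX,
                        PGC_SIGHUP,
                        GUC_UNIT_MIN,
                        NULL, NULL, NULL);

DefineCustomIntVariable("stats_history.max_entries",
                        "Maximum number of past events to keep.",
                        NULL,
                        &statsHistoryMaxEntries,
                        10000,
                        100,
                        INT_MAX,
                        PGC_SIGHUP,
                        0,
                        NULL, NULL, NULL);

The hard part isn't the GUC itself, of course, but that with such limits
the history can no longer be a single fixed-size allocation, as noted
above.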
On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> Hi,
>
> On 12/23/24 07:35, wenhui qiu wrote:
> > Hi Tomas
> > This is a great feature.
> > ...
> > RAM is in terabytes now, and statsHistorySize is capped at 128MB. I think
> > it could be increased to store more history records?
> >
>
> Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> for ~500k events, which seems good enough for most systems. Of course,
> on systems with many relations might need more space, not sure.
>
> I was thinking about specifying the space in more natural terms, either
> as amount of time ("keep 1 day of history") or number of entries ("10k
> entries"). That would probably mean the memory can't be allocated as
> fixed size.
>

Based on the above, a rough calculation is that this is enough for
holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
vacuums for 1000 tables. Most folks will fall somewhere in that range
(and won't really need a year's history) but that seems like plenty
for a default.

> But maybe it'd be possible to just write the entries to a file. We don't
> need random access to past entries (unlike e.g. pg_stat_statements), and
> people won't query that very often either.
>

Yeah, workloads will vary, but it doesn't seem like they would more
than query workloads do.

Robert Treat
https://xzilla.net
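(As a rough cross-check of that estimate, using the numbers mentioned
upthread - 1MB holding ~4k entries, i.e. on the order of 256 bytes per
event:

    128 MB / ~256 bytes per entry        ~ 500k entries
    50 tables * 24 runs/day * 365 days   ~ 438k entries   (a year of hourly vacuums)
    1000 tables * 1 run/day * 365 days   = 365k entries   (a year of daily vacuums)

Both workloads fit under the ~500k entries a 128MB buffer would hold.)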
Hi
As far as I know, instances with more than 10,000 tables are often encountered,
so I think the maximum could reasonably be increased to 256MB to better accommodate setups with many tables.
On Thu, 26 Dec 2024 at 23:23, Robert Treat <rob@xzilla.net> wrote:
On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> Hi,
>
> On 12/23/24 07:35, wenhui qiu wrote:
> > Hi Tomas
> > This is a great feature.
> > + /*
> > + * Define (or redefine) custom GUC variables.
> > + */
> > + DefineCustomIntVariable("stats_history.size",
> > + "Sets the amount of memory available for past events.",
> > + NULL,
> > + &statsHistorySizeMB,
> > + 1,
> > + 1,
> > + 128,
> > + PGC_POSTMASTER,
> > + GUC_UNIT_MB,
> > + NULL,
> > + NULL,
> > + NULL);
> > +
> > RAM is in terabytes now, and statsHistorySize is capped at 128MB. I think
> > it could be increased to store more history records?
> >
>
> Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> for ~500k events, which seems good enough for most systems. Of course,
> on systems with many relations might need more space, not sure.
>
> I was thinking about specifying the space in more natural terms, either
> as amount of time ("keep 1 day of history") or number of entries ("10k
> entries"). That would probably mean the memory can't be allocated as
> fixed size.
>
Based on the above, a rough calculation is that this is enough for
holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
vacuums for 1000 tables. Most folks will fall somewhere in that range
(and won't really need a year's history) but that seems like plenty
for a default.
> But maybe it'd be possible to just write the entries to a file. We don't
> need random access to past entries (unlike e.g. pg_stat_statements), and
> people won't query that very often either.
>
Yeah, workloads will vary, but it doesn't seem like they would more
than query workloads do.
Robert Treat
https://xzilla.net
On 12/26/24 17:00, wenhui qiu wrote:
> Hi
>
> As far as I know, instances with more than 10,000 tables are often
> encountered, so I think the maximum could reasonably be increased to
> 256MB to better accommodate setups with many tables.
>

If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
make a fundamental difference ...

Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
limit, or possibly removing it entirely (and accepting anything the
system can handle).

regards

--
Tomas Vondra
Hi Tomas
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
agree
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
Yes, what I mean is that the maximum value is not friendly to large instances if it is set too small. In real production instances, many sub-tables with the same table structure are often encountered.
On Fri, Dec 27, 2024 at 1:58 AM Tomas Vondra <tomas@vondra.me> wrote:
On 12/26/24 17:00, wenhui qiu wrote:
> Hi
>
> As far as I know, instances with more than 10,000 tables are often
> encountered, so I think the maximum could reasonably be increased to
> 256MB to better accommodate setups with many tables.
>
If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
make a fundamental difference ...
Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
limit, or possibly removing it entirely (and accepting anything the
system can handle).
regards
--
Tomas Vondra
On 12/27/24 05:00, Michael Paquier wrote:
> On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote:
>> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
>> make a fundamental difference ...
>>
>> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
>> limit, or possibly removing it entirely (and accepting anything the
>> system can handle).
>
> + DefineCustomIntVariable("stats_history.size",
> + "Sets the amount of memory available for past events.",
>
> How about some time-based retention? Data size can be hard to think
> about for the end user, while there would be a set of users that would
> want to retain data for the past week, month, etc? If both size and
> time upper-bound are defined, then entries that match one condition or
> the other are removed.
>

Right. In my response [1] I suggested replacing the simple memory limit
with a time-based limit, but I haven't done anything about that yet.

And the more I think about it the more I'm convinced we don't need to
keep the data about past runs in memory, a file should be enough (except
maybe for a small buffer). That would mean we don't need to worry about
dynamic shared memory, etc. I initially rejected this because it seemed
like a regression to how pgstat worked initially (sharing data through
files), but I don't think that's actually true - this data is different
(almost append-only), etc.

The one case when we may need to read the data is in response to DROP of
a table, when we need to discard entries for that object. Or we could
handle that by recording OIDs of dropped objects ... not sure how
complex this would need to be.

We'd still want to enforce some limits, of course - a time-based limit
of the minimum amount of time to cover, and maximum amount of disk space
to use (more as a safety).

FWIW there's one "issue" with enforcing the time-based limit - we can
only do that for the "end time", because that's when we see the entry.
If you configure the limit to keep "1 day" history, and then a vacuum
completes after running for 2 days, we want to keep it, so that anyone
can actually see it.

[1] https://www.postgresql.org/message-id/8df7cee1-31aa-4db3-bbb7-83157ca139da%40vondra.me

> + checkpoint_log_hook(
> + CheckpointStats.ckpt_start_t, /* start_time */
> + CheckpointStats.ckpt_end_t, /* end_time */
> + (flags & CHECKPOINT_IS_SHUTDOWN), /* is_shutdown */
> + (flags & CHECKPOINT_END_OF_RECOVERY), /* is_end_of_recovery */
> + (flags & CHECKPOINT_IMMEDIATE), /* is_immediate */
> + (flags & CHECKPOINT_FORCE), /* is_force */
> + (flags & CHECKPOINT_WAIT), /* is_wait */
> + (flags & CHECKPOINT_CAUSE_XLOG), /* is_wal */
> + (flags & CHECKPOINT_CAUSE_TIME), /* is_time */
> + (flags & CHECKPOINT_FLUSH_ALL), /* is_flush_all */
> + CheckpointStats.ckpt_bufs_written, /* buffers_written */
> + CheckpointStats.ckpt_slru_written, /* slru_written */
> + CheckpointStats.ckpt_segs_added, /* segs_added */
> + CheckpointStats.ckpt_segs_removed, /* segs_removed */
> + CheckpointStats.ckpt_segs_recycled, /* segs_recycled */
>
> That's a lot of arguments. CheckpointStatsData and the various
> CHECKPOINT_* flags are exposed, why not just send these values to the
> hook?
>
> For v1-0001 as well, I'd suggest some grouping with existing
> structures, or expose these structures so as they can be reused for
> out-of-core code via the proposed hook. More arguments lead to more
> mistakes that could be easily avoided.

Yes, I admit the number of parameters seemed a bit annoying to me too,
and maybe we could reduce that somewhat. Certainly for checkpoints,
where we already have a reasonable CheckpointStats struct and flags,
wrapping most of the fields.

With vacuum it's a bit more complicated, for two reasons: (a) the
counters are simply in LVRelState, mixed with all kinds of other info,
and it seems "not great" to pass it to a "log" hook, and (b) there are
some calculated values, so I guess the hook would need to do that
calculation on its own, and it'd be easy to diverge over time.

For (a) we could introduce some "stats" struct to keep these counters
for vacuum (the nearby parallel vacuum patch actually does something
like that, I think). Not sure if (b) is actually a problem in practice,
but I guess we could add those fields to the new "stats" struct too.

regards

--
Tomas Vondra
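To make the suggested reduction concrete, a struct-based variant could
look roughly like the sketch below; the typedef and names are only an
illustration modelled on the call quoted above, not what the patches
currently define:

#include "postgres.h"
#include "access/xlog.h"        /* CheckpointStatsData, CHECKPOINT_* flags */

/* sketch of a struct-based hook signature */
typedef void (*checkpoint_log_hook_type) (const CheckpointStatsData *stats,
                                          int flags);

extern PGDLLIMPORT checkpoint_log_hook_type checkpoint_log_hook;

/*
 * The call site in the checkpointer would then shrink to:
 *
 *     if (checkpoint_log_hook)
 *         checkpoint_log_hook(&CheckpointStats, flags);
 */

/* an extension callback unpacks only the fields it cares about */
static void
history_checkpoint_log(const CheckpointStatsData *stats, int flags)
{
    bool        is_shutdown = (flags & CHECKPOINT_IS_SHUTDOWN) != 0;

    /*
     * Record stats->ckpt_start_t, stats->ckpt_end_t,
     * stats->ckpt_bufs_written, etc., plus the decoded flags.
     */
    (void) is_shutdown;         /* silence unused-variable warning */
}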
On Fri, Dec 27, 2024 at 8:25 PM Tomas Vondra <tomas@vondra.me> wrote:
> On 12/27/24 05:00, Michael Paquier wrote:
>> How about some time-based retention? Data size can be hard to think
>> about for the end user, while there would be a set of users that would
>> want to retain data for the past week, month, etc? If both size and
>> time upper-bound are defined, then entries that match one condition or
>> the other are removed.
>>
>
> Right. In my response [1] I suggested replacing the simple memory limit
> with a time-based limit, but I haven't done anything about that yet.
>
> ...
>
> We'd still want to enforce some limits, of course - a time-based limit
> of the minimum amount of time to cover, and maximum amount of disk space
> to use (more as a safety).
>
> FWIW there's one "issue" with enforcing the time-based limit - we can
> only do that for the "end time", because that's when we see the entry.
> If you configure the limit to keep "1 day" history, and then a vacuum
> completes after running for 2 days, we want to keep it, so that anyone
> can actually see it.
>

I can't say I recall all the reasoning involved in making
pg_stat_statements just be based on a fixed number of entries, but the
ability to come up with corner cases was certainly a factor. For
example, imagine the scenario where you set a max at 30 days, but you
have some tables only being vacuumed every few months. Ideally you
probably want the last entry no matter what, and honestly probably the
last 2 (in case you are troubleshooting something, having the last run
and something to compare against is ideal). In any case, it can get
complicated pretty quickly.

> ...
>
> With vacuum it's a bit more complicated, for two reasons: (a) the
> counters are simply in LVRelState, mixed with all kinds of other info,
> and it seems "not great" to pass it to a "log" hook, and (b) there are
> some calculated values, so I guess the hook would need to do that
> calculation on its own, and it'd be easy to diverge over time.
>
> For (a) we could introduce some "stats" struct to keep these counters
> for vacuum (the nearby parallel vacuum patch actually does something
> like that, I think). Not sure if (b) is actually a problem in practice,
> but I guess we could add those fields to the new "stats" struct too.
>

At the risk of increasing scope, since you already are working on
checkpoints along with vacuums, I'm curious if there was a reason not
to do analyze stats retention as well? It seems pretty correlated in
the same area/problems as vacuum history.

Robert Treat
https://xzilla.net
On 12/29/24 16:39, Robert Treat wrote:
> On Fri, Dec 27, 2024 at 8:25 PM Tomas Vondra <tomas@vondra.me> wrote:
>> ...
>>
>> FWIW there's one "issue" with enforcing the time-based limit - we can
>> only do that for the "end time", because that's when we see the entry.
>> If you configure the limit to keep "1 day" history, and then a vacuum
>> completes after running for 2 days, we want to keep it, so that anyone
>> can actually see it.
>>
>
> I can't say I recall all the reasoning involved in making
> pg_stat_statements just be based on a fixed number of entries, but the
> ability to come up with corner cases was certainly a factor. For
> example, imagine the scenario where you set a max at 30 days, but you
> have some tables only being vacuumed every few months. Ideally you
> probably want the last entry no matter what, and honestly probably the
> last 2 (in case you are troubleshooting something, having the last run
> and something to compare against is ideal). In any case, it can get
> complicated pretty quickly.
>

I really don't want to get into such complicated retention policies,
because there's no "right" solution, and it adds complexity both to
implementation (e.g. we'd need to keep per-relation counters for all the
various events), and processing (How would you know if there were no
other vacuum runs on a relation or if those other events were removed?)

I think the best solution is to use a trivial retention policy (e.g.
keep everything for X days), and if you want to keep a longer history,
you need to read and archive that regularly (and the retention period
ensures you don't miss any events).

After all, that's what we assume for most other runtime stats anyway -
it's not very useful to know the current values, if you can't calculate
deltas. So you have to sample the stats.

There's also the trouble that this is not crash safe, so I'd hesitate to
rely on this very much if it can disappear.

> ...
>
> At the risk of increasing scope, since you already are working on
> checkpoints along with vacuums, I'm curious if there was a reason not
> to do analyze stats retention as well? It seems pretty correlated in
> the same area/problems as vacuum history.
>

I agree tracking ANALYZE would be useful. I ignored it mostly to keep
the scope as limited as possible, so two separate events were enough.
Adding another hook to do_analyze_rel() would be fairly trivial, but
I'll leave that for later.

regards

--
Tomas Vondra
On Dec 25, 2024, at 11:25 AM, Tomas Vondra <tomas@vondra.me> wrote:
But maybe it'd be possible to just write the entries to a file. We don't
need random access to past entries (unlike e.g. pg_stat_statements), and
people won't query that very often either.
Assuming this doesn’t add significant complexity I think file-based would be the best way to go. It allows for more meaningful retention policies (ie, “1 month”) that don’t equate to storage size. More importantly, it removes the need to try and scale these limits based on hardware. While 128MB isn’t a huge amount on modern hardware, it’s also not nothing (especially if it can’t be swapped). This is especially true in an environment with a lot of small database instances (ie, at work our default instance has 8GB of memory, and it’d be nice to reduce that even more).
Speaking of retention, it would be nice if this feature allowed users to DELETE from the view that presented the data. That would allow for any kind of custom config that someone could dream up.
On 12/31/24 02:06, Michael Paquier wrote:
> On Sat, Dec 28, 2024 at 02:25:16AM +0100, Tomas Vondra wrote:
>> And the more I think about it the more I'm convinced we don't need to
>> keep the data about past runs in memory, a file should be enough (except
>> maybe for a small buffer). That would mean we don't need to worry about
>> dynamic shared memory, etc. I initially rejected this because it seemed
>> like a regression to how pgstat worked initially (sharing data through
>> files), but I don't think that's actually true - this data is different
>> (almost append-only), etc.
>
> Right, I was looking a bit at 0003 that introduces the extension. I
> am wondering if we are not repeating the errors of pgss by using a
> different file, and if we should not just use pgstats and its single
> file instead to store this data through an extension. You are right
> that as an append-only pattern using the dshash of pgstat does not fit
> well into this picture. How about the second type of stats kinds:
> the fixed-numbered stats kind? These allocate a fixed amount of
> shared memory, meaning that you could allocate N entries of history
> and just manage a queue of them, then do a memcpy() of the whole set
> if adding new history at the head of the queue, or just append new
> ones at the tail of the queue in shmem, memcpy() once the queue is
> full. The extension gets simpler:
> - No need to manage a new file, flush of the stats is controlled by
> pgstats itself.
> - The extension could register a fixed-numbered custom stats kind.

I'm not against leveraging some of the existing pgstat infrastructure,
but as I explained earlier I don't like the "fixed amount of shmem"
approach. It's either wasteful (on machines with few vacuum runs) or
difficult to configure to keep enough history.

FWIW I'm not sure what you mean by "pgss errors" - I'm not saying it's
flawless, but OTOH reading/writing a flat file with entries is not
particularly hard. (It'd need to be a bit more complex to evict stuff
for dropped objects, but not by much.)

> - Push the stats with the new hooks.
> - Perhaps less useful, but it is possible to control the timing where
> the data is pushed.

Not sure what you mean by "pushed". Pushed where, or how would that
matter?

> - Add SQL wrappers on top to fetch the data from pgstat.

OK, although I think this is not a particularly complicated part of the
code. It could even be simplified further.

regards

--
Tomas Vondra
On 12/30/24 22:40, Jim Nasby wrote:
> On Dec 25, 2024, at 11:25 AM, Tomas Vondra <tomas@vondra.me> wrote:
>> But maybe it'd be possible to just write the entries to a file. We don't
>> need random access to past entries (unlike e.g. pg_stat_statements), and
>> people won't query that very often either.
>
> Assuming this doesn’t add significant complexity I think file-based
> would be the best way to go. It allows for more meaningful retention
> policies (ie, “1 month”) that don’t equate to storage size. More
> importantly, it removes the need to try and scale these limits based on
> hardware. While 128MB isn’t a huge amount on modern hardware, it’s also
> not nothing (especially if it can’t be swapped). This is especially true
> in an environment with a lot of small database instances (ie, at work
> our default instance has 8GB of memory, and it’d be nice to reduce that
> even more).
>

True, although I'd point out the 128MB was meant to be a maximum for a
GUC, not the default. The default in the patch was 1MB, and with lots of
instances on the same machine, those can't be very active so this would
probably be enough.

Anyway, with the simple file-based solution this would go away.

> Speaking of retention, it would be nice if this feature allowed users to
> DELETE from the view that presented the data. That would allow for any
> kind of custom config that someone could dream up.

I really don't intend / want to do that. That significantly increases
the complexity, turning an append-only file into something that's
essentially almost a regular table. Yes, even with the append-only file
there needs to be a way to deal with dropped objects, but supporting
arbitrary deletes seems much more demanding.

There should be a way to reset the stats, of course. But only a fairly
coarse one.

regards

--
Tomas Vondra
On Dec 31, 2024, at 9:20 AM, Tomas Vondra <tomas@vondra.me> wrote:
Speaking of retention, it would be nice if this feature allowed users to
DELETE from the view that presented the data. That would allow for any
kind of custom config that someone could dream up.
I really don't intend / want to do that. That significantly increases
the complexity, turning an append-only file into something that's
essentially almost a regular table. Yes, even with the append-only file
there's need to be a way to deal with dropped objects, but supporting
arbitrary deletes seems much more demanding.
There should be a way to reset the stats, of course. But only a fairly
coarse one.
My thought was that you’d just accumulate the pkey of everything being deleted until end of transaction and then use that to rewrite the table. On the surface that doesn’t seem too terrible, especially depending on how dropped objects as well as aging out old data is handled.
Either way I just wanted to put the idea out there as something I think would be useful, especially since it might influence parts of the simple design. I certainly agree it’s not a mandatory feature.
On 31/12/2024 16:06, Tomas Vondra wrote:
> On 12/31/24 02:06, Michael Paquier wrote:
>> ...
>> How about the second type of stats kinds:
>> the fixed-numbered stats kind? These allocate a fixed amount of
>> shared memory, meaning that you could allocate N entries of history
>> and just manage a queue of them, then do a memcpy() of the whole set
>> if adding new history at the head of the queue, or just append new
>> ones at the tail of the queue in shmem, memcpy() once the queue is
>> full. The extension gets simpler:
>> - No need to manage a new file, flush of the stats is controlled by
>> pgstats itself.
>> - The extension could register a fixed-numbered custom stats kind.
>
> I'm not against leveraging some of the existing pgstat infrastructure,
> but as I explained earlier I don't like the "fixed amount of shmem"
> approach. It's either wasteful (on machines with few vacuum runs) or
> difficult to configure to keep enough history.

A bit late on this topic: I wrote StatsMgr [1], which takes stats
snapshots at a configured interval. It only covers fixed stats at the
moment, and it's a ring buffer too - not perfect, but it does the job.
It has many similarities, so I thought it would be interesting to share
what I was looking at to improve further.

I think it would be interesting to have 2 hooks in the PgStat_KindInfo
callbacks:

* "flush_fixed_cb()": a hook inside it, to be able to execute another
action while flushing, like generating aggregates ("counter summary")

* a new member to do the snapshot directly (or whatever action is
relevant when the stats are in an interesting state), maybe
snapshot_fixed_cb(), with a hook inside.

This way, with 2 hooks we can cover both usages, for all stats (fixed
here, but probably the same hooks for variable stats), and have actions
on events instead of only on a time interval.

I don't have a strict opinion about file management, but having a
facility in PostgreSQL to store *metrics* (as opposed to "stats", which
are required by PostgreSQL) would be very convenient. Maybe something
like SLRU or similar (the mechanism used to keep commit timestamps)? I
haven't checked all the options already available in this area.

[1] https://codeberg.org/Data-Bene/StatsMgr

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D
Hi,

On Wed, Dec 25, 2024 at 06:25:50PM +0100, Tomas Vondra wrote:
> ...
>
> I was thinking about specifying the space in more natural terms, either
> as amount of time ("keep 1 day of history") or number of entries ("10k
> entries"). That would probably mean the memory can't be allocated as
> fixed size.
>
> But maybe it'd be possible to just write the entries to a file. We don't
> need random access to past entries (unlike e.g. pg_stat_statements), and
> people won't query that very often either.

Thanks for working on this!

Another idea regarding the storage of those metrics: I think that one would
want to see "precise" data for recent metrics but would probably be fine with
some level of aggregation for historical ones. Something like being able to
retrieve "1 day of raw data" and say one year of data aggregated by day
(average, maximum, minimum, standard deviation and maybe some percentiles)
could be fine too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jan 7, 2025 at 10:44 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> ...
>
> Another idea regarding the storage of those metrics: I think that one would
> want to see "precise" data for recent metrics but would probably be fine with
> some level of aggregation for historical ones. Something like being able to
> retrieve "1 day of raw data" and say one year of data aggregated by day
> (average, maximum, minimum, standard deviation and maybe some percentiles)
> could be fine too.
>

While I'm sure some people are ok with it, I would say that most of
the observability/metrics community has moved away from aggregated
data storage towards raw time series data in tools like prometheus,
tsdb, and timescale in order to avoid the problems that misleading /
lossy / low-resolution data can create.

Robert Treat
https://xzilla.net
On 1/7/25 21:42, Robert Treat wrote:
> On Tue, Jan 7, 2025 at 10:44 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
>>
>> ...
>>
>> Another idea regarding the storage of those metrics: I think that one would
>> want to see "precise" data for recent metrics but would probably be fine with
>> some level of aggregation for historical ones. Something like being able to
>> retrieve "1 day of raw data" and say one year of data aggregated by day
>> (average, maximum, minimum, standard deviation and maybe some percentiles)
>> could be fine too.
>>
>
> While I'm sure some people are ok with it, I would say that most of
> the observability/metrics community has moved away from aggregated
> data storage towards raw time series data in tools like prometheus,
> tsdb, and timescale in order to avoid the problems that misleading /
> lossy / low-resolution data can create.
>

That's how I see it too. My primary goal is to provide the raw data,
even if it covers only a limited amount of time, so that it can be
either queried directly, or ingested regularly into something like
prometheus.

I can imagine a more complicated system, aggregating the data afterwards
into a lower resolution (e.g. per day). But that's not a complete
solution, because e.g. what if there are many relations that happen to
be related only once per day?

regards

--
Tomas Vondra
Hi,

On Tue, Jan 07, 2025 at 10:19:36PM +0100, Tomas Vondra wrote:
> On 1/7/25 21:42, Robert Treat wrote:
>> ...
>>
>> While I'm sure some people are ok with it, I would say that most of
>> the observability/metrics community has moved away from aggregated
>> data storage towards raw time series data in tools like prometheus,
>> tsdb, and timescale in order to avoid the problems that misleading /
>> lossy / low-resolution data can create.
>>
>
> That's how I see it too. My primary goal is to provide the raw data,
> even if it covers only a limited amount of time, so that it can be
> either queried directly, or ingested regularly into something like
> prometheus.
>

Okay, makes sense. I was thinking about "old habits" that I had, but I
agree that providing raw data is more the way to go nowadays.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com