Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) - Mailing list pgsql-hackers
From | Robert Treat |
---|---|
Subject | Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) |
Date | |
Msg-id | CAJSLCQ1dAmjuj=kV3yXvy3QDWpYpy+TDGgr3LGpLSh96HwYkFA@mail.gmail.com Whole thread Raw |
In response to | Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
|
List | pgsql-hackers |
On Fri, Dec 27, 2024 at 8:25 PM Tomas Vondra <tomas@vondra.me> wrote: > On 12/27/24 05:00, Michael Paquier wrote: > > On Thu, Dec 26, 2024 at 06:58:11PM +0100, Tomas Vondra wrote: > >> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not > >> make a fundamental difference ... > >> > >> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the > >> limit, or possibly removing it entirely (and accepting anything the > >> system can handle). > > > > + DefineCustomIntVariable("stats_history.size", > > + "Sets the amount of memory available for past events.", > > > > How about some time-based retention? Data size can be hard to think > > about for the end user, while there would be a set of users that would > > want to retain data for the past week, month, etc? If both size and > > time upper-bound are define, then entries that match one condition or > > the other are removed. > > > > Right. In my response [1] I suggested replacing the simple memory limit > with a time-based limit, but I haven't done anything about that yet. > > And the more I think about it the more I'm convinced we don't need to > keep the data about past runs in memory, a file should be enough (except > maybe for a small buffer). That would mean we don't need to worry about > dynamic shared memory, etc. I initially rejected this because it seemed > like a regression to how pgstat worked initially (sharing data through > files), but I don't think that's actually true - this data is different > (almost append-only), etc. > > The one case when we may need co read the data is in response to DROP of > a table, when we need to discard entries for that object. Or we could > handle that by recording OIDs of dropped objects ... not sure how > complex this would need to be. > > We'd still want to enforce some limits, of course - a time-based limit > of the minimum amount of time to cover, and maximum amount of disk space > to use (more as a safety). > > FWIW there's one "issue" with enforcing the time-based limit - we can > only do that for the "end time", because that's when we see the entry. > If you configure the limit to keep "1 day" history, and then a vacuum > completes after running for 2 days, we want to keep it, so that anyone > can actually see it. > I can't say I recall all the reasoning involved in making pg_stat_statements just be based on a fixed number of entries, but the ability to come up with corner cases was certainly a factor. For example, imagine the scenario where you set a max at 30 days, but you have some tables only being vacuumed every few months. Ideally you probably want the last entry no matter what, and honestly probably the last 2 (in case you are troubleshooting something, having the last run and something to compare against is ideal). In any case, it can get complicated pretty quickly. > [1] > https://www.postgresql.org/message-id/8df7cee1-31aa-4db3-bbb7-83157ca139da%40vondra.me > > > + checkpoint_log_hook( > > + CheckpointStats.ckpt_start_t, /* start_time */ > > + CheckpointStats.ckpt_end_t, /* end_time */ > > + (flags & CHECKPOINT_IS_SHUTDOWN), /* is_shutdown */ > > + (flags & CHECKPOINT_END_OF_RECOVERY), /* is_end_of_recovery */ > > + (flags & CHECKPOINT_IMMEDIATE), /* is_immediate */ > > + (flags & CHECKPOINT_FORCE), /* is_force */ > > + (flags & CHECKPOINT_WAIT), /* is_wait */ > > + (flags & CHECKPOINT_CAUSE_XLOG), /* is_wal */ > > + (flags & CHECKPOINT_CAUSE_TIME), /* is_time */ > > + (flags & CHECKPOINT_FLUSH_ALL), /* is_flush_all */ > > + CheckpointStats.ckpt_bufs_written, /* buffers_written */ > > + CheckpointStats.ckpt_slru_written, /* slru_written */ > > + CheckpointStats.ckpt_segs_added, /* segs_added */ > > + CheckpointStats.ckpt_segs_removed, /* segs_removed */ > > + CheckpointStats.ckpt_segs_recycled, /* segs_recycled */ > > > > That's a lot of arguments. CheckpointStatsData and the various > > CHECKPOINT_* flags are exposed, why not just send these values to the > > hook? > > > > For v1-0001 as well, I'd suggest some grouping with existing > > structures, or expose these structures so as they can be reused for > > out-of-core code via the proposed hook. More arguments lead to more > > mistakes that could be easily avoided. > > Yes, I admit the number of parameters seemed a bit annoying to me too, > and maybe we could reduce that somewhat. Certainly for checkpoints, > where we already have a reasonable CheckpointStats struct and flags, > wrapping most of the fields. > > With vacuum it's a bit more complicated, for two reasons: (a) the > counters are simply in LVRelState, mixed with all kinds of other info, > and it seems "not great" to pass it to a "log" hook, and (b) there are > some calculated values, so I guess the hook would need to do that > calculation on it's own, and it'd be easy to diverge over time. > > For (a) we could introduce some "stats" struct to keep these counters > for vacuum (the nearby parallel vacuum patch actually does something > like that, I think). Not sure if (b) is actually a problem in practice, > but I guess we could add those fields to the new "stats" struct too. > At the risk of increasing scope, since you already are working on checkpoints along with vacuums, I'm curious if there was a reason not to do analyze stats retention as well? It seems pretty correlated in the same area/problems as vacuum history. Robert Treat https://xzilla.net
pgsql-hackers by date: