Re: Count and log pages set all-frozen by vacuum - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Count and log pages set all-frozen by vacuum |
Date | |
Msg-id | CAAKRu_bWMa-wakEmFfB3h-CEJSuh2jAqTgYBRpV99cpovY4MXQ@mail.gmail.com Whole thread Raw |
In response to | Re: Count and log pages set all-frozen by vacuum (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
On Thu, Dec 12, 2024 at 9:39 PM Tomas Vondra <tomas@vondra.me> wrote: > > On 12/11/24 20:18, Masahiko Sawada wrote: > > > > ... > > > >> Here's an example to exercise the new log message: > >> > >> create table foo (a int, b int) with (autovacuum_enabled = false); > >> insert into foo select generate_series(1,1000), 1; > >> delete from foo where a > 500; > >> vacuum (verbose) foo; > >> -- visibility map: 5 pages newly set all-visible, of which 2 set > >> all-frozen. 0 all-visible pages newly set all-frozen. > >> insert into foo select generate_series(1,500), 1; > >> vacuum (verbose, freeze) foo; > >> --visibility map: 3 pages newly set all-visible, of which 3 set > >> all-frozen. 2 all-visible pages newly set all-frozen. > I agree the v4 patch is fine, although I find the wording with multiple > "newly" a bit verbose / confusing. Maybe like this would be better: > > %u pages set all-visible, %u set all-frozen (%u were all-visible) > > I don't want to drag this thread into an infinite discussion about the > best possible wording, so if others think the v4 wording is fine, I > won't object to it. I changed it to use your suggested wording. > For me the bigger questions is whether these new counters added to te > vacuum log message are actually useful in practice. I understand it may > be useful while working on a patch related to eager freezing (although I > think Melanie changed the approach of that patch series, and it doesn't > actually require this patch anymore). > > But I'm a bit skeptical about it being useful for regular users or DBAs. > I certainly don't remember me wanting to know these values per-vacuum. > Of course, maybe that's bias - knowing it's not available and thus not > asking for it. But I think it's also very hard to make conclusions about > the "freeze debt" from these per-vacuum values - we don't know how the > values combine. It might be disjunct set of pages (and then we should > just sum them), or maybe it's the same set of pages over and over again > (and then the debt doesn't really change). I agree that it won't fully tell you the freeze debt over time, but, for example, you can look at the number of scanned pages and the total number of pages and if the number of pages being set all-visible stays the same each time but the total number of pages in the relation grows (so % scanned shrinks) that likely means you are setting new pages all-visible. > It doesn't mean it's useless - e.g. we might compare the sum to a delta > of values from visibility_map_summary() and make some deductions about > how often are pages set all-visible repeatedly. And maybe vacuum should > log those before/after visibility_count values too, to make this easier. > Not sure how costly would that be ... Right, it is useful to know how many are all-visible at the start and finish of vacuum. But that is pretty different from the other vacuum log output. The rest of it tells you what the vacuum did, while this would tell you how the table has changed state during the vacuum. > To me these visibility_count seem more important when quantifying the > freeze debt for a given table. But I think that also needs to consider > how old those all-visible pages are - the older the more it contributes > to the debt, I think. And that won't be in the vacuum log, of course. Yes, to keep track of how old the all-visible pages are, you need something like a histogram or other data structure to represent the ranges of ages of all-visible pages -- then update them when a page is modified. I definitely think this would be useful. But I ran into the same space issues I mention below. You need a killer use case to justify introducing something that takes up hundreds of bytes per table (and introduces new APIs). > Another thing is that analyzing vacuum log messages is ... not very > easy. Having to parse multi-line log messages, with a mix of text and > fields (not even mentioning translations) is not fun. Of course, none of > that is a fault of this patch, and I don't expect this patch to fix > that. But it's hard to get excited about new fields added to this log > message, when it'd be most useful aggregated for vacuums over some time > interval. I really wish we had some way to collect and access these > runtime stats in a structured way. Yes, I also wish we could have a new type of stat that is over time but decaying (a ring buffer of recent happenings). I did several versions of things like this in draft patches for the eager freeze heuristic. The problem is that all of these take up space -- even if you just keep stats for the last 3 vacuums, that's dozens-hundreds of bytes per table for all tables. I don't see how, even with decaying them, we could afford to keep them in memory. So, we would probably want to have some sort of new on-disk stat type that isn't a table. - Melanie
pgsql-hackers by date: