Thread: widen vacuum buffer counters
We recently noticed that vacuum buffer counters wraparound in extreme cases, with ridiculous results. Example: 2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17 pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec That's to be expected, as tables exist that are large enough for 4 billion buffer accesses to be a possibility. Let's widen the counters, as in the attached patch. I propose to backpatch this. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Jan 31, 2020 at 9:59 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > We recently noticed that vacuum buffer counters wraparound in extreme > cases, with ridiculous results. Example: > > 2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17 > pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen > tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable > buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied > avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s > system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec > > That's to be expected, as tables exist that are large enough for 4 billion > buffer accesses to be a possibility. Let's widen the counters, as in the > attached patch. > > I propose to backpatch this. +1, and patch LGTM.
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > We recently noticed that vacuum buffer counters wraparound in extreme > cases, with ridiculous results. Ugh. > I propose to backpatch this. +1 for widening these counters, but since they're global variables, -0.2 or so for back-patching. I don't know of any reason that an extension would be touching these, but I feel like the problem isn't severe enough to justify taking an ABI-break risk. Also, %zd is the wrong format code for int64. Recommended practice these days is to use "%lld" with an explicit cast of the printf argument to long long (just to be sure). That doesn't work safely before v12, and if you did insist on back-patching further, you'd need to jump through hoops to avoid having platform-specific format codes in a translatable string. (The side-effects for translation seem like an independent argument against back-patching.) regards, tom lane
On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote: > +1 for widening these counters, but since they're global variables, -0.2 > or so for back-patching. I don't know of any reason that an extension > would be touching these, but I feel like the problem isn't severe enough > to justify taking an ABI-break risk. I would not recommend doing a back-patch because of that. I don't think that's worth taking any risk. Extension authors can have a lot of imagination. > Also, %zd is the wrong format code for int64. Recommended practice > these days is to use "%lld" with an explicit cast of the printf argument > to long long (just to be sure). That doesn't work safely before v12, > and if you did insist on back-patching further, you'd need to jump > through hoops to avoid having platform-specific format codes in a > translatable string. (The side-effects for translation seem like > an independent argument against back-patching.) Surely you meant INT64_FORMAT here? Anyway, looking at the patch, couldn't we just use uint64? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote: >> Also, %zd is the wrong format code for int64. Recommended practice >> these days is to use "%lld" with an explicit cast of the printf argument >> to long long (just to be sure). That doesn't work safely before v12, >> and if you did insist on back-patching further, you'd need to jump >> through hoops to avoid having platform-specific format codes in a >> translatable string. (The side-effects for translation seem like >> an independent argument against back-patching.) > Surely you meant INT64_FORMAT here? No, because that varies depending on platform, so using it in a translatable string is a bad idea. See e.g. 6a1cd8b92. > Anyway, looking at the patch, > couldn't we just use uint64? Yeah, I was wondering if those counters shouldn't be unsigned, too. Probably doesn't matter once we widen them to 64 bits though. regards, tom lane
On 2020-Jan-31, Tom Lane wrote: > Also, %zd is the wrong format code for int64. Recommended practice > these days is to use "%lld" with an explicit cast of the printf argument > to long long (just to be sure). That doesn't work safely before v12, > and if you did insist on back-patching further, you'd need to jump > through hoops to avoid having platform-specific format codes in a > translatable string. (The side-effects for translation seem like > an independent argument against back-patching.) Pushed with that change; did not backpatch, because I don't think it's really worth the possible breakage :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services