On 4/2/19 7:06 PM, Magnus Hagander wrote:
> On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> One thing which is not
> >> proposed on this patch, and I am fine with it as a first draft, is
> >> that we don't have any information about the broken block number and
> >> the file involved. My gut tells me that we'd want a separate view,
> >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> >> blck) to be complete. But that's just for future work.
> >
> > That could indeed be nice.
>
> Actually, backpedaling on this one... pg_stat_checksums_details may
> be a bad idea as we could finish with one row per broken block. If
> a corruption is spreading quickly, pgstat would not be able to sustain
> that amount of objects. Having pg_stat_checksums would allow us to
> plugin more data easily based on the last failure state:
> - last relid of failure
> - last fork type of failure
> - last block number of failure.
> Not saying to do that now, but having that in pg_stat_database does
> not seem very natural to me. And on top of that we would have an
> extra row full of NULLs for shared objects in pg_stat_database if we
> adopt the unique view approach... I find that rather ugly.
>
>
> I think that tracking each and every block is of course a non-starter, as you've noticed.

I think that's less of a concern now that the stats collector process is gone and the stats are collected in shared memory. What do you think?
It would be less of a concern, yes, but I think it would still be a concern. If you have a large amount of corruption, you could quickly get to millions of rows to keep track of, which would definitely be a problem in shared memory as well, wouldn't it?
But perhaps we could keep a list of "the last 100 checksum failures" or something like that?
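
To make the "last 100 checksum failures" idea a bit more concrete, here is a rough sketch of a fixed-size ring buffer. It is not taken from any patch: the struct and function names (ChecksumFailureEntry, ChecksumFailureLog, checksum_failure_log_*) are hypothetical, plain C integer types stand in for Oid, ForkNumber and BlockNumber, and locking for shared memory is left out entirely. The point is only that memory use stays constant no matter how many failures are reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "The last 100", per the suggestion above; the value is arbitrary. */
#define CHECKSUM_FAILURE_LOG_SIZE 100

/* Hypothetical entry: one per reported failure (dboid, rel, fork, block). */
typedef struct ChecksumFailureEntry
{
    uint32_t    dboid;      /* stand-in for Oid */
    uint32_t    relid;      /* stand-in for Oid */
    int         forknum;    /* stand-in for ForkNumber */
    uint32_t    blkno;      /* stand-in for BlockNumber */
} ChecksumFailureEntry;

/* Hypothetical fixed-size log; old entries are overwritten in place. */
typedef struct ChecksumFailureLog
{
    uint64_t    total;      /* failures recorded since initialization */
    ChecksumFailureEntry entries[CHECKSUM_FAILURE_LOG_SIZE];
} ChecksumFailureLog;

static void
checksum_failure_log_init(ChecksumFailureLog *log)
{
    assert(log != NULL);
    memset(log, 0, sizeof(*log));
}

/* Record one failure, overwriting the oldest entry once the log is full. */
static void
checksum_failure_log_add(ChecksumFailureLog *log, ChecksumFailureEntry e)
{
    assert(log != NULL);
    log->entries[log->total % CHECKSUM_FAILURE_LOG_SIZE] = e;
    log->total++;
}

/* Number of entries currently retained (at most the log size). */
static size_t
checksum_failure_log_count(const ChecksumFailureLog *log)
{
    assert(log != NULL);
    if (log->total < CHECKSUM_FAILURE_LOG_SIZE)
        return (size_t) log->total;
    return CHECKSUM_FAILURE_LOG_SIZE;
}

/* Fetch the i-th most recent failure (i = 0 is the newest), or NULL. */
static const ChecksumFailureEntry *
checksum_failure_log_get(const ChecksumFailureLog *log, size_t i)
{
    assert(log != NULL);
    if (i >= checksum_failure_log_count(log))
        return NULL;
    return &log->entries[(log->total - 1 - i) % CHECKSUM_FAILURE_LOG_SIZE];
}
```

With something like this, the "last relid / last fork / last block of failure" columns discussed upthread would just be entry 0, and the total counter still tells you how many failures were seen even after older entries have been overwritten.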