Thread: pageinspect get_raw_page() option to return invalid pages
Hi, Currently there's no convenient way to get a corrupted page (be it a checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's get_raw_page(). Given that pageinspect is kind of our tool to do inspect issues on a data level that's not so great. I therefore propose that we add an *option* that bypasses shared buffers and reads the underlying data directly. And then skips over the validation. I'm not 100% sure if this should be done unconditionally, or only if the item wasn't found in cache. There's some coherency differences, obviously. We could also just make it two functions, instead of a parameter. It's unfortunately not entirely trivial to access specific blocks with pg_read_binary_file() - one needs a query that deals with block sizes, segment numbers, segment sizes etc... Greetings, Andres Freund
On Fri, May 4, 2018 at 11:41 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > Currently there's no convenient way to get a corrupted page (be it a > checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's > get_raw_page(). Given that pageinspect is kind of our tool to do inspect > issues on a data level that's not so great. > > I therefore propose that we add an *option* that bypasses shared buffers > and reads the underlying data directly. And then skips over the > validation. I'm not 100% sure if this should be done unconditionally, > or only if the item wasn't found in cache. There's some coherency > differences, obviously. +1. This would be a good option for amcheck, too. -- Peter Geoghegan
On 2018-05-04 11:42:35 -0700, Peter Geoghegan wrote: > On Fri, May 4, 2018 at 11:41 AM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > > > Currently there's no convenient way to get a corrupted page (be it a > > checksum failure, corrupted pd_upper/lower or whatnot) via pageinspect's > > get_raw_page(). Given that pageinspect is kind of our tool to do inspect > > issues on a data level that's not so great. > > > > I therefore propose that we add an *option* that bypasses shared buffers > > and reads the underlying data directly. And then skips over the > > validation. I'm not 100% sure if this should be done unconditionally, > > or only if the item wasn't found in cache. There's some coherency > > differences, obviously. > > +1. This would be a good option for amcheck, too. Could you expand on that? Are you envisioning an option to ReadBufferExtended()? Because that's certainly not what I'm thinking of - it seems dangerous to populate shared buffers with an invalid page. Therefore I was more thinking to falling back to smgrread() or such. Greetings, Andres Freund
On Fri, May 4, 2018 at 11:46 AM, Andres Freund <andres@anarazel.de> wrote: > Could you expand on that? Are you envisioning an option to > ReadBufferExtended()? Because that's certainly not what I'm thinking of > - it seems dangerous to populate shared buffers with an invalid > page. Therefore I was more thinking to falling back to smgrread() or > such. I'm not envisaging anything specific just yet. It would be nice if amcheck had an option that bypassed shared_buffers, because users want that. That's all. -- Peter Geoghegan
On 2018-05-04 11:53:25 -0700, Peter Geoghegan wrote: > On Fri, May 4, 2018 at 11:46 AM, Andres Freund <andres@anarazel.de> wrote: > > Could you expand on that? Are you envisioning an option to > > ReadBufferExtended()? Because that's certainly not what I'm thinking of > > - it seems dangerous to populate shared buffers with an invalid > > page. Therefore I was more thinking to falling back to smgrread() or > > such. > > I'm not envisaging anything specific just yet. It would be nice if > amcheck had an option that bypassed shared_buffers, because users want > that. That's all. Can you expand on what they want? - Avoid polluting caches? Why's the ringbuffer logic not good enough? - Continue after a checksum or similar failure? That seems a bit useless for amcheck imo? You know there's corruption at that point after all. - Read on disk data, bypassing shared buffers? That'd present a lot of coherency issues, no? Greetings, Andres Freund
On Fri, May 4, 2018 at 11:56 AM, Andres Freund <andres@anarazel.de> wrote: > Can you expand on what they want? > > - Avoid polluting caches? Why's the ringbuffer logic not good enough? > - Continue after a checksum or similar failure? That seems a bit useless > for amcheck imo? You know there's corruption at that point after all. > - Read on disk data, bypassing shared buffers? That'd present a lot of > coherency issues, no? I think that "Read on-disk data" would be a compelling feature in certain environments. It would present some concurrency issues, but those seem solvable. The only thing that the "!readonly && !heapallindexed" checks need that complicates things is a snapshot that prevents concurrent recycling by VACUUM. If we gave up on the cross-page invariant check, then we wouldn't even need to worry about concurrent recycling by VACUUM, while still offering almost the same coverage as "!readonly && !heapallindexed" (I suppose we'd also have to give up on the level cross-check that you suggested when v1 of amcheck went in, too, but I think that that's it). Maybe it would a better use of my time to focus on making this accessible to backup tools, that should ideally work without needing to acquire any MVCC snapshot. Probably from a front-end utility. We'd need to export at least some operator class functionality to make that work, though. -- Peter Geoghegan