Re: Assert in pageinspect with NULL pages - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Assert in pageinspect with NULL pages
Date
Msg-id CAEze2WgaJD14Z_Pd3NJ89qxgUHbU69QDhex_syH+W92VLZm2=A@mail.gmail.com
Whole thread Raw
In response to Re: Assert in pageinspect with NULL pages  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, 27 Mar 2022 at 16:24, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 25, 2022 at 12:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Thu, Mar 24, 2022 at 08:54:14PM -0700, Peter Geoghegan wrote:
> > > amcheck's palloc_btree_page() function validates that an 8KiB page is
> > > in fact an nbtree page, in a maximally paranoid way. Might be an
> > > example worth following here.
> >
> > Sure, we could do that.  However, I don't think that going down to
> > that is something we have any need for in pageinspect, as the module
> > is also useful to look at the contents of the full page, even if
> > slightly corrupted, and too many checks would prevent a lookup at the
> > internal contents of a page.
>
> It's my impression that there are many ways of crashing the system
> using pageinspect right now. We aren't very careful about making sure
> that our code that reads from disk doesn't crash if the data on disk
> is corrupted, and all of that systemic weakness is inherited by
> pageinspect. But, with pageinspect, you can supply your own pages, not
> just use the ones that actually exist on disk.
>
> Fixing this seems like a lot of work and problematic for various
> reasons. And I do agree that if we can allow people to look at what's
> going on with a corrupt page, that's useful. On the other hand, if by
> "looking" they crash the system, that sucks a lot. So overall I think
> we need a lot more sanity checks here.

I noticed this thread due to recent commit 291e517a, and noticed that
it has some overlap with one of my patches [0], in which I fix the
corresponding issue in core postgres by assuming (and in
assert-enabled builds, verifying) the size and location of the special
area. As such, you might be interested in that patch.

Note that currently in core postgres a corrupted special area pointer
will likely target neighbouring blocks in the buffer pool; resulting
in spreading corruption when the special area is updated. This
spreading corruption should be limited to only the corrupted block
with my patch.

Kind regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/37/3543/
https://www.postgresql.org/message-id/CAEze2WjE3+tGO9Fs9+iZMU+z6mMZKo54W1Zt98WKqbEUHbHOBg@mail.gmail.com



pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel
Next
From: Tom Lane
Date:
Subject: Re: Why is lorikeet so unstable in v14 branch only?