Re: Assert in pageinspect with NULL pages - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Assert in pageinspect with NULL pages
Date
Msg-id CAH2-Wzk4czia7nPJaYQPfpecnQH6L6gRrO4ZC7dj3sOFa1D-JQ@mail.gmail.com
Whole thread Raw
In response to Re: Assert in pageinspect with NULL pages  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Assert in pageinspect with NULL pages  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Mar 27, 2022 at 7:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
> It's my impression that there are many ways of crashing the system
> using pageinspect right now. We aren't very careful about making sure
> that our code that reads from disk doesn't crash if the data on disk
> is corrupted, and all of that systemic weakness is inherited by
> pageinspect.

It's significantly worse than the core code, I'd say (before we even
consider the fact that you can supply your own pages). At least with
index AMs, where functions like _bt_checkpage() and gistcheckpage()
provide the most basic sanity checks for any page that is read from
disk.

> Fixing this seems like a lot of work and problematic for various
> reasons. And I do agree that if we can allow people to look at what's
> going on with a corrupt page, that's useful. On the other hand, if by
> "looking" they crash the system, that sucks a lot. So overall I think
> we need a lot more sanity checks here.

I don't think it's particularly hard to do a little more. That's all
it would take to prevent practically all crashes. We're not dealing
with adversarial page images here.

Sure, it would be overkill to fully adapt something like
palloc_btree_page() for pageinspect, for the reason Michael gave. But
there are a couple of checks that it does that would avoid practically
all real world hard crashes, without any plausible downside:

* The basic _bt_checkpage() call that every nbtree page uses in the
core code (as well as in amcheck).

* The maxoffset > MaxIndexTuplesPerPage check.

You could even go a bit further, and care about individual index
tuples. My commit 93ee38eade added some basic sanity checks for index
tuples to bt_page_items(). That could go a bit further as well. In
particular, the sanity checks from amcheck's PageGetItemIdCareful()
function could be carried over. That would make it impossible for
bt_page_items() to read past the end of the page when given a page
that has an index tuple whose item pointer offset has some wildly
unreasonable value.

I'm not volunteering. Just saying that this is quite possible.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Frontend error logging style
Next
From: Daniel Gustafsson
Date:
Subject: Re: Small TAP tests cleanup for Windows and unused modules