Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Msg-id CAH2-WzkGLtffpGJoSp+cpN_q4VP9eF3-BhjZ+YxgAQa=O1niXA@mail.gmail.com
In response to Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
List pgsql-bugs
On Fri, Jul 16, 2021 at 5:30 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Hmm, seems we should fix that. But could a prematurely recycled deleted
> page cause permanent corruption?

If scans can find a page that is wholly unrelated to the expected page
(and possibly even in the wrong high-level page category), then it's
really hard to predict what might happen. This could lead to real
chaos. ginInsertCleanup() makes no attempt to perform basic validation
of its assumptions about what kind of page this is, except for some
assertions. We should have something like a "can't happen" error on
!GinPageIsList() inside ginInsertCleanup() -- if we had that already
then I might be able to reason about this problem. It wouldn't hurt to
have similar checks in other code that deals with posting trees and
entry trees, too.
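
A minimal sketch of the kind of "can't happen" check described above,
written as standalone C rather than as backend code. The struct, the
function name and the flag values are all hypothetical stand-ins (the
flag bits are only modeled on GIN's page flags), and real backend code
would raise an error instead of filling a buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Page flag bits modeled on GIN's; the exact values are an assumption. */
#define SKETCH_GIN_DATA (1 << 0)
#define SKETCH_GIN_LEAF (1 << 1)
#define SKETCH_GIN_LIST (1 << 4)

/* Hypothetical stand-in for a page that was already read and locked. */
typedef struct SketchGinPage
{
    uint32_t    blkno;
    uint16_t    flags;
} SketchGinPage;

/*
 * Returns true if the page carries the pending list flag.  Otherwise
 * writes a "can't happen" style message into errbuf and returns false.
 */
static bool
sketch_check_pending_list_page(const SketchGinPage *page,
                               char *errbuf, size_t errlen)
{
    assert(page != NULL && errbuf != NULL && errlen > 0);

    if (!(page->flags & SKETCH_GIN_LIST))
    {
        snprintf(errbuf, errlen,
                 "block %u is not a pending list page (flags 0x%04x)",
                 (unsigned) page->blkno, (unsigned) page->flags);
        return false;
    }

    errbuf[0] = '\0';
    return true;
}
```

The point of the sketch is only that the check is cheap and runs before
any of the page contents are interpreted, so a page of the wrong
category is reported rather than processed.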

ginInsertCleanup() is tolerant of all kinds of things. It's not just
the lack of page-level sanity checks. It's also the basic approach to
crash safety, which relies on the fact that GIN only does lossy index
scans. My guess is that there could be lots of problems without it
being obvious to users. Things really went downhill in
ginInsertCleanup() starting in commit e956808328.

> On this page, the DATA flag is set, so it is an internal *posting* tree
> page.
>
> That's weird: the scan walked straight from an internal entry tree page
> (root, at blk 1) into an internal posting tree page (blk 1452). That
> doesn't make sense to me.

I agree that the internal entry tree page (root, at blk 1) looks sane,
from what I've seen. The tuple sizes are plausible -- 16-byte index
tuples aren't possible on an entry tree leaf page, nor on a pending
list page.

Anyway, this is roughly the kind of bug I had in mind. It's possible
that the underlying problem doesn't actually involve
ginInsertCleanup() -- as I said we have seen similar issues elsewhere
(one such issue was fixed in commit 52ac6cd2d0). But as Alexander
pointed out, that doesn't mean much. It's possible that this problem
is 1 or 2 problems removed from the original problem, which really did
start in ginInsertCleanup() -- who knows? Why shouldn't corruption
lead to more corruption, given that we don't do much basic page-level
validation? We do at least sanitize within ginStepRight(), but we need
to be more consistent about it.
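
To illustrate what page-level validation when stepping right might
look like, here is a standalone sketch. It is not the ginStepRight()
source. The flag values, the enum and the function name are
assumptions made for the example. The idea is that a right sibling
must be in the same page category (data vs. entry, leaf vs. internal)
as the page we came from, and must not be deleted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page flag bits modeled on GIN's; the exact values are an assumption. */
#define SKETCH_GIN_DATA    (1 << 0)
#define SKETCH_GIN_LEAF    (1 << 1)
#define SKETCH_GIN_DELETED (1 << 2)

typedef enum SketchStepResult
{
    SKETCH_STEP_OK,
    SKETCH_STEP_TYPE_MISMATCH,  /* sibling is in a different category */
    SKETCH_STEP_DELETED         /* sibling was deleted */
} SketchStepResult;

/* Compare the flags of the current page with those of its right sibling. */
static SketchStepResult
sketch_check_right_sibling(uint16_t cur_flags, uint16_t right_flags)
{
    bool        cur_leaf = (cur_flags & SKETCH_GIN_LEAF) != 0;
    bool        cur_data = (cur_flags & SKETCH_GIN_DATA) != 0;
    bool        right_leaf = (right_flags & SKETCH_GIN_LEAF) != 0;
    bool        right_data = (right_flags & SKETCH_GIN_DATA) != 0;

    /* A category that matches itself is a basic invariant of the sketch. */
    assert(cur_leaf == ((cur_flags & SKETCH_GIN_LEAF) != 0));

    if (cur_leaf != right_leaf || cur_data != right_data)
        return SKETCH_STEP_TYPE_MISMATCH;
    if (right_flags & SKETCH_GIN_DELETED)
        return SKETCH_STEP_DELETED;
    return SKETCH_STEP_OK;
}
```

Under these assumptions, stepping from an internal posting tree page
(DATA set, LEAF clear) to a leaf posting tree page (DATA and LEAF set)
yields SKETCH_STEP_TYPE_MISMATCH.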

> The next ReadBuffer call is this:
>
> > 2021-07-16 07:01:19 UTC LOG:  ReadBuffer 1663/16390/16526 read gin blk 15559 (ginbtree.c:183 ginStepRight)
>
> Where did block 15559 come from? How come we're stepping right to it?
> It's not the right sibling of the previously accessed page, 1452. In
> fact, 15559 is a leaf posting tree page. I don't understand how that
> sequence of page reads could happen.

Maybe take a look at block 1452 using pg_hexedit? pg_hexedit is
designed to do well at interpreting quasi-corrupt data (or at least
allowing the user to do so). We see from your pg_filedump output that
the tuple contents for the page are totally wild. We should not trust
the reported right sibling page, given everything else -- is that
really what Postgres thinks the right sibling is? I mean, clearly it
doesn't.

I think it's possible that pg_filedump is interpreting it in a way
that is kind of wrong. If you saw the same page (1452) in pg_hexedit
you might spot a pattern that pg_filedump output will never reveal. At
least looking at the raw bytes might give you some idea.
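
As a rough illustration of reading the raw bytes directly, the sketch
below pulls the right link and flags out of a page image. It rests on
several assumptions that should be checked against the actual build:
8192-byte pages, little-endian byte order, pd_special stored at byte
offset 16 of the page header, and a GIN special area laid out as a
4-byte right link, a 2-byte maxoff and a 2-byte flags field. All names
here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ         8192  /* assumed page size */
#define SKETCH_PD_SPECIAL_OFF 16    /* assumed offset of pd_special */
#define SKETCH_HEADER_SIZE    24    /* assumed page header size */
#define SKETCH_OPAQUE_SIZE    8     /* assumed size of GIN special area */

typedef struct SketchGinOpaque
{
    uint32_t    rightlink;
    uint16_t    maxoff;
    uint16_t    flags;
} SketchGinOpaque;

/* Little-endian readers, so the result does not depend on the host. */
static uint16_t
sketch_read_le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t
sketch_read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/*
 * Decode the special area of a raw page image.  Returns false when
 * pd_special does not point at a plausible location, in which case
 * nothing else on the page should be trusted either.
 */
static bool
sketch_decode_gin_opaque(const uint8_t *page, size_t len,
                         SketchGinOpaque *out)
{
    uint16_t    pd_special;

    assert(page != NULL && out != NULL);

    if (len != SKETCH_BLCKSZ)
        return false;

    pd_special = sketch_read_le16(page + SKETCH_PD_SPECIAL_OFF);
    if (pd_special < SKETCH_HEADER_SIZE ||
        pd_special > SKETCH_BLCKSZ - SKETCH_OPAQUE_SIZE)
        return false;

    out->rightlink = sketch_read_le32(page + pd_special);
    out->maxoff = sketch_read_le16(page + pd_special + 4);
    out->flags = sketch_read_le16(page + pd_special + 6);
    return true;
}
```

Comparing the right link decoded this way with the block that the scan
actually stepped to is one way to tell whether a tool's interpretation
of the page, or the page itself, is what is off.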

-- 
Peter Geoghegan


