Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17245: Index corruption involving deduplicated entries
Msg-id CAH2-Wzn3oMzc+ReTyFB6N77o3ip65CB3gNcF4NVQkvgSN+DXRg@mail.gmail.com
In response to Re: BUG #17245: Index corruption involving deduplicated entries  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Thu, Oct 28, 2021 at 2:31 PM Andres Freund <andres@anarazel.de> wrote:
> This makes me wonder if the issue could be that we're losing writes / that
> something is reading old page versions (e.g. due to filesystem bug). If both
> heap and index are vacuumed, but the index write is lost, this'd be what we
> see, right?

Right, but that just doesn't seem to fit. That was the first question I asked.

> Another way this could happen is if we got the wrong relation size for either
> index or table, and a vacuum scan doesn't scan the whole table or index.

I doubt that, since the heap blocks involved include heap block 0. On
the table/indexes actually affected by this, the indexes are riddled
with corruption. But every other table seems fine (at least as far as
anybody knows).

> I've not yet read the whole thread, but if not done, it seems like a good idea
> to use pg_waldump and grep for changes to the relevant heap / index
> pages. That might give us more information about what could have happened.

I think that there is a fairly high likelihood that that alone will be
enough to diagnose the bug.
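
As a rough illustration of the pg_waldump-and-grep approach (a sketch, not
something posted in this thread): the filter below keeps only the records
whose block references touch a given relfilenode and block in the main fork.
The "blkref #N: rel a/b/c blk N" line shape is an assumption about
pg_waldump's text output, the function name records_touching is made up for
this example, and the relfilenode values in the sample are hypothetical.

```python
import re

# Assumed shape of a block reference in pg_waldump's text output, e.g.
#   "blkref #0: rel 1663/13010/16384 blk 0"
# with an optional "fork <name>" before "blk" for non-main forks.
BLKREF = re.compile(
    r"blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork (\w+))? blk (\d+)"
)


def records_touching(lines, relfilenode, block):
    """Yield each line that references (relfilenode, block) in the main fork."""
    for line in lines:
        for m in BLKREF.finditer(line):
            fork = m.group(4) or "main"
            if (
                fork == "main"
                and int(m.group(3)) == relfilenode
                and int(m.group(5)) == block
            ):
                yield line.rstrip("\n")
                break  # report each WAL record at most once


# Hypothetical sample lines; real input would be pg_waldump's output.
sample = [
    "rmgr: Heap len (rec/tot): 59/59, tx: 700, lsn: 0/01000028, "
    "desc: DELETE off 3, blkref #0: rel 1663/13010/16384 blk 0",
    "rmgr: Btree len (rec/tot): 64/64, tx: 700, lsn: 0/01000068, "
    "desc: INSERT_LEAF off 5, blkref #0: rel 1663/13010/16390 blk 1",
    "rmgr: Heap2 len (rec/tot): 60/60, tx: 0, lsn: 0/010000A8, "
    "desc: VISIBLE, blkref #0: rel 1663/13010/16384 fork vm blk 0, "
    "blkref #1: rel 1663/13010/16384 blk 7",
]

for rec in records_touching(sample, 16384, 0):
    print(rec)
```

To run it against real WAL, pass the lines of pg_waldump's output (for
example, sys.stdin when the tool is piped in) in place of sample. Running it
once for the heap relfilenode and once for the index relfilenode gives the
two per-page histories to compare.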

> There were a fair bit of changes around the separation between heap and index
> vacuuming in 14. I wonder if there's potentially something broken around
> repeatedly vacuuming the heap without doing index vacuums or such.

I did ask myself that question earlier today, but quickly rejected the
idea. There is very little mechanism involved with that stuff. It's
very hard to imagine what could break. The code for this in
lazy_vacuum() is quite simple.

> It's also possible that there's something wrong in that darned path that
> handles recently-dead tuples.

That sounds much more likely to me.

-- 
Peter Geoghegan


