Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Date
Msg-id 20220524222433.ibl6dgkc6jrriska@alap3.anarazel.de
In response to Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY  (Greg Stark <stark@mit.edu>)
Responses Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
List pgsql-bugs
Hi,

On 2022-05-24 17:11:12 -0400, Greg Stark wrote:
> On Tue, 24 May 2022 at 15:02, Andres Freund <andres@anarazel.de> wrote:
> >
> > Basically:
> >
> > 1) S1 builds index in phase 2
> > 2) S2 inserts tuple t1 (not in the index built in 1, since it's inserted
> >    after that)
> > 3) S2 hot updates tuple t1->t2
> 
> Not that it matters but is this step even necessary?

I think it is, but there might be other recipes reproducing the problem.


> > 4) S1 sets PROC_IN_SAFE_IC, builds snapshot, starts validation scan (phase 3)
> > 5) S2 hot updates tuple t2->t3
> 
> That seems like the key observation. But I wonder if it's even the
> only flow where this could be an issue. What happens if t2 is deleted,
> can it get pruned away completely?

Yes it could, but afaics that'd be fine, because then there's no missing index
entry. And the index should only be marked valid once all older snapshots have
ended.


> > 6) Either S1 or S2 performs hot pruning, redirecting t1 to t3, this is only
> >    possible because PROC_IN_SAFE_IC caused S1's ->xmin to be ignored
> 
> Or presumably any other transaction.

Right.


> But ... does the update to t2->t3 not automatically trigger pruning anyways?

We don't prune during updates right now (but do when fetching the row to
update) - I think that's bad, but it's how it is.

When you say "automatically" - do you mean that it'd happen unconditionally,
independent of the horizon? It shouldn't...
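
To illustrate what "depends on the horizon" means here, a minimal toy sketch
(not PostgreSQL code). The names Backend, horizon and prunable are made up for
the example, xids are plain integers without wraparound, and in_safe_ic is
only a stand-in for PROC_IN_SAFE_IC:

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    xmin: int                  # oldest xid this backend's snapshot still needs
    in_safe_ic: bool = False   # toy stand-in for PROC_IN_SAFE_IC


def horizon(backends):
    """Oldest xmin among the backends that are not ignored."""
    considered = [b.xmin for b in backends if not b.in_safe_ic]
    return min(considered) if considered else float("inf")


def prunable(xmax, backends):
    """A dead row version may be pruned only if its updater precedes the horizon."""
    return xmax < horizon(backends)


# t2 was replaced by xid 103; S1's snapshot (xmin 100) still needs it.
counted = [Backend("S1", xmin=100), Backend("S2", xmin=105)]
ignored = [Backend("S1", xmin=100, in_safe_ic=True), Backend("S2", xmin=105)]
print(prunable(103, counted))  # S1 holds the horizon back
print(prunable(103, ignored))  # S1 is skipped, the horizon moves past 103
```

In this model pruning is never unconditional: the only thing that changes
between the two calls is whether S1's xmin takes part in the horizon.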


> > 7) S1 checks t1->t3, finds that t3 is too new for the snapshot, doesn't create
> >    an index entry
> 
> Just to be clear, it would normally have created an index entry (for
> the whole HOT chain) because t2 is in the recheck snapshot and
> therefore the whole HOT chain wasn't in the initial snapshot. I'm a
> little confused here.

Hm? Why / where would we have done that? It's a HOT update, so the UPDATE
doesn't create an index entry. And the validate scan won't see the HOT chain
because t2 has been pruned away and t3 is too new.

What "recheck snapshot" are you referring to? The one passed to
validate_index()?
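
Spelled out as a toy model, the sequence looks roughly like this. It is a
sketch under simplifying assumptions, not how heapam or validate_index() are
implemented: every xid is treated as committed, a snapshot simply sees xids
below snapshot_xid, and Version, prune, validate and run are hypothetical
names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    name: str
    xmin: int                          # inserting xid
    xmax: Optional[int] = None         # updating xid, None while live
    next: Optional["Version"] = None   # next member of the HOT chain


def visible(v, snapshot_xid):
    """Toy MVCC: all xids committed, the snapshot sees xids < snapshot_xid."""
    created = v.xmin < snapshot_xid
    removed = v.xmax is not None and v.xmax < snapshot_xid
    return created and not removed


def prune(root, horizon):
    """Skip leading chain members whose updater precedes the horizon."""
    v = root
    while v.xmax is not None and v.xmax < horizon and v.next is not None:
        v = v.next
    return v  # what the root line pointer redirects to after pruning


def validate(head, snapshot_xid, index, root_tid):
    """Phase 3: add an entry if some remaining chain member is visible."""
    v = head
    while v is not None:
        if visible(v, snapshot_xid):
            index.add(root_tid)
            return
        v = v.next


def run(s1_xmin_ignored):
    index = set()                      # step 1: built before t1 exists
    t1 = Version("t1", xmin=10)        # step 2
    t2 = Version("t2", xmin=11)        # step 3: HOT update t1 -> t2
    t1.xmax, t1.next = 11, t2
    snapshot_xid = 12                  # step 4: S1's validation snapshot
    t3 = Version("t3", xmin=13)        # step 5: HOT update t2 -> t3
    t2.xmax, t2.next = 13, t3
    # step 6: S1's snapshot holds the horizon back unless S1 is ignored
    horizon = 14 if s1_xmin_ignored else snapshot_xid
    head = prune(t1, horizon)
    validate(head, snapshot_xid, index, root_tid=1)   # step 7
    return index


print(run(s1_xmin_ignored=False))  # t2 survives pruning, entry is created
print(run(s1_xmin_ignored=True))   # t2 is pruned, t3 is too new, no entry
```

With S1's xmin counted, pruning stops at t2, the scan finds it visible and
indexes the chain. With S1's xmin ignored, the chain is cut down to t3, which
the snapshot can't see, so nothing is inserted (step 8).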


> > 8) corruption
> 
> Aside from amcheck I wonder if we can come up with any way for users
> to tell whether their index is affected or at risk. Like, is there a
> way to tell from catalog entries if an index was created with CIC?

Not reliably, afaik. indcheckxmin won't ever be set for a CIC index IIRC, but
it's not reliably set for a non-CIC index.
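
For anyone who wants to look anyway, a small sketch of the two queries
involved, held as Python strings so they can be passed to whatever driver is
in use. The constant names and the %(table)s / %(index)s placeholders are
assumptions of this example; pg_index.indcheckxmin and amcheck's
bt_index_check(index, heapallindexed) are the real pieces. As said above, the
catalog flag proves nothing either way, so only the amcheck result is
meaningful:

```python
# Inspect the catalog flags for all indexes on one table.
# indcheckxmin is NOT a reliable CIC marker; shown for inspection only.
CATALOG_QUERY = """
SELECT indexrelid::regclass AS index_name, indisvalid, indcheckxmin
FROM pg_index
WHERE indrelid = %(table)s::regclass
"""

# Requires CREATE EXTENSION amcheck. The second argument enables the
# heapallindexed check, which verifies that heap tuples have index entries.
AMCHECK_QUERY = """
SELECT bt_index_check(%(index)s::regclass, true)
"""

# Example parameters; the table and index names are placeholders.
params = {"table": "public.some_table", "index": "public.some_table_pkey"}
```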

Greetings,

Andres Freund


