Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY - Mailing list pgsql-bugs

Hi,

On 2022-05-24 23:38:07 +0500, Andrey Borodin wrote:
>
>
> > On 24 May 2022, at 23:15, Andres Freund <andres@anarazel.de> wrote:
> >
> > With fsync=on, it's much harder to reproduce.
> That explains why it's easier to reproduce on macOS: it seems to ignore fsync.

Yea, one needs wal_sync_method=fsync_writethrough or such :/


> > On 24 May 2022, at 23:15, Andres Freund <andres@anarazel.de> wrote:
> >
> > I suspect the problem might be related to pruning done during the validation
> > scan. Once PROC_IN_SAFE_IC is set, the backend itself will not preserve tids
> > its own snapshot might need. Which will wreak havoc during the validation
> > scan.
>
> I observe that removing PROC_IN_SAFE_IC for index_validate() fixes tests.
> But why it's not a problem for index_build() scan?

I now suspect it's a problem for both, just more visible for index_validate().


> And I do not understand why it's a problem that tuple is pruned during the
> scan... How does this "wreak havoc" happen?

Basically snapshots don't work anymore. If PROC_IN_SAFE_IC is set, that
backend is ignored for the horizon computation for snapshots / on-access HOT
pruning. Which means that rows that are visible to the snapshot can be pruned
away.

One might think that could be safe, after all the row is invisible to all
other backends. The problem is that the validation scan won't see *newer* rows
either, since they're not visible to the snapshot either. And if the new row
version is a HOT tuple, it won't have made an index entry on its own. Boom,
corruption.
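
To make the horizon part concrete, here is a minimal toy sketch in Python. It is not PostgreSQL code: Backend, in_safe_ic, prune_horizon and the xid values are all hypothetical stand-ins, and the real horizon computation is considerably more involved. It only shows that skipping a flagged backend moves the pruning horizon past what that backend's snapshot still needs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    xmin: Optional[int]        # oldest xid this backend's snapshot still needs
    in_safe_ic: bool = False   # toy stand-in for the PROC_IN_SAFE_IC flag


def prune_horizon(backends: List[Backend], next_xid: int) -> int:
    """Oldest xmin among backends that are not flagged.

    Row versions deleted by xids older than the returned value are treated
    as prunable in this toy model. Flagged backends are skipped entirely.
    """
    xmins = [b.xmin for b in backends if b.xmin is not None and not b.in_safe_ic]
    return min(xmins, default=next_xid)


s1 = Backend("S1", xmin=100)   # hypothetical xids, chosen for illustration
s2 = Backend("S2", xmin=105)

# S1 is counted: nothing S1's snapshot needs can be pruned.
horizon_respected = prune_horizon([s1, s2], next_xid=110)

# S1 is flagged and therefore ignored: the horizon jumps past S1's xmin.
s1.in_safe_ic = True
horizon_ignored = prune_horizon([s1, s2], next_xid=110)

print(horizon_respected, horizon_ignored)
```

In this model the second horizon is newer than S1's xmin, so versions that S1's snapshot can still see become fair game for pruning.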

Basically:

1) S1 builds index in phase 2
2) S2 inserts tuple t1 (not in the index built in 1, since it's inserted
   after that)
3) S2 hot updates tuple t1->t2
4) S1 sets PROC_IN_SAFE_IC, builds snapshot, starts validation scan (phase 3)
5) S2 hot updates tuple t2->t3
6) Either S1 or S2 performs hot pruning, redirecting t1 to t3, this is only
   possible because PROC_IN_SAFE_IC caused S1's ->xmin to be ignored
7) S1 checks t1->t3, finds that t3 is too new for the snapshot, doesn't create
   an index entry
8) corruption
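
The sequence above can be replayed with a small toy model in Python. This is only a sketch under simplifying assumptions: every transaction commits immediately, a snapshot is a single xid cutoff, a HOT chain is a plain list, and Version, visible, prune and validation_scan are hypothetical names, not PostgreSQL functions. The xids (101 to 104) are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Version:
    name: str
    xmin: int              # xid that created this version
    xmax: Optional[int]    # xid that replaced it, None if still live


def visible(v: Version, snapshot_xid: int) -> bool:
    """Toy visibility: created before the snapshot, not yet replaced as of it."""
    return v.xmin < snapshot_xid and (v.xmax is None or v.xmax >= snapshot_xid)


def prune(chain: List[Version], horizon: int) -> List[Version]:
    """Drop versions replaced by an xid older than the horizon.

    Models the root line pointer being redirected to the first survivor.
    """
    return [v for v in chain if v.xmax is None or v.xmax >= horizon]


def validation_scan(chain: List[Version], snapshot_xid: int) -> Set[str]:
    """Add an index entry for the chain root if some member is visible."""
    index: Set[str] = set()   # t1 is not in the index built in step 1
    if any(visible(v, snapshot_xid) for v in chain):
        index.add("root")
    return index


# Steps 2, 3 and 5: insert t1, hot update t1->t2, hot update t2->t3.
chain = [Version("t1", 101, 102), Version("t2", 102, 103), Version("t3", 103, None)]
snapshot_xid = 103   # step 4: S1's snapshot, taken before xid 103 ran

# Horizon respects S1's snapshot: only t1 is pruned, t2 survives and is visible.
indexed_safe = validation_scan(prune(chain, horizon=103), snapshot_xid)

# Step 6 with S1 ignored: horizon is 104, t1 and t2 are pruned, only t3 is left.
indexed_broken = validation_scan(prune(chain, horizon=104), snapshot_xid)

print(indexed_safe, indexed_broken)
```

In the second case the scan finds only t3, which is too new for the snapshot, so no index entry is created, and because t3 is a HOT tuple nothing else will ever create one.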


Greetings,

Andres Freund


