On Tue, Feb 18, 2020 at 07:39:49AM +0100, Julien Rouhaud wrote:
> On Tue, Feb 18, 2020 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Tue, Feb 18, 2020 at 07:06:25AM +0100, Julien Rouhaud wrote:
> > > On Tue, Feb 18, 2020 at 6:30 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >> Hmm. There could be an argument here for skipping invalid toast
> > >> indexes within reindex_index(), because we are sure about having at
> > >> least one valid toast index at anytime, and these are not concerned
> > >> with CIC.
> > >
> > > Or even automatically drop any invalid index on toast relation in
> > > reindex_relation, since those can't be due to a failed CIC?
> >
> > No, I don't like much outsmarting REINDEX with more index drops than
> > it needs to do. And this would not take care of the case with REINDEX
> > INDEX done directly on a toast index.
>
> Well, we could still do both but I get the objection. Then skipping
> invalid toast indexes in reindex_relation looks like the best fix.

Please find attached a patch that fixes the problem using this approach.

I also added isolation tester regression tests. The failure is simulated by
calling pg_cancel_backend() on a backend looked up in pg_stat_activity,
filtering on a specifically set application name and on the query text to
avoid any unwanted interaction with other sessions. I also added a 1s locking
delay, to ensure that even slow or CLOBBER_CACHE_ALWAYS (CCA) machines can
consistently reproduce the failure. Maybe that's not enough, or maybe testing
this scenario is not worth the extra time.
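
For illustration, the cancellation step could look roughly like the query
below. This is only a sketch and not the text of the attached patch: the
application name 'reindex_toast_test' and the LIKE pattern are hypothetical
placeholders, and the exact filters used in the test may differ.

```sql
-- In the session whose command should be interrupted
-- (the application name is a hypothetical placeholder):
SET application_name TO 'reindex_toast_test';

-- In a second session: cancel only the backend that matches both the
-- application name and the query text, and never the current backend.
SELECT pg_cancel_backend(pid)
  FROM pg_stat_activity
 WHERE application_name = 'reindex_toast_test'
   AND query LIKE 'REINDEX%'          -- assumed pattern for the target query
   AND pid <> pg_backend_pid();
```

Filtering on both columns keeps the cancel from hitting an unrelated backend
that happens to be running a similar statement at the same time.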