On Sat, Mar 07, 2020 at 10:41:42AM +0900, Michael Paquier wrote:
> On Fri, Mar 06, 2020 at 02:15:47PM +0100, Julien Rouhaud wrote:
> > Here's a patch to add an optional "timeout val" clause to isolationtester's
> > step definition. When used, isolationtester will actively wait on the query
> > rather than continuing with the permutation next step, and will issue a cancel
> > once the defined timeout is reached. I also added as a POC the previous
> > regression tests for invalid TOAST indexes, updated to use this new
> > infrastructure (which won't pass as long as the original bug for invalid TOAST
> > indexes isn't fixed).
>
> One problem with this approach is that it does address the stability
> of the test on very slow machines, and there are some of them in the
> buildfarm. Taking your patch, I can make the test fail by applying
> the following sleep because the query would be cancelled before some
> of the indexes are marked as invalid:
> --- a/src/backend/commands/indexcmds.c
> +++ b/src/backend/commands/indexcmds.c
> @@ -3046,6 +3046,8 @@ ReindexRelationConcurrently(Oid relationOid, int
> options)
> CommitTransactionCommand();
> StartTransactionCommand();
>
> + pg_usleep(100000L * 10); /* 10s */
> +
> /*
> * Phase 2 of REINDEX CONCURRENTLY
>
> Another problem is that on faster machines the test is slow because of
> the timeout used. What are your thoughts about having instead a
> cancel meta-command instead?
Looking at timeouts.spec and e.g. a7921f71a3c, it seems that we already chose
to fix this problem by having a timeout long enough to statisfy the slower
buildfarm members, even when running on fast machines, so I assumed that the
same approach could be used here.
I agree that the 1s timeout I used is maybe too low, but that's easy enough to
change. Another point is that it's possible to have a close behavior without
this patch by using a statement_timeout (the active wait does change things
though), but the spec files would be more verbose.