Re: Add an optional timeout clause to isolationtester step. - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add an optional timeout clause to isolationtester step.
Date
Msg-id 20200307205328.ffidfyo3me46qscn@nol
Whole thread Raw
In response to Re: Add an optional timeout clause to isolationtester step.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Add an optional timeout clause to isolationtester step.
List pgsql-hackers
On Sat, Mar 07, 2020 at 10:46:34AM -0500, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Sat, Mar 07, 2020 at 10:41:42AM +0900, Michael Paquier wrote:
> >> On Fri, Mar 06, 2020 at 02:15:47PM +0100, Julien Rouhaud wrote:
> >>> Here's a patch to add an optional "timeout val" clause to isolationtester's
> >>> step definition.  When used, isolationtester will actively wait on the query
> >>> rather than continuing with the permutation next step, and will issue a cancel
> >>> once the defined timeout is reached.
>
> >> One problem with this approach is that it does address the stability
> >> of the test on very slow machines, and there are some of them in the
> >> buildfarm.
>
> > Looking at timeouts.spec and e.g. a7921f71a3c, it seems that we already chose
> > to fix this problem by having a timeout long enough to statisfy the slower
> > buildfarm members, even when running on fast machines, so I assumed that the
> > same approach could be used here.
>
> The arbitrarily-set timeouts that exist in some of the isolation tests
> are horrid kluges that have caused us lots of headaches in the past
> and no doubt will again in the future.  Aside from occasionally failing
> when a machine is particularly overloaded, they cause the tests to
> take far longer than necessary on decently-fast machines.

Yeah, I have no doubt that it has been a pain, and this patch is clearly not a
bullet-proof solution.

> So ideally
> we'd get rid of those entirely in favor of some more-dynamic approach.
> Admittedly, I have no proposal for what that would be.

The fault injection framework that was previously discussed would cover most of
the usecase I can think of, but that's a way bigger project.

> But adding yet
> more ways to set a (guaranteed-to-be-wrong) timeout seems like the
> wrong direction to be going in.

Fair enough, I'll mark the patch as rejected then.

> What's the actual need that you're trying to deal with?

Testing the correct behavior of non trivial commands, such as CIC/reindex
concurrently, that fails during the execution.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: range_agg
Next
From: Jeff Davis
Date:
Subject: Re: Add LogicalTapeSetExtend() to logtape.c