Re: Add an optional timeout clause to isolationtester step. - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add an optional timeout clause to isolationtester step.
Date
Msg-id 20200313162520.GA80899@nol
Whole thread Raw
In response to Re: Add an optional timeout clause to isolationtester step.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Add an optional timeout clause to isolationtester step.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Mar 13, 2020 at 10:12:20AM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
>
> > I'm not familiar with those test so I'm probably missing something, but looks
> > like all isolation tests that setup a timeout are doing so to test server side
> > features (deadlock detection, statement and lock timeout).  I'm not sure how
> > adding a client-side facility to detect locks earlier is going to help reducing
> > the server side timeouts?
>
> The point is that those timeouts have to be set long enough for even a
> very slow machine to reach a desired state before the timeout happens;
> on faster machines the test is just uselessly sleeping for a long time,
> because of the fixed timeout.  My thought was that maybe the tests could
> be recast as "watch for session to reach $expected_state and then do
> the next thing", allowing them to be automatically adaptive to the
> machine's speed.  This might require some rather subtle test redesign
> and/or addition of more infrastructure (to allow recognition of the
> desired state and/or taking an appropriate next action).  I'm prepared
> to believe that not much can be done about timeouts.spec in particular,
> but it seems to me that the long delays in the deadlock tests are not
> inherent in what we need to test.


Ah I see.  I'll try to see if that could help the deadlock tests, but for sure
such feature would allow us to get rid of the two pg_sleep(5) in
tuplelock-update.

It seems that for all the possibly interesting cases, what we want to wait on
is an heavyweight lock, which is already what isolationtester detects.  Maybe
we could simply implement something like

step "<name>" [ WAIT UNTIL BLOCKED ] { <SQL> }

without any change to the blocking detection function?


> > For the REINDEX CONCURRENTLY failure test, the problem that needs to be solved
> > isn't detecting that the command is blocked as it's already getting blocked on
> > a heavyweight lock, but being able to reliably cancel a specific query as early
> > as possible, which AFAICS isn't possible with current isolation tester:
>
> Right, it's the same thing of needing to wait till the backend has reached
> a particular state before you do the next thing.
>
> > So we would actually only need something like this to make it work:
> > step "<name>" [ CANCEL IF BLOCKED ] { <SQL }
>
> I continue to resist the idea of hard-wiring this feature to query cancel
> as the action-to-take.  That will more or less guarantee that it's not
> good for anything but this one test case.  I think that the feature
> should have the behavior of "treat this step as blocked once it's reached
> state X", and then you make the next step in the permutation be one that
> issues a query cancel.  (Possibly, using pg_stat_activity and
> pg_cancel_backend for that will be painful enough that we'd want to
> invent separate script syntax that says "send a cancel to session X".
> But that's a separate discussion.)


I agree.  A new step option to kill a session rather than executing sql would
go perfectly with the above new active-wait-for-blocking-state feature.



pgsql-hackers by date:

Previous
From: Thunder
Date:
Subject: Re:Re: Optimize crash recovery
Next
From: Alvaro Herrera
Date:
Subject: Re: Re: Optimize crash recovery