Re: Add an optional timeout clause to isolationtester step. - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add an optional timeout clause to isolationtester step.
Date
Msg-id 20200313090450.6crpgoula5nrgadl@nol
Whole thread Raw
In response to Re: Add an optional timeout clause to isolationtester step.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Add an optional timeout clause to isolationtester step.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Mar 11, 2020 at 05:52:54PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-11, Tom Lane wrote:
>
> > We could re-use Julien's ideas about the isolation spec syntax by
> > making it be, roughly,
> >
> > step "<name>" { <SQL> } [ blocked if "<wait_event_type>" "<wait_event>" ]
> >
> > and then those items would need to be passed as parameters of the prepared
> > query.
>
> I think for test readability's sake, it'd be better to put the BLOCKED
> IF clause ahead of the SQL, so you can write it in the same line and let
> the SQL flow to the next one:
>
> STEP "long_select" BLOCKED IF "lwlock" "ClogControlLock"
>   { select foo from pg_class where ... some more long clauses ... }
>
> otherwise I think a step would require more lines to write.
>
> > I'd like to see an attempt to rewrite some of the existing
> > timeout-dependent test cases to use this facility instead of
> > long timeouts.  If we could get rid of the timeouts in the
> > deadlock tests, that'd go a long way towards showing that this
> > idea is actually any good.
>
> +1.  Those long timeouts are annoying enough that infrastructure to make
> a run shorter in normal circumstances might be sufficient justification
> for this patch ...


I'm not familiar with those test so I'm probably missing something, but looks
like all isolation tests that setup a timeout are doing so to test server side
features (deadlock detection, statement and lock timeout).  I'm not sure how
adding a client-side facility to detect locks earlier is going to help reducing
the server side timeouts?

For the REINDEX CONCURRENTLY failure test, the problem that needs to be solved
isn't detecting that the command is blocked as it's already getting blocked on
a heavyweight lock, but being able to reliably cancel a specific query as early
as possible, which AFAICS isn't possible with current isolation tester:

- either we reliably cancel the query using a statement timeout, but we'll make
  it slow for everyone
- either we send a blind pg_cancel_backend() hoping that we don't catch
  anything else (and also make it slower than required to make sure that it's
  not canceled to early)

So we would actually only need something like this to make it work:

step "<name>" [ CANCEL IF BLOCKED ] { <SQL }



pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Next
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager