Re: Add an optional timeout clause to isolationtester step. - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add an optional timeout clause to isolationtester step.
Date
Msg-id 20200310135336.zi3mgmq6fub3jfek@nol
Whole thread Raw
In response to Re: Add an optional timeout clause to isolationtester step.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Add an optional timeout clause to isolationtester step.  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Tue, Mar 10, 2020 at 12:09:12AM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Mon, Mar 09, 2020 at 10:32:27PM -0400, Tom Lane wrote:
> >> It strikes me to wonder whether we could improve matters by teaching
> >> isolationtester to watch for particular values in a connected backend's
> >> pg_stat_activity.wait_event_type/wait_event columns.  Those columns
> >> didn't exist when isolationtester was designed, IIRC, so it's not
> >> surprising that they're not used in the current design.  But we could
> >> use them perhaps to detect that a backend has arrived at some state
> >> that's not a heavyweight-lock-wait state.
>
> > Interesting idea.  So that would be basically an equivalent of
> > PostgresNode::poll_query_until but for the isolation tester?
>
> No, more like the existing isolationtester wait query, which watches
> for something being blocked on a heavyweight lock.  Right now, that
> one depends on a bespoke function pg_isolation_test_session_is_blocked(),
> but it used to be a query on pg_stat_activity/pg_locks.

Ah interesting indeed!

> > In short
> > we gain a meta-command that runs a SELECT query that waits until the
> > query defined in the command returns true.  The polling interval may
> > be tricky to set though.
>
> I think it'd be just the same as the polling interval for the existing
> wait query.  We'd have to have some way to mark a script step to say
> what to check to decide that it's blocked ...

So basically we could just change pg_isolation_test_session_is_blocked() to
also return the wait_event_type and wait_event, and adding something like

step "<name>" { SQL } [ cancel on "<wait_event_type>" "<wait_event>" ]

to the step definition should be enough.  I'm attaching a POC patch for that.
On my laptop, the full test now complete in about 400ms.

FTR the REINDEX TABLE CONCURRENTLY case is eventually locked on a virtualxid,
I'm not sure if that's could lead to too early cancellation.

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Next
From: David Steele
Date:
Subject: Re: WIP: System Versioned Temporal Table