On Tue, Mar 10, 2020 at 12:09:12AM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Mon, Mar 09, 2020 at 10:32:27PM -0400, Tom Lane wrote:
> >> It strikes me to wonder whether we could improve matters by teaching
> >> isolationtester to watch for particular values in a connected backend's
> >> pg_stat_activity.wait_event_type/wait_event columns. Those columns
> >> didn't exist when isolationtester was designed, IIRC, so it's not
> >> surprising that they're not used in the current design. But we could
> >> use them perhaps to detect that a backend has arrived at some state
> >> that's not a heavyweight-lock-wait state.
>
> > Interesting idea. So that would be basically an equivalent of
> > PostgresNode::poll_query_until but for the isolation tester?
>
> No, more like the existing isolationtester wait query, which watches
> for something being blocked on a heavyweight lock. Right now, that
> one depends on a bespoke function pg_isolation_test_session_is_blocked(),
> but it used to be a query on pg_stat_activity/pg_locks.

Ah, interesting indeed!

> > In short
> > we gain a meta-command that runs a SELECT query that waits until the
> > query defined in the command returns true. The polling interval may
> > be tricky to set though.
>
> I think it'd be just the same as the polling interval for the existing
> wait query. We'd have to have some way to mark a script step to say
> what to check to decide that it's blocked ...

So basically we could just change pg_isolation_test_session_is_blocked() to
also return the wait_event_type and wait_event, and add something like

    step "<name>" { SQL } [ cancel on "<wait_event_type>" "<wait_event>" ]

to the step definition; that should be enough. I'm attaching a POC patch for that.
On my laptop, the full test now completes in about 400ms.

FTR the REINDEX TABLE CONCURRENTLY case is eventually blocked on a virtualxid;
I'm not sure whether that could lead to a too-early cancellation.
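
To make the "cancel on" idea a bit more concrete, here is a minimal sketch of
the matching check such an annotation would need. This is not the code from
the POC patch: the CancelOn struct and step_should_cancel() are hypothetical
names, and I'm assuming the wait query hands back the backend's current
wait_event_type and wait_event as nullable strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical per-step annotation (not an existing isolationtester type):
 * the wait event on which the step should be cancelled.  Both fields are
 * NULL when the step has no "cancel on" clause.
 */
typedef struct CancelOn
{
	const char *wait_event_type;
	const char *wait_event;
} CancelOn;

/*
 * Return true when the wait event currently reported for the backend matches
 * the step's annotation exactly.  NULL observed values are taken to mean the
 * backend is not waiting, so they never match.
 */
static bool
step_should_cancel(const CancelOn *c,
				   const char *cur_type, const char *cur_event)
{
	if (c == NULL || c->wait_event_type == NULL || c->wait_event == NULL)
		return false;			/* step carries no annotation */
	if (cur_type == NULL || cur_event == NULL)
		return false;			/* backend is not waiting on anything */

	return strcmp(c->wait_event_type, cur_type) == 0 &&
		strcmp(c->wait_event, cur_event) == 0;
}
```

With an annotation of "Lock" "virtualxid", this returns true only once the
backend reports exactly that pair. An exact match on both columns is the
simplest rule, but it does nothing about the too-early cancellation question
above: it cannot tell which virtualxid wait it is looking at.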