Hi,
On 2020-09-08 16:44:17 +1200, Thomas Munro wrote:
> On Tue, Sep 8, 2020 at 4:11 PM Andres Freund <andres@anarazel.de> wrote:
> > At first I was very confused as to why none of the existing tests have
> > found this significant issue. But after thinking about it for a minute
> > that's because they all use psql, and largely separate psql invocations
> > for each query :(. Which means that there's no cached snapshot around...
>
> I prototyped a TAP test patch that could maybe do the sort of thing
> you need, in patch 0006 over at [1]. Later versions of that patch set
> dropped it, because I figured out how to use the isolation tester
> instead, but I guess you can't do that for a standby test (at least
> not until someone teaches the isolation tester to support multi-node
> schedules, something that would be extremely useful...).
Unfortunately proper multi-node isolationtester test basically is
equivalent to building a global lock graph. I think, at least? Including
a need to be able to correlate connections with their locks between the
nodes.
But for something like the bug at hand it'd probably sufficient to just
"hack" something with dblink. In session 1) insert a row on the primary
using dblink, return the LSN, wait for the LSN to have replicated and
finally in session 2) check for row visibility.
Greetings,
Andres Freund