On 2020/09/09 2:53, Andres Freund wrote:
> Hi,
>
> On 2020-09-08 16:44:17 +1200, Thomas Munro wrote:
>> On Tue, Sep 8, 2020 at 4:11 PM Andres Freund <andres@anarazel.de> wrote:
>>> At first I was very confused as to why none of the existing tests have
>>> found this significant issue. But after thinking about it for a minute
>>> that's because they all use psql, and largely separate psql invocations
>>> for each query :(. Which means that there's no cached snapshot around...
>>
>> I prototyped a TAP test patch that could maybe do the sort of thing
>> you need, in patch 0006 over at [1]. Later versions of that patch set
>> dropped it, because I figured out how to use the isolation tester
>> instead, but I guess you can't do that for a standby test (at least
>> not until someone teaches the isolation tester to support multi-node
>> schedules, something that would be extremely useful...).
>
> Unfortunately, a proper multi-node isolationtester test is basically
> equivalent to building a global lock graph. I think, at least? Including
> a need to be able to correlate connections with their locks between the
> nodes.
>
> But for something like the bug at hand it'd probably be sufficient to just
> "hack" something with dblink. In session 1) insert a row on the primary
> using dblink, return the LSN, wait for the LSN to have replicated and
> finally in session 2) check for row visibility.

The attached seems to do the trick.
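
As a rough illustration of the sequence described above (not the attached patch itself), the steps could be sketched as follows. This is a sketch under assumptions: it uses two ordinary client connections instead of dblink, it expects DB-API cursors in autocommit mode with psycopg-style `%s` placeholders, and the table `visibility_test` and both function names are hypothetical. The standby cursor runs a query before the insert so that the same session is reused for the later visibility check.

```python
import time


def wait_for_replay(standby_cur, target_lsn, timeout=10.0, interval=0.1):
    """Poll the standby until it has replayed WAL up to target_lsn.

    Returns True once replay has caught up, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        # A NULL replay LSN yields NULL here, which Python treats as falsy.
        standby_cur.execute(
            "SELECT pg_last_wal_replay_lsn() >= %s::pg_lsn", (target_lsn,)
        )
        if standby_cur.fetchone()[0]:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def row_visible_after_replay(primary_cur, standby_cur, row_id,
                             timeout=10.0, interval=0.1):
    """Insert on the primary, wait for replay, check visibility on the standby.

    Both cursors are assumed to be in autocommit mode, so the INSERT is
    committed before the LSN is read. "visibility_test" is a hypothetical
    table with an integer "id" column.
    """
    # Run a query on the standby first, so the session has already taken
    # a snapshot before the new row exists.
    standby_cur.execute("SELECT count(*) FROM visibility_test")
    standby_cur.fetchone()

    # Insert on the primary and note the WAL position after the commit.
    primary_cur.execute(
        "INSERT INTO visibility_test (id) VALUES (%s)", (row_id,)
    )
    primary_cur.execute("SELECT pg_current_wal_lsn()")
    lsn = primary_cur.fetchone()[0]

    # Wait for the standby to replay up to that position.
    if not wait_for_replay(standby_cur, lsn, timeout, interval):
        raise TimeoutError(f"standby did not replay up to {lsn}")

    # Check visibility from the same standby session as the first query.
    standby_cur.execute(
        "SELECT EXISTS (SELECT 1 FROM visibility_test WHERE id = %s)",
        (row_id,),
    )
    return bool(standby_cur.fetchone()[0])
```

A test would call `row_visible_after_replay()` with cursors from long-lived connections to the primary and the standby and expect `True`. Separate psql invocations per query would not keep a session open across the steps.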

Regards

Ian Barwick
--
Ian Barwick https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services