On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> >
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards. It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.
> > AFAICS that is nothing but wishful thinking. Why is it that only one of
> > those places has failed so far?
>
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.
ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks. If I understand the wrong output
correctly, what happens is that "controller_print_speculative_locks" is
executed, even though s1 hasn't yet acquired the next lock. Note how the
+s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE: acquiring advisory lock on 2
is *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>"
To be clear, there'd still be an issue about whether the NOTICE is
printed before/after the "s2_upsert: <... completed>", but it looks to
me the issue is bigger than that. It's easy enough to add another wait
in s2_upsert, but that doesn't help if the entire scheduling just
continues regardless of there not really being a waiter.
Am I missing something here?
Greetings,
Andres Freund