Re: Continuing instability in insert-conflict-specconflict test - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Continuing instability in insert-conflict-specconflict test
Date
Msg-id 20200824212127.jbf6zulrvmeyvcnu@alap3.anarazel.de
Whole thread Raw
In response to Re: Continuing instability in insert-conflict-specconflict test  (Andres Freund <andres@anarazel.de>)
Responses Re: Continuing instability in insert-conflict-specconflict test  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
> 
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> > 
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards.  It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.

> > AFAICS that is nothing but wishful thinking.  Why is it that only one of
> > those places has failed so far?
> 
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.

ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks. If I understand the wrong output
correctly, what happens is that "controller_print_speculative_locks" is
executed, even though s1 hasn't yet acquired the next lock. Note how the
+s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
is *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>"

To be clear, there'd still be an issue about whether the NOTICE is
printed before/after the "s2_upsert: <... completed>", but it looks to
me the issue is bigger than that.  It's easy enough to add another wait
in s2_upsert, but that doesn't help if the entire scheduling just
continues regardless of there not really being a waiter.

Am I missing something here?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Continuing instability in insert-conflict-specconflict test
Next
From: Andres Freund
Date:
Subject: LWLockAcquire and LockBuffer mode argument