Thread: Continuing instability in insert-conflict-specconflict test

Continuing instability in insert-conflict-specconflict test

From

Tom Lane

Date:

24 August 2020, 02:53:18

We've seen repeated failures in the tests added by commit 43e084197:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2020-08-23%2005%3A46%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2020-08-04%2001%3A05%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=caiman&dt=2020-04-01%2004%3A10%3A51
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-03-10%2003%3A14%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-10%2011%3A01%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-09%2010%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-09%2015%3A52%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-09%2005%3A20%3A07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2020-03-09%2003%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2020-03-09%2015%3A52%3A53

I dug into this a bit today, and found that I can reproduce the failure
reliably by adding a short delay in the right place, as attached.

However, after studying the test awhile I have to admit that I do not
understand why all these failures look the same, because it seems to
me that this test is a house of cards.  It *repeatedly* expects that
issuing a command to session X will result in session Y reporting
some notice before X's command terminates, even though X's command will
never meet the conditions for isolationtester to think it's blocked.
AFAICS that is nothing but wishful thinking.  Why is it that only one of
those places has failed so far?

            regards, tom lane

diff --git a/src/test/isolation/specs/insert-conflict-specconflict.spec
b/src/test/isolation/specs/insert-conflict-specconflict.spec
index 6028397491..92459047b6 100644
--- a/src/test/isolation/specs/insert-conflict-specconflict.spec
+++ b/src/test/isolation/specs/insert-conflict-specconflict.spec
@@ -28,6 +28,7 @@ setup
         RAISE NOTICE 'blurt_and_lock_4() called for % in session %', $1, current_setting('spec.session')::int;
         RAISE NOTICE 'acquiring advisory lock on 4';
         PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 4);
+        PERFORM pg_sleep(0.1);
     RETURN $1;
     END;$$;

Re: Continuing instability in insert-conflict-specconflict test

From

Andres Freund

Date:

24 August 2020, 20:42:35

Hi,

On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> We've seen repeated failures in the tests added by commit 43e084197:
> ...
> I dug into this a bit today, and found that I can reproduce the failure
> reliably by adding a short delay in the right place, as attached.
> 
> However, after studying the test awhile I have to admit that I do not
> understand why all these failures look the same, because it seems to
> me that this test is a house of cards.  It *repeatedly* expects that
> issuing a command to session X will result in session Y reporting
> some notice before X's command terminates, even though X's command will
> never meet the conditions for isolationtester to think it's blocked.
>
> AFAICS that is nothing but wishful thinking.  Why is it that only one of
> those places has failed so far?

Are there really that many places expecting that? I've not gone through
this again exhaustively by any means, but most places seem to print
something only before waiting for a lock.

This test is really hairy, which isn't great. But until we have a proper
framework for controlling server side execution, I am not sure how we
better can achieve test coverage for this stuff. And there've been bugs,
so it's worth testing.

Greetings,

Andres Freund

Re: Continuing instability in insert-conflict-specconflict test

From

Andres Freund

Date:

24 August 2020, 21:21:27

On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
> 
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> > 
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards.  It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.

> > AFAICS that is nothing but wishful thinking.  Why is it that only one of
> > those places has failed so far?
> 
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.

ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks. If I understand the wrong output
correctly, what happens is that "controller_print_speculative_locks" is
executed, even though s1 hasn't yet acquired the next lock. Note how the
+s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE:  acquiring advisory lock on 2
is *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>"

To be clear, there'd still be an issue about whether the NOTICE is
printed before/after the "s2_upsert: <... completed>", but it looks to
me the issue is bigger than that.  It's easy enough to add another wait
in s2_upsert, but that doesn't help if the entire scheduling just
continues regardless of there not really being a waiter.

Am I missing something here?

Greetings,

Andres Freund

Re: Continuing instability in insert-conflict-specconflict test

From

Tom Lane

Date:

25 August 2020, 01:34:51

Andres Freund <andres@anarazel.de> writes:
> ISTM the issue at hand isn't so much that X expects something to be
> printed by Y before it terminates, but that it expects the next step to
> not be executed before Y unlocks. If I understand the wrong output
> correctly, what happens is that "controller_print_speculative_locks" is
> executed, even though s1 hasn't yet acquired the next lock.

That's one way to look at it perhaps.

I've spent the day fooling around with a re-implementation of
isolationtester that waits for all its controlled sessions to quiesce
(either wait for client input, or block on a lock held by another
session) before moving on to the next step.  That was not a feasible
approach before we had the wait_event infrastructure, but it's
seeming like it might be workable now.  Still have a few issues to
sort out though ...

            regards, tom lane

Re: Continuing instability in insert-conflict-specconflict test

From

Tom Lane

Date:

25 August 2020, 16:04:41

I wrote:
> I've spent the day fooling around with a re-implementation of
> isolationtester that waits for all its controlled sessions to quiesce
> (either wait for client input, or block on a lock held by another
> session) before moving on to the next step.  That was not a feasible
> approach before we had the wait_event infrastructure, but it's
> seeming like it might be workable now.  Still have a few issues to
> sort out though ...

I wasted a good deal of time on this idea, and eventually concluded
that it's a dead end, because there is an unremovable race condition.
Namely, that even if the isolationtester's observer backend has
observed that test session X has quiesced according to its
wait_event_info, it is possible for the report of that fact to arrive
at the isolationtester client process before test session X's output
does.

It's quite obvious how that might happen if the isolationtester is
on a different machine than the PG server --- just imagine a dropped
packet in X's output that has to be retransmitted.  You might think
it shouldn't happen within a single machine, but I'm seeing results
that I cannot explain any other way (on an 8-core RHEL8 box).
It appears to not be particularly rare, either.

> Andres Freund <andres@anarazel.de> writes:
>> ISTM the issue at hand isn't so much that X expects something to be
>> printed by Y before it terminates, but that it expects the next step to
>> not be executed before Y unlocks. If I understand the wrong output
>> correctly, what happens is that "controller_print_speculative_locks" is
>> executed, even though s1 hasn't yet acquired the next lock.

The problem as I'm now understanding it is that
insert-conflict-specconflict.spec issues multiple commands in sequence
to its "controller" session, and expects that NOTICE outputs from a
different test session will arrive at a determinate point in that
sequence.  In practice that's not guaranteed, because (a) the other
test session might not send the NOTICE soon enough --- as my modified
specfile proves --- and (b) even if the NOTICE is timely sent, the
kernel will not guarantee timely receipt.  We could fix (a) by
introducing some explicit interlock between the controller and test
sessions, but (b) is a killer.

I think what we have to do to salvage this test is to get rid of the
use of NOTICE outputs, and instead have the test functions insert
log records into some table, which we can inspect after the fact
to verify that things happened as we expect.

            regards, tom lane

Re: Continuing instability in insert-conflict-specconflict test

From

Andrew Dunstan

Date:

25 August 2020, 17:03:37

On 8/24/20 4:42 PM, Andres Freund wrote:
>
> This test is really hairy, which isn't great. But until we have a proper
> framework for controlling server side execution, I am not sure how we
> better can achieve test coverage for this stuff. And there've been bugs,
> so it's worth testing.
>


What would the framework look like?


cheers


andrew


-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Continuing instability in insert-conflict-specconflict test

From

Asim Praveen

Date:

31 August 2020, 15:10:46

Let me (rather shamelessly) extract a couple of patches from the
patch set that was already shared in the fault injection framework
proposal [1].

The first patch incorporates a new syntax in isolation spec grammar to
explicitly mark a step that is expected to block (due to reasons other
than locks). E.g.

permutation step1 step2& step3

The “&” suffix indicates that step2 is expected to block and isolation
tester should move on to step3 without waiting for step2 to finish.

The second patch implements the insert-conflict scenario that is being
discussed here - one session waits (using a “suspend” fault) after
inserting a tuple into the heap relation but before updating the
index. Another session concurrently inserts a conflicting tuple in
the heap and the index, and commits. Then the fault is reset so that
the blocked session resumes and detects conflict when updating the
index.

> On 25-Aug-2020, at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> I've spent the day fooling around with a re-implementation of
>> isolationtester that waits for all its controlled sessions to quiesce
>> (either wait for client input, or block on a lock held by another
>> session) before moving on to the next step. That was not a feasible
>> approach before we had the wait_event infrastructure, but it's
>> seeming like it might be workable now. Still have a few issues to
>> sort out though ...
>
> I wasted a good deal of time on this idea, and eventually concluded
> that it's a dead end, because there is an unremovable race condition.
> Namely, that even if the isolationtester's observer backend has
> observed that test session X has quiesced according to its
> wait_event_info, it is possible for the report of that fact to arrive
> at the isolationtester client process before test session X's output
> does.

The attached test evades this race condition by not depending on any
output from the blocked session X. It queries status of the injected
fault to ascertain that a specific point in the code was reached
during execution.

>
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.
>

+1 to getting rid of NOTICE outputs.

Please refer to https://github.com/asimrp/postgres/tree/faultinjector
for the full patch set proposed in [1] that is now rebased against the
latest master.

Asim

[1] https://www.postgresql.org/message-id/flat/CANXE4Tc%2BRYRC48%3DdKYn1PvAjE26Ew4hh%3DXUjBRGj%3DJ9eob-S6g%40mail.gmail.com#cd02fa3b461102e97bcdc97e62dcc6d3

Attachment

Re: Continuing instability in insert-conflict-specconflict test

From

Noah Misch

Date:

13 June 2021, 07:34:07

The test material added in commit 43e0841 continues to yield buildfarm
failures.  Failures new since the rest of this thread:

 damselfly    │ 2021-02-02 10:19:15 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2021-02-02%2010%3A19%3A15
 drongo       │ 2021-02-05 01:13:10 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-02-05%2001%3A13%3A10
 lorikeet     │ 2021-03-05 21:30:13 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-05%2021%3A30%3A13
 lorikeet     │ 2021-03-16 08:28:36 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-16%2008%3A28%3A36
 macaque      │ 2021-03-21 10:14:52 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=macaque&dt=2021-03-21%2010%3A14%3A52
 walleye      │ 2021-03-25 05:00:44 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 sungazer     │ 2021-04-23 21:52:31 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2021-04-23%2021%3A52%3A31
 gharial      │ 2021-04-30 06:08:36 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-30%2006%3A08%3A36
 walleye      │ 2021-05-05 17:00:41 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41
 gharial      │ 2021-05-05 22:35:33 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-05%2022%3A35%3A33

On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.

That sounds promising.  Are those messages important for observing server
bugs, or are they for debugging/modifying the test itself?  If the latter, one
could just change the messages to LOG.  Any of the above won't solve things
completely, because 3 of the 21 failures have diffs in the pg_locks output:

 dory         │ 2020-03-14 19:35:31 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye      │ 2021-03-25 05:00:44 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye      │ 2021-05-05 17:00:41 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

Perhaps the pg_locks query should poll until pg_locks has the expected rows.
Or else poll until all test sessions are waiting or idle.

Re: Continuing instability in insert-conflict-specconflict test

From

Tom Lane

Date:

13 June 2021, 20:48:48

Noah Misch <noah@leadboat.com> writes:
> The test material added in commit 43e0841 continues to yield buildfarm
> failures.

Yeah.  It's only a relatively small fraction of the total volume of
isolation-test failures, so I'm not sure it's worth expending a
whole lot of effort on this issue by itself.

> On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
>> I think what we have to do to salvage this test is to get rid of the
>> use of NOTICE outputs, and instead have the test functions insert
>> log records into some table, which we can inspect after the fact
>> to verify that things happened as we expect.

> That sounds promising.  Are those messages important for observing server
> bugs, or are they for debugging/modifying the test itself? If the latter, one
> could just change the messages to LOG.

I think they are important, because they show that the things we expect
to happen occur when we expect them to happen.

I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
into two problems:

1. You can't do an INSERT in an IMMUTABLE function.  This is easily
worked around by putting the INSERT in a separate, volatile function.
That's cheating like mad of course, but so is the rest of the stuff
this test does in "immutable" functions.

2. The inserted events don't become visible from the outside until the
respective session commits.  This seems like an absolute show-stopper.
After the fact, we can see that the events happened in the expected
relative order; but we don't have proof that they happened in the right
order relative to the actions visible in the test output file.

> ... Any of the above won't solve things
> completely, because 3 of the 21 failures have diffs in the pg_locks output:

Yeah, it looks like the issue there is that session 2 reports completion
of its step before session 1 has a chance to make progress after obtaining
the lock.  This seems to me to be closely related to the race conditions
I described upthread.

[ thinks for awhile ... ]

I wonder whether we could do better with something along these lines:

* Adjust the test script's functions to emit a NOTICE *after* acquiring
a lock, not before.

* Annotate permutations with something along the lines of "expect N
NOTICE outputs before allowing this step to be considered complete",
which we'd attach to the unlock steps.

This idea is only half baked at present, but maybe there's something
to work with there.  If it works, maybe we could improve the other
test cases that are always pseudo-failing in a similar way.  For
example, in the deadlock tests, annotate steps with "expect step
Y to finish before step X".

            regards, tom lane

Re: Continuing instability in insert-conflict-specconflict test

From

Noah Misch

Date:

13 June 2021, 21:29:43

On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> >> I think what we have to do to salvage this test is to get rid of the
> >> use of NOTICE outputs, and instead have the test functions insert
> >> log records into some table, which we can inspect after the fact
> >> to verify that things happened as we expect.
> 
> > That sounds promising.  Are those messages important for observing server
> > bugs, or are they for debugging/modifying the test itself? If the latter, one
> > could just change the messages to LOG.
> 
> I think they are important, because they show that the things we expect
> to happen occur when we expect them to happen.
> 
> I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
> into two problems:
> 
> 1. You can't do an INSERT in an IMMUTABLE function.  This is easily
> worked around by putting the INSERT in a separate, volatile function.
> That's cheating like mad of course, but so is the rest of the stuff
> this test does in "immutable" functions.
> 
> 2. The inserted events don't become visible from the outside until the
> respective session commits.  This seems like an absolute show-stopper.
> After the fact, we can see that the events happened in the expected
> relative order; but we don't have proof that they happened in the right
> order relative to the actions visible in the test output file.

One could send the inserts over dblink, I suppose.

> > ... Any of the above won't solve things
> > completely, because 3 of the 21 failures have diffs in the pg_locks output:

> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> a lock, not before.

I suspect that particular lock acquisition merely unblocks the processing that
reaches the final lock state expected by the test.  So, ...

> * Annotate permutations with something along the lines of "expect N
> NOTICE outputs before allowing this step to be considered complete",
> which we'd attach to the unlock steps.

... I don't expect this to solve $SUBJECT.  It could be a separately-useful
feature, though.

> This idea is only half baked at present, but maybe there's something
> to work with there.  If it works, maybe we could improve the other
> test cases that are always pseudo-failing in a similar way.  For
> example, in the deadlock tests, annotate steps with "expect step
> Y to finish before step X".

Yeah, a special permutation list entry like PQgetResult(s8) could solve
failures like
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Incidentally, I have a different idle patch relevant to deadlock test failures
like that.  Let me see if it has anything useful.

Re: Continuing instability in insert-conflict-specconflict test

From

Tom Lane

Date:

13 June 2021, 22:09:20

Noah Misch <noah@leadboat.com> writes:
> On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
>> * Adjust the test script's functions to emit a NOTICE *after* acquiring
>> a lock, not before.

> I suspect that particular lock acquisition merely unblocks the processing that
> reaches the final lock state expected by the test.  So, ...

Ah, you're probably right.

>> * Annotate permutations with something along the lines of "expect N
>> NOTICE outputs before allowing this step to be considered complete",
>> which we'd attach to the unlock steps.

> ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> feature, though.

I think it would solve it.  In the examples at hand, where we have

@@ -377,8 +377,6 @@
 pg_advisory_unlock

 t
-s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
-s1: NOTICE:  acquiring advisory lock on 2
 step s2_upsert: <... completed>
 step controller_print_speculative_locks:
     SELECT pa.application_name, locktype, mode, granted

and then those notices show up sometime later, I'm hypothesizing
that the actions did happen timely, but the actual delivery of
those packets to the isolationtester client did not.  If we
annotated step s2_upsert with a marker to the effect of "wait
for 2 NOTICEs from session 1 before considering this step done",
we could resolve that race condition.  Admittedly, this is putting
a thumb on the scales a little bit, but it's hard to see how to
deal with inconsistent TCP delivery delays without that.

(BTW, I find that removing the pq_flush() call at the bottom of
send_message_to_frontend produces this failure and a bunch of
other similar ones.)

> Yeah, a special permutation list entry like PQgetResult(s8) could solve
> failures like
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Right.  I'm visualizing it more like annotating s7a8 as requiring
s8a1 to complete first -- or vice versa, either would stabilize
that test result I think.

We might be able to get rid of the stuff about concurrent step
completion in isolationtester.c if we required the spec files
to use annotations to force a deterministic step completion
order in all such cases.

            regards, tom lane

Re: Continuing instability in insert-conflict-specconflict test

From

Noah Misch

Date:

13 June 2021, 22:22:12

On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> >> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> >> a lock, not before.
> 
> > I suspect that particular lock acquisition merely unblocks the processing that
> > reaches the final lock state expected by the test.  So, ...
> 
> Ah, you're probably right.
> 
> >> * Annotate permutations with something along the lines of "expect N
> >> NOTICE outputs before allowing this step to be considered complete",
> >> which we'd attach to the unlock steps.
> 
> > ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> > feature, though.
> 
> I think it would solve it.  In the examples at hand, where we have
> 
> @@ -377,8 +377,6 @@
>  pg_advisory_unlock
>  
>  t              
> -s1: NOTICE:  blurt_and_lock_123() called for k1 in session 1
> -s1: NOTICE:  acquiring advisory lock on 2
>  step s2_upsert: <... completed>
>  step controller_print_speculative_locks: 
>      SELECT pa.application_name, locktype, mode, granted

It would solve that one particular diff.  I meant that it wouldn't solve the
aforementioned pg_locks diffs:

 dory         │ 2020-03-14 19:35:31 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye      │ 2021-03-25 05:00:44 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye      │ 2021-05-05 17:00:41 │ REL_13_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

> We might be able to get rid of the stuff about concurrent step
> completion in isolationtester.c if we required the spec files
> to use annotations to force a deterministic step completion
> order in all such cases.

Yeah.  If we're willing to task spec authors with that, the test program can't
then guess wrong under unusual timing.

Re: Continuing instability in insert-conflict-specconflict test

From

Andres Freund

Date:

13 June 2021, 23:49:04

Hi,

On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > We might be able to get rid of the stuff about concurrent step
> > completion in isolationtester.c if we required the spec files
> > to use annotations to force a deterministic step completion
> > order in all such cases.
> 
> Yeah.  If we're willing to task spec authors with that, the test program can't
> then guess wrong under unusual timing.

I think it'd make it *easier* for spec authors. Right now one needs to
find some way to get a consistent ordering, which is often hard and
complicates tests way more than specifying an explicit ordering
would. And it's often unreliable, as evidenced here and in plenty other
tests.

Greetings,

Andres Freund

Re: Continuing instability in insert-conflict-specconflict test

From

Gavin Flower

Date:

14 June 2021, 00:23:24

On 14/06/21 11:49 am, Andres Freund wrote:
> Hi,
>
> On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
>> On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
>>> We might be able to get rid of the stuff about concurrent step
>>> completion in isolationtester.c if we required the spec files
>>> to use annotations to force a deterministic step completion
>>> order in all such cases.
>> Yeah.  If we're willing to task spec authors with that, the test program can't
>> then guess wrong under unusual timing.
> I think it'd make it *easier* for spec authors. Right now one needs to
> find some way to get a consistent ordering, which is often hard and
> complicates tests way more than specifying an explicit ordering
> would. And it's often unreliable, as evidenced here and in plenty other
> tests.
>
> Greetings,
>
> Andres Freund

How about adding a keyword like 'DETERMINISTIC' to the top level SELECT, 
the idea being the output would be deterministic (given the same table 
values after filtering etc), but not necessarily in any particular 
order?  So pg could decide the optimum way to achieve that which may not 
necessarily need a sort.


Cheers,
Gavin

Re: Continuing instability in insert-conflict-specconflict test

From

Noah Misch

Date:

14 June 2021, 01:46:15

On Sun, Jun 13, 2021 at 04:49:04PM -0700, Andres Freund wrote:
> On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> > On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > > We might be able to get rid of the stuff about concurrent step
> > > completion in isolationtester.c if we required the spec files
> > > to use annotations to force a deterministic step completion
> > > order in all such cases.
> > 
> > Yeah.  If we're willing to task spec authors with that, the test program can't
> > then guess wrong under unusual timing.
> 
> I think it'd make it *easier* for spec authors. Right now one needs to
> find some way to get a consistent ordering, which is often hard and
> complicates tests way more than specifying an explicit ordering
> would. And it's often unreliable, as evidenced here and in plenty other
> tests.

Fine with me.  Even if it weren't easier for spec authors, it shifts efforts
to spec authors and away from buildfarm observers, which is a good thing.