Thread: Continuing instability in insert-conflict-specconflict test
We've seen repeated failures in the tests added by commit 43e084197:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2020-08-23%2005%3A46%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2020-08-04%2001%3A05%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=caiman&dt=2020-04-01%2004%3A10%3A51
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-03-10%2003%3A14%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-10%2011%3A01%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-09%2010%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2020-03-09%2015%3A52%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-09%2005%3A20%3A07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2020-03-09%2003%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2020-03-09%2015%3A52%3A53

I dug into this a bit today, and found that I can reproduce the failure
reliably by adding a short delay in the right place, as attached.

However, after studying the test awhile I have to admit that I do not
understand why all these failures look the same, because it seems to me
that this test is a house of cards.  It *repeatedly* expects that
issuing a command to session X will result in session Y reporting some
notice before X's command terminates, even though X's command will
never meet the conditions for isolationtester to think it's blocked.

AFAICS that is nothing but wishful thinking.  Why is it that only one of
those places has failed so far?

			regards, tom lane

diff --git a/src/test/isolation/specs/insert-conflict-specconflict.spec b/src/test/isolation/specs/insert-conflict-specconflict.spec
index 6028397491..92459047b6 100644
--- a/src/test/isolation/specs/insert-conflict-specconflict.spec
+++ b/src/test/isolation/specs/insert-conflict-specconflict.spec
@@ -28,6 +28,7 @@ setup
     RAISE NOTICE 'blurt_and_lock_4() called for % in session %', $1, current_setting('spec.session')::int;
     RAISE NOTICE 'acquiring advisory lock on 4';
     PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 4);
+    PERFORM pg_sleep(0.1);
     RETURN $1;
 END;$$;
Hi,

On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> We've seen repeated failures in the tests added by commit 43e084197:
> ...
> I dug into this a bit today, and found that I can reproduce the failure
> reliably by adding a short delay in the right place, as attached.
>
> However, after studying the test awhile I have to admit that I do not
> understand why all these failures look the same, because it seems to
> me that this test is a house of cards.  It *repeatedly* expects that
> issuing a command to session X will result in session Y reporting
> some notice before X's command terminates, even though X's command will
> never meet the conditions for isolationtester to think it's blocked.
>
> AFAICS that is nothing but wishful thinking.  Why is it that only one of
> those places has failed so far?

Are there really that many places expecting that? I've not gone through
this again exhaustively by any means, but most places seem to print
something only before waiting for a lock.

This test is really hairy, which isn't great. But until we have a proper
framework for controlling server side execution, I am not sure how we
better can achieve test coverage for this stuff. And there've been bugs,
so it's worth testing.

Greetings,

Andres Freund
On 2020-08-24 13:42:35 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-08-23 22:53:18 -0400, Tom Lane wrote:
> > We've seen repeated failures in the tests added by commit 43e084197:
> > ...
> > I dug into this a bit today, and found that I can reproduce the failure
> > reliably by adding a short delay in the right place, as attached.
> >
> > However, after studying the test awhile I have to admit that I do not
> > understand why all these failures look the same, because it seems to
> > me that this test is a house of cards.  It *repeatedly* expects that
> > issuing a command to session X will result in session Y reporting
> > some notice before X's command terminates, even though X's command will
> > never meet the conditions for isolationtester to think it's blocked.
> >
> > AFAICS that is nothing but wishful thinking.  Why is it that only one of
> > those places has failed so far?
>
> Are there really that many places expecting that? I've not gone through
> this again exhaustively by any means, but most places seem to print
> something only before waiting for a lock.

ISTM the issue at hand isn't so much that X expects something to be
printed by Y before it terminates, but that it expects the next step to
not be executed before Y unlocks.  If I understand the wrong output
correctly, what happens is that "controller_print_speculative_locks" is
executed, even though s1 hasn't yet acquired the next lock.

Note how the
+s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE: acquiring advisory lock on 2
is *after* "step controller_print_speculative_locks", not just after
"step s2_upsert: <... completed>"

To be clear, there'd still be an issue about whether the NOTICE is
printed before/after the "s2_upsert: <... completed>", but it looks to
me the issue is bigger than that.  It's easy enough to add another wait
in s2_upsert, but that doesn't help if the entire scheduling just
continues regardless of there not really being a waiter.

Am I missing something here?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> ISTM the issue at hand isn't so much that X expects something to be
> printed by Y before it terminates, but that it expects the next step to
> not be executed before Y unlocks.  If I understand the wrong output
> correctly, what happens is that "controller_print_speculative_locks" is
> executed, even though s1 hasn't yet acquired the next lock.

That's one way to look at it perhaps.

I've spent the day fooling around with a re-implementation of
isolationtester that waits for all its controlled sessions to quiesce
(either wait for client input, or block on a lock held by another
session) before moving on to the next step.  That was not a feasible
approach before we had the wait_event infrastructure, but it's seeming
like it might be workable now.  Still have a few issues to sort out
though ...

			regards, tom lane
I wrote:
> I've spent the day fooling around with a re-implementation of
> isolationtester that waits for all its controlled sessions to quiesce
> (either wait for client input, or block on a lock held by another
> session) before moving on to the next step.  That was not a feasible
> approach before we had the wait_event infrastructure, but it's
> seeming like it might be workable now.  Still have a few issues to
> sort out though ...

I wasted a good deal of time on this idea, and eventually concluded
that it's a dead end, because there is an unremovable race condition.
Namely, that even if the isolationtester's observer backend has
observed that test session X has quiesced according to its
wait_event_info, it is possible for the report of that fact to arrive
at the isolationtester client process before test session X's output
does.

It's quite obvious how that might happen if the isolationtester is on a
different machine than the PG server --- just imagine a dropped packet
in X's output that has to be retransmitted.  You might think it
shouldn't happen within a single machine, but I'm seeing results that I
cannot explain any other way (on an 8-core RHEL8 box).  It appears to
not be particularly rare, either.

> Andres Freund <andres@anarazel.de> writes:
>> ISTM the issue at hand isn't so much that X expects something to be
>> printed by Y before it terminates, but that it expects the next step to
>> not be executed before Y unlocks.  If I understand the wrong output
>> correctly, what happens is that "controller_print_speculative_locks" is
>> executed, even though s1 hasn't yet acquired the next lock.

The problem as I'm now understanding it is that
insert-conflict-specconflict.spec issues multiple commands in sequence
to its "controller" session, and expects that NOTICE outputs from a
different test session will arrive at a determinate point in that
sequence.  In practice that's not guaranteed, because (a) the other
test session might not send the NOTICE soon enough --- as my modified
specfile proves --- and (b) even if the NOTICE is timely sent, the
kernel will not guarantee timely receipt.  We could fix (a) by
introducing some explicit interlock between the controller and test
sessions, but (b) is a killer.

I think what we have to do to salvage this test is to get rid of the
use of NOTICE outputs, and instead have the test functions insert
log records into some table, which we can inspect after the fact
to verify that things happened as we expect.

			regards, tom lane
On 8/24/20 4:42 PM, Andres Freund wrote:
>
> This test is really hairy, which isn't great. But until we have a proper
> framework for controlling server side execution, I am not sure how we
> better can achieve test coverage for this stuff. And there've been bugs,
> so it's worth testing.
>

What would the framework look like?

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Let me (rather shamelessly) extract a couple of patches from the
patch set that was already shared in the fault injection framework
proposal [1].
The first patch incorporates a new syntax in isolation spec grammar to
explicitly mark a step that is expected to block (due to reasons other
than locks). E.g.
permutation step1 step2& step3
The “&” suffix indicates that step2 is expected to block and isolation
tester should move on to step3 without waiting for step2 to finish.
The second patch implements the insert-conflict scenario that is being
discussed here - one session waits (using a “suspend” fault) after
inserting a tuple into the heap relation but before updating the
index. Another session concurrently inserts a conflicting tuple in
the heap and the index, and commits. Then the fault is reset so that
the blocked session resumes and detects conflict when updating the
index.
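Roughly, the spec looks like the sketch below.  To be clear, this is
only a sketch: the fault-injection function name and fault point
("inject_fault", "insert_index_tuples_wait") are placeholders rather
than the exact API in the patch, and the table and step names are
likewise illustrative.

# Sketch only: "inject_fault" and the fault point name are placeholders
# for the proposed fault-injection API; table and step names are
# illustrative.
setup    { CREATE TABLE upserttest (key text PRIMARY KEY, data text); }
teardown { DROP TABLE upserttest; }

session "s1"
step "s1_suspend" { SELECT inject_fault('insert_index_tuples_wait', 'suspend'); }
step "s1_upsert"  { INSERT INTO upserttest VALUES ('k1', 'inserted s1')
                    ON CONFLICT (key) DO UPDATE SET data = 'upserted s1'; }
step "s1_resume"  { SELECT inject_fault('insert_index_tuples_wait', 'reset'); }

session "s2"
step "s2_upsert"  { INSERT INTO upserttest VALUES ('k1', 'inserted s2')
                    ON CONFLICT (key) DO UPDATE SET data = 'upserted s2'; }

# "s1_upsert"& : s1 is expected to block at the injected fault, so the
# tester moves on; s2 then inserts the conflicting tuple and commits,
# and resetting the fault lets s1 detect the conflict when it finally
# updates the index.
permutation "s1_suspend" "s1_upsert"& "s2_upsert" "s1_resume"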
> On 25-Aug-2020, at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> I've spent the day fooling around with a re-implementation of
>> isolationtester that waits for all its controlled sessions to quiesce
>> (either wait for client input, or block on a lock held by another
>> session) before moving on to the next step. That was not a feasible
>> approach before we had the wait_event infrastructure, but it's
>> seeming like it might be workable now. Still have a few issues to
>> sort out though ...
>
> I wasted a good deal of time on this idea, and eventually concluded
> that it's a dead end, because there is an unremovable race condition.
> Namely, that even if the isolationtester's observer backend has
> observed that test session X has quiesced according to its
> wait_event_info, it is possible for the report of that fact to arrive
> at the isolationtester client process before test session X's output
> does.
The attached test evades this race condition by not depending on any
output from the blocked session X. It queries status of the injected
fault to ascertain that a specific point in the code was reached
during execution.
>
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.
>
+1 to getting rid of NOTICE outputs.
Please refer to https://github.com/asimrp/postgres/tree/faultinjector
for the full patch set proposed in [1] that is now rebased against the
latest master.
The test material added in commit 43e0841 continues to yield buildfarm
failures.  Failures new since the rest of this thread:

 damselfly │ 2021-02-02 10:19:15 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2021-02-02%2010%3A19%3A15
 drongo    │ 2021-02-05 01:13:10 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2021-02-05%2001%3A13%3A10
 lorikeet  │ 2021-03-05 21:30:13 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-05%2021%3A30%3A13
 lorikeet  │ 2021-03-16 08:28:36 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2021-03-16%2008%3A28%3A36
 macaque   │ 2021-03-21 10:14:52 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=macaque&dt=2021-03-21%2010%3A14%3A52
 walleye   │ 2021-03-25 05:00:44 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 sungazer  │ 2021-04-23 21:52:31 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2021-04-23%2021%3A52%3A31
 gharial   │ 2021-04-30 06:08:36 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-30%2006%3A08%3A36
 walleye   │ 2021-05-05 17:00:41 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41
 gharial   │ 2021-05-05 22:35:33 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-05%2022%3A35%3A33

On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> I think what we have to do to salvage this test is to get rid of the
> use of NOTICE outputs, and instead have the test functions insert
> log records into some table, which we can inspect after the fact
> to verify that things happened as we expect.

That sounds promising.  Are those messages important for observing
server bugs, or are they for debugging/modifying the test itself?  If
the latter, one could just change the messages to LOG.

Any of the above won't solve things completely, because 3 of the 21
failures have diffs in the pg_locks output:

 dory    │ 2020-03-14 19:35:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye │ 2021-03-25 05:00:44 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye │ 2021-05-05 17:00:41 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

Perhaps the pg_locks query should poll until pg_locks has the expected
rows.  Or else poll until all test sessions are waiting or idle.
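For what it's worth, the polling variant could be as simple as
something along these lines (a sketch only, not tested here; the real
step would need whatever WHERE clause matches the rows the test
expects):

DO $$
BEGIN
    -- Sketch only: poll (for at most ~10s) until a granted speculative
    -- insertion lock is visible, instead of reading pg_locks exactly once.
    -- The WHERE clause is an assumption, not the actual spec query.
    FOR i IN 1 .. 100 LOOP
        PERFORM 1 FROM pg_locks
         WHERE locktype = 'speculative token' AND granted;
        IF FOUND THEN
            RETURN;                 -- expected row has appeared
        END IF;
        PERFORM pg_sleep(0.1);      -- wait a bit and look again
    END LOOP;
    RAISE EXCEPTION 'timed out waiting for expected pg_locks contents';
END
$$;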
Noah Misch <noah@leadboat.com> writes:
> The test material added in commit 43e0841 continues to yield buildfarm
> failures.

Yeah.  It's only a relatively small fraction of the total volume of
isolation-test failures, so I'm not sure it's worth expending a whole
lot of effort on this issue by itself.

> On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
>> I think what we have to do to salvage this test is to get rid of the
>> use of NOTICE outputs, and instead have the test functions insert
>> log records into some table, which we can inspect after the fact
>> to verify that things happened as we expect.

> That sounds promising.  Are those messages important for observing server
> bugs, or are they for debugging/modifying the test itself?  If the latter, one
> could just change the messages to LOG.

I think they are important, because they show that the things we expect
to happen occur when we expect them to happen.

I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
into two problems:

1. You can't do an INSERT in an IMMUTABLE function.  This is easily
worked around by putting the INSERT in a separate, volatile function.
That's cheating like mad of course, but so is the rest of the stuff
this test does in "immutable" functions.

2. The inserted events don't become visible from the outside until the
respective session commits.  This seems like an absolute show-stopper.
After the fact, we can see that the events happened in the expected
relative order; but we don't have proof that they happened in the right
order relative to the actions visible in the test output file.

> ... Any of the above won't solve things
> completely, because 3 of the 21 failures have diffs in the pg_locks output:

Yeah, it looks like the issue there is that session 2 reports completion
of its step before session 1 has a chance to make progress after
obtaining the lock.  This seems to me to be closely related to the race
conditions I described upthread.

[ thinks for awhile ... ]  I wonder whether we could do better with
something along these lines:

* Adjust the test script's functions to emit a NOTICE *after* acquiring
a lock, not before.

* Annotate permutations with something along the lines of "expect N
NOTICE outputs before allowing this step to be considered complete",
which we'd attach to the unlock steps.

This idea is only half baked at present, but maybe there's something
to work with there.  If it works, maybe we could improve the other
test cases that are always pseudo-failing in a similar way.  For
example, in the deadlock tests, annotate steps with "expect step
Y to finish before step X".

			regards, tom lane
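(For reference, the workaround for point 1 amounts to something like
the sketch below; the table and function names here are invented for
illustration, not copied from the experiment.  Point 2 of course still
stands, since the rows stay invisible until the session commits.)

-- Sketch only (names are illustrative): log from an "immutable"
-- function by pushing the INSERT into a separate volatile helper.
CREATE TABLE spec_log (seq serial PRIMARY KEY, event text);

CREATE FUNCTION log_event(ev text) RETURNS void
LANGUAGE sql VOLATILE AS
$$ INSERT INTO spec_log(event) VALUES (ev) $$;

CREATE FUNCTION blurt_and_lock(k text) RETURNS text
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    -- cheating, exactly as the existing test functions already do
    PERFORM log_event('called for ' || k || ' in session '
                      || current_setting('spec.session'));
    PERFORM pg_advisory_xact_lock(current_setting('spec.session')::int, 2);
    RETURN k;
END;
$$;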
On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Aug 25, 2020 at 12:04:41PM -0400, Tom Lane wrote:
> >> I think what we have to do to salvage this test is to get rid of the
> >> use of NOTICE outputs, and instead have the test functions insert
> >> log records into some table, which we can inspect after the fact
> >> to verify that things happened as we expect.
>
> > That sounds promising.  Are those messages important for observing server
> > bugs, or are they for debugging/modifying the test itself?  If the latter, one
> > could just change the messages to LOG.
>
> I think they are important, because they show that the things we expect
> to happen occur when we expect them to happen.
>
> I experimented with replacing the RAISE NOTICEs with INSERTs, and ran
> into two problems:
>
> 1. You can't do an INSERT in an IMMUTABLE function.  This is easily
> worked around by putting the INSERT in a separate, volatile function.
> That's cheating like mad of course, but so is the rest of the stuff
> this test does in "immutable" functions.
>
> 2. The inserted events don't become visible from the outside until the
> respective session commits.  This seems like an absolute show-stopper.
> After the fact, we can see that the events happened in the expected
> relative order; but we don't have proof that they happened in the right
> order relative to the actions visible in the test output file.

One could send the inserts over dblink, I suppose.

> > ... Any of the above won't solve things
> > completely, because 3 of the 21 failures have diffs in the pg_locks output:

> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> a lock, not before.

I suspect that particular lock acquisition merely unblocks the processing that
reaches the final lock state expected by the test.  So, ...

> * Annotate permutations with something along the lines of "expect N
> NOTICE outputs before allowing this step to be considered complete",
> which we'd attach to the unlock steps.

... I don't expect this to solve $SUBJECT.  It could be a separately-useful
feature, though.

> This idea is only half baked at present, but maybe there's something
> to work with there.  If it works, maybe we could improve the other
> test cases that are always pseudo-failing in a similar way.  For
> example, in the deadlock tests, annotate steps with "expect step
> Y to finish before step X".

Yeah, a special permutation list entry like PQgetResult(s8) could solve
failures like
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Incidentally, I have a different idle patch relevant to deadlock test
failures like that.  Let me see if it has anything useful.
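(A rough sketch of the dblink variant, reusing the spec_log table
sketched above; the INSERT runs in a separate connection, so it commits
and becomes visible immediately regardless of what the test session
does afterward.)

-- Sketch only: route the log INSERT through dblink so it is visible
-- right away, independently of the calling session's transaction.
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION log_event(ev text) RETURNS void
LANGUAGE plpgsql VOLATILE AS $$
BEGIN
    -- dblink_exec opens its own session; the remote INSERT commits on
    -- its own, so the row is observable from any other session at once.
    PERFORM dblink_exec('dbname=' || current_database(),
                        format('INSERT INTO spec_log(event) VALUES (%L)', ev));
END;
$$;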
Noah Misch <noah@leadboat.com> writes:
> On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
>> * Adjust the test script's functions to emit a NOTICE *after* acquiring
>> a lock, not before.

> I suspect that particular lock acquisition merely unblocks the processing that
> reaches the final lock state expected by the test.  So, ...

Ah, you're probably right.

>> * Annotate permutations with something along the lines of "expect N
>> NOTICE outputs before allowing this step to be considered complete",
>> which we'd attach to the unlock steps.

> ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> feature, though.

I think it would solve it.  In the examples at hand, where we have

@@ -377,8 +377,6 @@
 pg_advisory_unlock

 t
-s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
-s1: NOTICE: acquiring advisory lock on 2
 step s2_upsert: <... completed>
 step controller_print_speculative_locks:
   SELECT pa.application_name, locktype, mode, granted

and then those notices show up sometime later, I'm hypothesizing that
the actions did happen timely, but the actual delivery of those packets
to the isolationtester client did not.  If we annotated step s2_upsert
with a marker to the effect of "wait for 2 NOTICEs from session 1
before considering this step done", we could resolve that race
condition.

Admittedly, this is putting a thumb on the scales a little bit, but
it's hard to see how to deal with inconsistent TCP delivery delays
without that.

(BTW, I find that removing the pq_flush() call at the bottom of
send_message_to_frontend produces this failure and a bunch of other
similar ones.)

> Yeah, a special permutation list entry like PQgetResult(s8) could solve
> failures like
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-06-11%2017%3A13%3A44

Right.  I'm visualizing it more like annotating s7a8 as requiring s8a1
to complete first -- or vice versa, either would stabilize that test
result I think.

We might be able to get rid of the stuff about concurrent step
completion in isolationtester.c if we required the spec files
to use annotations to force a deterministic step completion
order in all such cases.

			regards, tom lane
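(To make the half-baked idea a bit more concrete, the annotation might
read something like the snippet below.  The "(s1 notices 2)" marker is
invented syntax purely for illustration; nothing like it exists in the
spec grammar yet, and the elided step names are not spelled out.)

# Sketch only: "(s1 notices 2)" would tell isolationtester not to report
# s2_upsert as complete until two NOTICEs have arrived from session s1.
permutation ... s2_upsert(s1 notices 2) controller_print_speculative_locks ...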
On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sun, Jun 13, 2021 at 04:48:48PM -0400, Tom Lane wrote:
> >> * Adjust the test script's functions to emit a NOTICE *after* acquiring
> >> a lock, not before.
>
> > I suspect that particular lock acquisition merely unblocks the processing that
> > reaches the final lock state expected by the test.  So, ...
>
> Ah, you're probably right.
>
> >> * Annotate permutations with something along the lines of "expect N
> >> NOTICE outputs before allowing this step to be considered complete",
> >> which we'd attach to the unlock steps.
>
> > ... I don't expect this to solve $SUBJECT.  It could be a separately-useful
> > feature, though.
>
> I think it would solve it.  In the examples at hand, where we have
>
> @@ -377,8 +377,6 @@
>  pg_advisory_unlock
>
>  t
> -s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
> -s1: NOTICE: acquiring advisory lock on 2
>  step s2_upsert: <... completed>
>  step controller_print_speculative_locks:
>   SELECT pa.application_name, locktype, mode, granted

It would solve that one particular diff.  I meant that it wouldn't solve the
aforementioned pg_locks diffs:

 dory    │ 2020-03-14 19:35:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-03-14%2019%3A35%3A31
 walleye │ 2021-03-25 05:00:44 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-03-25%2005%3A00%3A44
 walleye │ 2021-05-05 17:00:41 │ REL_13_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-05-05%2017%3A00%3A41

> We might be able to get rid of the stuff about concurrent step
> completion in isolationtester.c if we required the spec files
> to use annotations to force a deterministic step completion
> order in all such cases.

Yeah.  If we're willing to task spec authors with that, the test program can't
then guess wrong under unusual timing.
Hi,

On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > We might be able to get rid of the stuff about concurrent step
> > completion in isolationtester.c if we required the spec files
> > to use annotations to force a deterministic step completion
> > order in all such cases.
>
> Yeah.  If we're willing to task spec authors with that, the test program can't
> then guess wrong under unusual timing.

I think it'd make it *easier* for spec authors.  Right now one needs to
find some way to get a consistent ordering, which is often hard and
complicates tests way more than specifying an explicit ordering would.
And it's often unreliable, as evidenced here and in plenty other tests.

Greetings,

Andres Freund
On 14/06/21 11:49 am, Andres Freund wrote:
> Hi,
>
> On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
>> On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
>>> We might be able to get rid of the stuff about concurrent step
>>> completion in isolationtester.c if we required the spec files
>>> to use annotations to force a deterministic step completion
>>> order in all such cases.
>> Yeah.  If we're willing to task spec authors with that, the test program can't
>> then guess wrong under unusual timing.
> I think it'd make it *easier* for spec authors. Right now one needs to
> find some way to get a consistent ordering, which is often hard and
> complicates tests way more than specifying an explicit ordering
> would. And it's often unreliable, as evidenced here and in plenty other
> tests.
>
> Greetings,
>
> Andres Freund

How about adding a keyword like 'DETERMINISTIC' to the top level SELECT,
the idea being the output would be deterministic (given the same table
values after filtering etc), but not necessarily in any particular
order?  So pg could decide the optimum way to achieve that which may
not necessarily need a sort.

Cheers,
Gavin
On Sun, Jun 13, 2021 at 04:49:04PM -0700, Andres Freund wrote:
> On 2021-06-13 15:22:12 -0700, Noah Misch wrote:
> > On Sun, Jun 13, 2021 at 06:09:20PM -0400, Tom Lane wrote:
> > > We might be able to get rid of the stuff about concurrent step
> > > completion in isolationtester.c if we required the spec files
> > > to use annotations to force a deterministic step completion
> > > order in all such cases.
> >
> > Yeah.  If we're willing to task spec authors with that, the test program can't
> > then guess wrong under unusual timing.
>
> I think it'd make it *easier* for spec authors. Right now one needs to
> find some way to get a consistent ordering, which is often hard and
> complicates tests way more than specifying an explicit ordering
> would. And it's often unreliable, as evidenced here and in plenty other
> tests.

Fine with me.  Even if it weren't easier for spec authors, it shifts efforts
to spec authors and away from buildfarm observers, which is a good thing.