Re: Test of a partition with an incomplete detach has a timing issue - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Test of a partition with an incomplete detach has a timing issue
Msg-id 20210525035642.GA3804869@rfd.leadboat.com
In response to Re: Test of a partition with an incomplete detach has a timing issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Test of a partition with an incomplete detach has a timing issue
List pgsql-hackers
On Mon, May 24, 2021 at 09:12:40PM -0400, Tom Lane wrote:
> The experiments I did awhile ago are coming back to me now.  I tried
> a number of variations on this same theme, and none of them closed
> the gap entirely.  The fundamental problem is that it's possible
> for backend A to complete its transaction, and for backend B (which
> is the isolationtester's monitoring session) to observe that A has
> completed its transaction, and for B to report that fact to the
> isolationtester, and for that report to arrive at the isolationtester
> *before A's query result does*.  You need some bad luck for that
> to happen, like A losing the CPU right before it flushes its output
> buffer to the client, but I was able to demonstrate it fairly
> repeatably.

> So a completely bulletproof interlock seems out of reach.

What if we had a standard that the step after the cancel shall send a query to
the backend that just received the cancel?  Something like:

--- a/src/test/isolation/specs/detach-partition-concurrently-3.spec
+++ b/src/test/isolation/specs/detach-partition-concurrently-3.spec
@@ -34,16 +34,18 @@ step "s1describe"    { SELECT 'd3_listp' AS root, * FROM pg_partition_tree('d3_list
 session "s2"
 step "s2begin"        { BEGIN; }
 step "s2snitch"        { INSERT INTO d3_pid SELECT pg_backend_pid(); }
 step "s2detach"        { ALTER TABLE d3_listp DETACH PARTITION d3_listp1 CONCURRENTLY; }
+step "s2noop"        { UNLISTEN noop; }
+# TODO follow every instance of s1cancel w/ s2noop
 step "s2detach2"    { ALTER TABLE d3_listp DETACH PARTITION d3_listp2 CONCURRENTLY; }
 step "s2detachfinal"    { ALTER TABLE d3_listp DETACH PARTITION d3_listp1 FINALIZE; }
 step "s2drop"        { DROP TABLE d3_listp1; }
 step "s2commit"        { COMMIT; }
 
 # Try various things while the partition is in "being detached" state, with
 # no session waiting.
-permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1c" "s1describe" "s1alter"
+permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s2noop" "s1c" "s1describe" "s1alter"
 permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1insert" "s1c"
 permutation "s2snitch" "s1brr" "s1s" "s2detach" "s1cancel" "s1insert" "s1c" "s1spart"
 permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1c" "s1insertpart"
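
For reference, a sketch of what the TODO above would mean for the three other
permutations visible in this hunk, assuming "s2noop" goes immediately after
each "s1cancel". This is untested and only covers the lines shown here. Other
permutations in the spec that use s1cancel would presumably need the same
treatment, and the expected output file would have to be regenerated to match:

```
permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s2noop" "s1insert" "s1c"
permutation "s2snitch" "s1brr" "s1s" "s2detach" "s1cancel" "s2noop" "s1insert" "s1c" "s1spart"
permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s2noop" "s1c" "s1insertpart"
```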
 


