Re: Test of a partition with an incomplete detach has a timing issue - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Test of a partition with an incomplete detach has a timing issue
Date
Msg-id 20210524180712.GA13311@alvherre.pgsql
In response to Test of a partition with an incomplete detach has a timing issue  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses Re: Test of a partition with an incomplete detach has a timing issue
RE: Test of a partition with an incomplete detach has a timing issue
Re: Test of a partition with an incomplete detach has a timing issue
List pgsql-hackers
On 2021-May-24, osumi.takamichi@fujitsu.com wrote:

> Also, I've gotten some logs left.
> * src/test/isolation/output_iso/regression.out
> 
> test detach-partition-concurrently-1 ... ok          682 ms
> test detach-partition-concurrently-2 ... ok          321 ms
> test detach-partition-concurrently-3 ... FAILED     1084 ms
> test detach-partition-concurrently-4 ... ok         1078 ms
> test fk-contention                ... ok           77 ms
> 
> * src/test/isolation/output_iso/regression.diffs
> 
> diff -U3 /(where/I/put/PG)/src/test/isolation/expected/detach-partition-concurrently-3.out /(where/I/put/PG)/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out
> --- /(where/I/put/PG)/src/test/isolation/expected/detach-partition-concurrently-3.out     2021-05-24 03:30:15.735488295+0000
> +++ /(where/I/put/PG)/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out   2021-05-24 04:46:48.851488295+0000
> @@ -12,9 +12,9 @@
>  pg_cancel_backend
>  
>  t              
> -step s2detach: <... completed>
> -error in steps s1cancel s2detach: ERROR:  canceling statement due to user request
>  step s1c: COMMIT;
> +step s2detach: <... completed>
> +error in steps s1c s2detach: ERROR:  canceling statement due to user request

Uh, how annoying.  If I understand correctly, I agree that this is a
timing issue: sometimes it is fast enough that the cancel is reported
together with its own step, but other times it takes longer so it is
reported with the next command of that session instead, s1c (commit).

I suppose a fix would imply that the error report waits until after the
"cancel" step is over, but I'm not sure how to do that.

Maybe we can change the "cancel" query to something like

SELECT pg_cancel_backend(pid), somehow_wait_for_detach_to_terminate() FROM d3_pid;

... where maybe that function can check the "state" column in s2's
pg_stat_activity row?  I'll give that a try.
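
For illustration only, such a wait function might look roughly like the
sketch below.  The name wait_until_not_active, its parameters, and the
10-second default timeout are all made up here; nothing like this exists
in the tree.  It polls pg_stat_activity until the target backend is no
longer reported as 'active', and gives up after the timeout.  Whether
this is actually enough to close the race has not been tested.

```sql
-- Hypothetical helper: name, signature and timeout are illustrative only.
CREATE OR REPLACE FUNCTION wait_until_not_active(
    target_pid integer,
    timeout_s  double precision DEFAULT 10)
RETURNS boolean
LANGUAGE plpgsql
AS $$
DECLARE
    deadline timestamptz := clock_timestamp() + make_interval(secs => timeout_s);
BEGIN
    LOOP
        -- Statistics views are cached within a transaction; discard the
        -- cache so that every iteration sees the backend's current state.
        PERFORM pg_stat_clear_snapshot();

        -- Done once the backend is gone or is no longer running a query.
        IF NOT EXISTS (SELECT 1
                         FROM pg_stat_activity
                        WHERE pid = target_pid
                          AND state = 'active') THEN
            RETURN true;
        END IF;

        -- Give up rather than hang the test forever.
        IF clock_timestamp() >= deadline THEN
            RETURN false;
        END IF;

        PERFORM pg_sleep(0.01);
    END LOOP;
END;
$$;
```

The cancel step would then become something like

SELECT pg_cancel_backend(pid), wait_until_not_active(pid) FROM d3_pid;

which assumes that the two expressions are evaluated left to right, so
that the cancel is sent before the wait starts.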

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."                        (Paul Thomas)


