false failure of test_decoding regression test - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject false failure of test_decoding regression test
Date
Msg-id 20220304.113347.2105652521035137491.horikyota.ntt@gmail.com
List pgsql-hackers
Hello.

The CF-CI complained about one of my patches, seemingly for a reason
unrelated to the patch.

https://cirrus-ci.com/task/5544213843542016?logs=test_world#L1666

> diff -U3 /tmp/cirrus-ci-build/contrib/test_decoding/expected/slot_creation_error.out /tmp/cirrus-ci-build/contrib/test_decoding/output_iso/results/slot_creation_error.out
> --- /tmp/cirrus-ci-build/contrib/test_decoding/expected/slot_creation_error.out    2022-03-03 22:45:04.708072000 +0000
> +++ /tmp/cirrus-ci-build/contrib/test_decoding/output_iso/results/slot_creation_error.out    2022-03-03 22:54:49.621351000 +0000
> @@ -96,13 +96,13 @@
>  t                   
>  (1 row)
>  
> +step s1_c: COMMIT;
>  step s2_init: <... completed>
>  FATAL:  terminating connection due to administrator command
>  server closed the connection unexpectedly
>      This probably means the server terminated abnormally
>      before or while processing the request.
>  
> -step s1_c: COMMIT;
>  step s1_view_slot: 
>      SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'

This comes from the permutation 'permutation s1_b s1_xid s2_init
s1_terminate_s2 s1_c s1_view_slot'.  That means the process
termination requested by s1_terminate_s2 was delayed until after the
next step, s1_c, had finished.  So it is a rare false failure, but it
is annoying enough on the CI.  It seems to me that we need to wait for
the process to terminate at that point. postgres_fdw does that in its
regression test.
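
As a minimal sketch of the difference, assuming PostgreSQL 14 or later
(where pg_terminate_backend() accepts an optional timeout in
milliseconds), the two forms below are alternatives, not a sequence.
The timeout value is only an illustration.

```sql
-- Form 1: no timeout (default 0). The function only sends SIGTERM and
-- returns true once the signal is sent; the target backend may still
-- be running when the next test step starts.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'isolation/slot_creation_error/s2';

-- Form 2: positive timeout in milliseconds. The function waits until
-- the backend has actually exited or the timeout expires. It returns
-- true if the process terminated; on timeout it emits a warning and
-- returns false.
SELECT pg_terminate_backend(pid, 180000)
FROM pg_stat_activity
WHERE application_name = 'isolation/slot_creation_error/s2';
```

With the second form, the terminating step does not finish until the
target session is gone, so the order of the following steps in the
expected output should no longer depend on timing.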

Thoughts?

A similar use is found in temp-schema-cleanup. There's another possible
instability between s2_advisory and s2_check_schema, but this change
alone reduces the chance of false failures.

The attached patch fixes both points.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 82cdee92c3726a7f248849a310ec3f392ba02384 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 4 Mar 2022 11:03:12 +0900
Subject: [PATCH] Wait for process termination during isolation tests

slot_creation_error.spec and temp-schema-cleanup.spec used
pg_terminate_backend() without specifying a timeout, so a test could
proceed to the next step before the process actually terminated,
resulting in a false failure.

Supply a timeout to pg_terminate_backend() so that it waits for the
process to terminate, avoiding that failure mode.
---
 contrib/test_decoding/expected/slot_creation_error.out | 2 +-
 contrib/test_decoding/specs/slot_creation_error.spec   | 2 +-
 src/test/isolation/expected/temp-schema-cleanup.out    | 2 +-
 src/test/isolation/specs/temp-schema-cleanup.spec      | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/test_decoding/expected/slot_creation_error.out b/contrib/test_decoding/expected/slot_creation_error.out
index 043bdae0a2..3707482cb8 100644
--- a/contrib/test_decoding/expected/slot_creation_error.out
+++ b/contrib/test_decoding/expected/slot_creation_error.out
@@ -87,7 +87,7 @@ step s2_init:
     SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding');
  <waiting ...>
 step s1_terminate_s2: 
-    SELECT pg_terminate_backend(pid)
+    SELECT pg_terminate_backend(pid, 180000)
     FROM pg_stat_activity
     WHERE application_name = 'isolation/slot_creation_error/s2';
 
diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
index 6816696b9d..32161b9e7f 100644
--- a/contrib/test_decoding/specs/slot_creation_error.spec
+++ b/contrib/test_decoding/specs/slot_creation_error.spec
@@ -13,7 +13,7 @@ step s1_cancel_s2 {
 }
 
 step s1_terminate_s2 {
-    SELECT pg_terminate_backend(pid)
+    SELECT pg_terminate_backend(pid, 180000)
     FROM pg_stat_activity
     WHERE application_name = 'isolation/slot_creation_error/s2';
 }
diff --git a/src/test/isolation/expected/temp-schema-cleanup.out b/src/test/isolation/expected/temp-schema-cleanup.out
index 35b91d9e45..cb4302739a 100644
--- a/src/test/isolation/expected/temp-schema-cleanup.out
+++ b/src/test/isolation/expected/temp-schema-cleanup.out
@@ -83,7 +83,7 @@ exec
 (1 row)
 
 step s1_exit: 
-    SELECT pg_terminate_backend(pg_backend_pid());
+    SELECT pg_terminate_backend(pg_backend_pid(), 180000);
 
 FATAL:  terminating connection due to administrator command
 server closed the connection unexpectedly
diff --git a/src/test/isolation/specs/temp-schema-cleanup.spec b/src/test/isolation/specs/temp-schema-cleanup.spec
index a9417b7e90..f0d3928996 100644
--- a/src/test/isolation/specs/temp-schema-cleanup.spec
+++ b/src/test/isolation/specs/temp-schema-cleanup.spec
@@ -47,7 +47,7 @@ step s1_discard_temp {
 }
 
 step s1_exit {
-    SELECT pg_terminate_backend(pg_backend_pid());
+    SELECT pg_terminate_backend(pg_backend_pid(), 180000);
 }
 
 
-- 
2.27.0

