Getting rid of intermittent PPC64 buildfarm failures - Mailing list pgsql-hackers

From Tom Lane
Subject Getting rid of intermittent PPC64 buildfarm failures
Date
Msg-id 3479046.1602607848@sss.pgh.pa.us
List pgsql-hackers
I grow quite weary of the number of buildfarm failures we see as a
consequence of the Linux PPC64 bug discussed in [1].  Although we can
anticipate that the fix will roll out into new kernel builds before much
longer, that will have very little effect on the buildfarm situation,
given that a lot of Mark's PPC64 armada is running hoary "stable" kernels.
It might be many years before there are no unpatched systems to worry about.

I think it's time to give up and disable the infinite_recurse test on such
platforms.  It's teaching us nothing and we waste valuable developer time
eyeballing failures to make sure they're just the same old same old.
Testing the case on not-Linux-PPC64 is enough to verify that our own code
works.

We can use the same technique used in collate.linux.utf8.sql,
namely check the output of version() and abandon the test if it matches.
To minimize the maintenance pain from needing two expected-files, it seems
prudent to split infinite_recurse into its own test script, which leads
to the attached proposed patch.
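
As a rough illustration of which platforms the version() check would skip, here is a small sketch outside the patch itself. It reuses the pattern from the proposed infinite_recurse.sql and approximates the SQL `~` operator (an unanchored, case-sensitive regex match) with Python's re.search. The version strings are hypothetical examples of the general shape of version() output, not taken from real buildfarm members, and should_skip is a made-up helper name.

```python
import re

# Pattern copied from the proposed infinite_recurse.sql.
SKIP_PATTERN = r"powerpc64.*-linux-gnu"


def should_skip(version_string: str) -> bool:
    """Approximate `version() ~ SKIP_PATTERN` (unanchored, case-sensitive)."""
    return re.search(SKIP_PATTERN, version_string) is not None


# Hypothetical version() outputs, for illustration only.
SAMPLES = [
    "PostgreSQL 14devel on powerpc64le-unknown-linux-gnu, compiled by gcc, 64-bit",
    "PostgreSQL 14devel on powerpc64-unknown-linux-gnu, compiled by gcc, 64-bit",
    "PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc, 64-bit",
    "PostgreSQL 14devel on powerpc-ibm-aix7.2.0.0, compiled by gcc, 64-bit",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        action = "skip" if should_skip(sample) else "run"
        print(f"{action:4}  {sample}")
```

Under these assumptions, both the big-endian and little-endian Linux PPC64 strings match, while x86_64 Linux and non-Linux PowerPC do not, so the test keeps running everywhere else.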

Any objections?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20190723162703.GM22387%40telsasoft.com

diff --git a/src/test/regress/expected/errors.out b/src/test/regress/expected/errors.out
index a525aa2f93..1e7b5a7046 100644
--- a/src/test/regress/expected/errors.out
+++ b/src/test/regress/expected/errors.out
@@ -440,13 +440,3 @@ NULL);
 ERROR:  syntax error at or near "NUL"
 LINE 16: ...L, id2 TEXT NOT NULL PRIMARY KEY, id3 INTEGER NOT NUL, id4 I...
                                                               ^
--- Check that stack depth detection mechanism works and
--- max_stack_depth is not set too high.  The full error report is not
--- very stable, so show only SQLSTATE and primary error message.
-create function infinite_recurse() returns int as
-'select infinite_recurse()' language sql;
-\set VERBOSITY sqlstate
-select infinite_recurse();
-ERROR:  54001
-\echo :LAST_ERROR_MESSAGE
-stack depth limit exceeded
diff --git a/src/test/regress/expected/infinite_recurse.out b/src/test/regress/expected/infinite_recurse.out
new file mode 100644
index 0000000000..90f9631b24
--- /dev/null
+++ b/src/test/regress/expected/infinite_recurse.out
@@ -0,0 +1,24 @@
+-- Check that stack depth detection mechanism works and
+-- max_stack_depth is not set too high.
+create function infinite_recurse() returns int as
+'select infinite_recurse()' language sql;
+-- Unfortunately, up till mid 2020 the Linux kernel had a bug in PPC64
+-- signal handling that would cause this test to crash if it happened
+-- to receive an sinval catchup interrupt while the stack is deep:
+-- https://bugzilla.kernel.org/show_bug.cgi?id=205183
+-- It is likely to be many years before that bug disappears from all
+-- production kernels, so disable this test on such platforms.
+-- (We still create the function, so as not to have a cross-platform
+-- difference in the end state of the regression database.)
+SELECT version() ~ 'powerpc64.*-linux-gnu'
+       AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+-- The full error report is not very stable, so we show only SQLSTATE
+-- and primary error message.
+\set VERBOSITY sqlstate
+select infinite_recurse();
+ERROR:  54001
+\echo :LAST_ERROR_MESSAGE
+stack depth limit exceeded
diff --git a/src/test/regress/expected/infinite_recurse_1.out b/src/test/regress/expected/infinite_recurse_1.out
new file mode 100644
index 0000000000..ef2c8d66a7
--- /dev/null
+++ b/src/test/regress/expected/infinite_recurse_1.out
@@ -0,0 +1,16 @@
+-- Check that stack depth detection mechanism works and
+-- max_stack_depth is not set too high.
+create function infinite_recurse() returns int as
+'select infinite_recurse()' language sql;
+-- Unfortunately, up till mid 2020 the Linux kernel had a bug in PPC64
+-- signal handling that would cause this test to crash if it happened
+-- to receive an sinval catchup interrupt while the stack is deep:
+-- https://bugzilla.kernel.org/show_bug.cgi?id=205183
+-- It is likely to be many years before that bug disappears from all
+-- production kernels, so disable this test on such platforms.
+-- (We still create the function, so as not to have a cross-platform
+-- difference in the end state of the regression database.)
+SELECT version() ~ 'powerpc64.*-linux-gnu'
+       AS skip_test \gset
+\if :skip_test
+\quit
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..ae89ed7f0b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -55,7 +55,7 @@ test: create_index create_index_spgist create_view index_including index_includi
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_aggregate create_function_3 create_cast constraints triggers select inherit typed_table vacuum drop_if_exists updatable_views roleattributes create_am hash_func errors
+test: create_aggregate create_function_3 create_cast constraints triggers select inherit typed_table vacuum drop_if_exists updatable_views roleattributes create_am hash_func errors infinite_recurse

 # ----------
 # sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..525bdc804f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -83,6 +83,7 @@ test: roleattributes
 test: create_am
 test: hash_func
 test: errors
+test: infinite_recurse
 test: sanity_check
 test: select_into
 test: select_distinct
diff --git a/src/test/regress/sql/errors.sql b/src/test/regress/sql/errors.sql
index 86b672538a..66a56b28f6 100644
--- a/src/test/regress/sql/errors.sql
+++ b/src/test/regress/sql/errors.sql
@@ -364,12 +364,3 @@ INT4
 UNIQUE
 NOT
 NULL);
-
--- Check that stack depth detection mechanism works and
--- max_stack_depth is not set too high.  The full error report is not
--- very stable, so show only SQLSTATE and primary error message.
-create function infinite_recurse() returns int as
-'select infinite_recurse()' language sql;
-\set VERBOSITY sqlstate
-select infinite_recurse();
-\echo :LAST_ERROR_MESSAGE
diff --git a/src/test/regress/sql/infinite_recurse.sql b/src/test/regress/sql/infinite_recurse.sql
new file mode 100644
index 0000000000..ff4bf966f9
--- /dev/null
+++ b/src/test/regress/sql/infinite_recurse.sql
@@ -0,0 +1,29 @@
+-- Check that stack depth detection mechanism works and
+-- max_stack_depth is not set too high.
+
+create function infinite_recurse() returns int as
+'select infinite_recurse()' language sql;
+
+-- Unfortunately, up till mid 2020 the Linux kernel had a bug in PPC64
+-- signal handling that would cause this test to crash if it happened
+-- to receive an sinval catchup interrupt while the stack is deep:
+-- https://bugzilla.kernel.org/show_bug.cgi?id=205183
+-- It is likely to be many years before that bug disappears from all
+-- production kernels, so disable this test on such platforms.
+-- (We still create the function, so as not to have a cross-platform
+-- difference in the end state of the regression database.)
+
+SELECT version() ~ 'powerpc64.*-linux-gnu'
+       AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+
+-- The full error report is not very stable, so we show only SQLSTATE
+-- and primary error message.
+
+\set VERBOSITY sqlstate
+
+select infinite_recurse();
+
+\echo :LAST_ERROR_MESSAGE
