Re: Flaky vacuum truncate test in reloptions.sql - Mailing list pgsql-hackers

From Arseny Sher
Subject Re: Flaky vacuum truncate test in reloptions.sql
Date
Msg-id 87o8etrfae.fsf@ars-thinkpad
Whole thread Raw
In response to Re: Flaky vacuum truncate test in reloptions.sql  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Flaky vacuum truncate test in reloptions.sql
List pgsql-hackers
On Fri, Apr 2, 2021 at 9:46 AM Michael Paquier <michael@paquier.xyz> wrote:

> Okay, applied and back-patched down to 12 then.

Thank you both. Unfortunately and surprisingly, the test still fails
(perhaps even rarer, once in several hundred runs) under
multimaster. After scratching the head for some more time, it seems to
me the following happens: not only vacuum encounters locked page, but
also there exist a concurrent backend (as the parallel schedule is run)
who holds back oldestxmin keeping it less than xid of transaction which
did the insertion

INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL);

FreezeLimit can't be higher than oldestxmin, so lazy_check_needs_freeze
decides there is nothing to freeze on the page. multimaster commits are
quite heavy, which apparently shifts the timings making the issue more
likely.

Currently we are testing the rather funny attached patch which forces
all such old-snapshot-holders to finish. It is crutchy, but I doubt we
want to change vacuum logic (e.g. checking tuple liveness in
lazy_check_needs_freeze) due to this issue. (it is especially crutchy in
xid::bigint casts, but wraparound is hardly expected in regression tests
run).


-- cheers, arseny

diff --git a/src/test/regress/expected/reloptions.out b/src/test/regress/expected/reloptions.out
index bb7bd6e1e7e..78bbf4a5255 100644
--- a/src/test/regress/expected/reloptions.out
+++ b/src/test/regress/expected/reloptions.out
@@ -128,6 +128,20 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
 INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL);
 ERROR:  null value in column "i" of relation "reloptions_test" violates not-null constraint
 DETAIL:  Failing row contains (null, null).
+do $$
+declare
+  my_xid bigint;
+  oldest_xmin bigint;
+begin
+  my_xid := txid_current();
+  while true loop
+    oldest_xmin := min(backend_xmin::text::bigint) from pg_stat_activity where pid != pg_backend_pid();
+    exit when oldest_xmin is null or oldest_xmin >= my_xid;
+    perform pg_sleep(0.1);
+    perform pg_stat_clear_snapshot();
+  end loop;
+end
+$$;
 -- Do an aggressive vacuum to prevent page-skipping.
 VACUUM FREEZE reloptions_test;
 SELECT pg_relation_size('reloptions_test') = 0;
diff --git a/src/test/regress/sql/reloptions.sql b/src/test/regress/sql/reloptions.sql
index 95f7ab4189e..96fb59d16ad 100644
--- a/src/test/regress/sql/reloptions.sql
+++ b/src/test/regress/sql/reloptions.sql
@@ -72,6 +72,20 @@ SELECT reloptions FROM pg_class WHERE oid =
 ALTER TABLE reloptions_test RESET (vacuum_truncate);
 SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
 INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL);
+do $$
+declare
+  my_xid bigint;
+  oldest_xmin bigint;
+begin
+  my_xid := txid_current();
+  while true loop
+    oldest_xmin := min(backend_xmin::text::bigint) from pg_stat_activity where pid != pg_backend_pid();
+    exit when oldest_xmin is null or oldest_xmin >= my_xid;
+    perform pg_sleep(0.1);
+    perform pg_stat_clear_snapshot();
+  end loop;
+end
+$$;
 -- Do an aggressive vacuum to prevent page-skipping.
 VACUUM FREEZE reloptions_test;
 SELECT pg_relation_size('reloptions_test') = 0;

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: policies with security definer option for allowing inline optimization
Next
From: Alvaro Herrera
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)