Stabilizing the test_decoding checks, take N - Mailing list pgsql-hackers

From Tom Lane
Subject Stabilizing the test_decoding checks, take N
Msg-id 1958043.1650129119@sss.pgh.pa.us
List pgsql-hackers

My pet dinosaur prairiedog just failed in the contrib/test_decoding
tests [1]:

diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
--- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out    2022-04-15 07:59:17.000000000 -0400
+++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out    2022-04-15 09:06:36.000000000 -0400
@@ -77,10 +77,12 @@
  streaming change for transaction
  streaming change for transaction
  streaming change for transaction
+ closing a streamed block for transaction
+ opening a streamed block for transaction
  streaming change for transaction
  closing a streamed block for transaction
  committing streamed transaction
-(13 rows)
+(15 rows)

Looking at the postmaster log, it's obvious where this extra transaction
came from: auto-analyze ran on pg_type concurrently with the test step
just before this one.  That could only happen if the tests ran long enough
for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
(And I hasten to point out that some other animals, such as those running
valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)

We've seen this sort of problem before [2], and attempted to fix it [3]
by making these tests ignore empty transactions.  But of course
auto-analyze's transaction wasn't empty, so that didn't help.

I think the most expedient way to prevent this type of failure is to run
the test_decoding tests with autovacuum_naptime cranked up so far as to
make it a non-issue, like maybe a day.  Since test_decoding already adds
some custom settings to postgresql.conf, this'll take just a one-line
addition to test_decoding/logical.conf.
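
For illustration, that one-line addition might look like the sketch below.
The value 1d simply follows the "maybe a day" suggestion above. Both the
exact value and the comment wording are assumptions, not a committed change,
and the file's existing settings are omitted rather than guessed at.

```
# Hypothetical addition to contrib/test_decoding/logical.conf.
# Make autovacuum wake up so rarely that auto-analyze cannot run
# concurrently with the tests, even on very slow machines.
autovacuum_naptime = 1d
```

This leaves autovacuum enabled but lengthens the interval between launcher
wakeups, so a slow test run should finish before autovacuum has a chance to
start a transaction of its own.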

I wonder whether we ought to then revert these tests' use of
skip-empty-xacts, or at least start having a mix of cases.
It seems to me that we'd rather know about it if there are unexpected
empty transactions.  Is there anything we're using that for other than
to hide the effects of autovacuum?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-04-15%2011%3A59%3A16

[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-12%2010%3A24%3A22

[3] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=b779d7d8fdae088d70da5ed9fcd8205035676df3


