Re: Stabilizing the test_decoding checks, take N - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Stabilizing the test_decoding checks, take N
Date
Msg-id CAA4eK1LCDapkR8ckRNCkvURGKsM8AvAHZEtRab_98teY5H_Z+g@mail.gmail.com
Whole thread Raw
In response to Stabilizing the test_decoding checks, take N  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Stabilizing the test_decoding checks, take N  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Sat, Apr 16, 2022 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> My pet dinosaur prairiedog just failed in the contrib/test_decoding
> tests [1]:
>
> diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out
/Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000
-0400
> +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out  2022-04-15 09:06:36.000000000
-0400
> @@ -77,10 +77,12 @@
>   streaming change for transaction
>   streaming change for transaction
>   streaming change for transaction
> + closing a streamed block for transaction
> + opening a streamed block for transaction
>   streaming change for transaction
>   closing a streamed block for transaction
>   committing streamed transaction
> -(13 rows)
> +(15 rows)
>
> Looking at the postmaster log, it's obvious where this extra transaction
> came from: auto-analyze ran on pg_type concurrently with the test step
> just before this one.  That could only happen if the tests ran long enough
> for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
> (And I hasten to point out that some other animals, such as those running
> valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
>
> We've seen this sort of problem before [2], and attempted to fix it [3]
> by making these tests ignore empty transactions.  But of course
> auto-analyze's transaction wasn't empty, so that didn't help.
>

The possible reason here is that this extra (auto-analyze) transaction
causes the logical decoding work mem to reach before the last change
of the test's transaction. As can be seen from the logs, it just
closed the stream before the last change and then opened a new stream
for the last change. Now, it is true that the auto-analyze changes
won't be decoded as they don't perform DML operation on any
non-catalog table but it could generate some invalidation message
which needs to be processed even though we won't send anything related
to it to the downstream.

This needs to be verified once by doing some manual testing as it may
not be easily reproducible every time. If this happens to be true then
I think your suggestion related to increasing autovacuum_naptime would
work.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors