Re: Stabilizing the test_decoding checks, take N - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Stabilizing the test_decoding checks, take N
Date
Msg-id CAFiTN-vFrtusp4hYC+e06EAhK3OxZrr48d9VxXMmJkQi2ZFeNA@mail.gmail.com
In response to Re: Stabilizing the test_decoding checks, take N  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Stabilizing the test_decoding checks, take N  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Mon, Apr 18, 2022 at 11:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 16, 2022 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> My pet dinosaur prairiedog just failed in the contrib/test_decoding
> tests [1]:
>
> diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out
> --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out 2022-04-15 07:59:17.000000000 -0400
> +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out  2022-04-15 09:06:36.000000000 -0400
> @@ -77,10 +77,12 @@
>   streaming change for transaction
>   streaming change for transaction
>   streaming change for transaction
> + closing a streamed block for transaction
> + opening a streamed block for transaction
>   streaming change for transaction
>   closing a streamed block for transaction
>   committing streamed transaction
> -(13 rows)
> +(15 rows)
>
> Looking at the postmaster log, it's obvious where this extra transaction
> came from: auto-analyze ran on pg_type concurrently with the test step
> just before this one.  That could only happen if the tests ran long enough
> for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine.
> (And I hasten to point out that some other animals, such as those running
> valgrind or CLOBBER_CACHE_ALWAYS, are even slower.)
>
> We've seen this sort of problem before [2], and attempted to fix it [3]
> by making these tests ignore empty transactions.  But of course
> auto-analyze's transaction wasn't empty, so that didn't help.
>

The possible reason here is that this extra (auto-analyze) transaction
causes the logical decoding work mem limit to be reached before the
last change of the test's transaction. As can be seen from the logs,
the decoder closed the stream just before the last change and then
opened a new stream for that change. Now, it is true that the
auto-analyze changes won't be decoded, as they don't perform DML
operations on any non-catalog table, but auto-analyze could generate
invalidation messages that need to be processed even though we won't
send anything related to them to the downstream.
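The mechanism described above can be illustrated with a toy model. This is NOT PostgreSQL's reorderbuffer code; the limit value, the xids, and counting memory in "changes" rather than bytes are all simplifications assumed here for illustration:

```python
# Toy model (not PostgreSQL source): queue changes per transaction and,
# when the total crosses a pretend work-mem limit, stream out the largest
# transaction, roughly as logical_decoding_work_mem-based streaming does.
LIMIT = 5  # pretend logical_decoding_work_mem, measured in change count

def decode(changes):
    """changes: (xid, change) tuples in WAL order; returns stream events."""
    queued, events, total = {}, [], 0
    for xid, _ in changes:
        queued[xid] = queued.get(xid, 0) + 1
        total += 1
        if total >= LIMIT:
            big = max(queued, key=queued.get)   # evict the largest xact
            events.append(f"stream xid {big}")
            total -= queued.pop(big)
    if queued.get(1):                           # commit of the test xact (xid 1)
        events.append("stream xid 1")
    return events

# The test transaction alone: its 8 changes fit in two streamed blocks.
print(len(decode([(1, "dml")] * 8)))            # 2 stream events

# With two auto-analyze invalidation changes (xid 2) interleaved, the limit
# is hit earlier and the test xact is split across an extra streamed block,
# which shows up as an extra "closing/opening a streamed block" pair.
print(len(decode([(1, "dml")] * 3 + [(2, "inval")] * 2 + [(1, "dml")] * 5)))  # 3
```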

This analysis seems right to me.
 
This needs to be verified by some manual testing, as it may not be
easily reproducible every time. If this turns out to be true, then I
think your suggestion of increasing autovacuum_naptime would work.
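For illustration, that fix could amount to a one-line addition to the suite's temp-install configuration; the file name and value below are my assumptions, not a proposed patch:

```
# Assumed location: the temp-config file test_decoding passes to pg_regress
# (e.g. logical.conf). Effectively suppress autovacuum's periodic wakeups
# so a slow animal can't see an auto-analyze mid-test:
autovacuum_naptime = 1d
```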

I will try to reproduce this, perhaps by reducing autovacuum_naptime or by running, in parallel, a script that continuously performs DDL-only transactions.
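A minimal driver for the second approach might look like the following; the script and table names are hypothetical, and piping its output into psql (e.g. `sh ddl_churn.sh | psql`) is assumed to run against the test instance:

```shell
# Hypothetical reproduction helper: emit a stream of DDL-only transactions.
# Each one touches only catalogs, so it generates invalidation messages
# without producing any decodable DML.
for i in $(seq 1 100); do
  echo "BEGIN;"
  echo "CREATE TABLE ddl_churn_$i (a int);"
  echo "DROP TABLE ddl_churn_$i;"
  echo "COMMIT;"
done
```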

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
