Re: BF animal dikkop reported a failure in 035_standby_logical_decoding - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: BF animal dikkop reported a failure in 035_standby_logical_decoding
Msg-id 7bd3344a-e414-a2a3-5c87-a59122545292@gmail.com
In response to Re: BF animal dikkop reported a failure in 035_standby_logical_decoding  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BF animal dikkop reported a failure in 035_standby_logical_decoding
List pgsql-hackers
Hi,

On 5/29/23 1:03 PM, Tom Lane wrote:
> "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> writes:
>> On 5/26/23 9:27 AM, Yu Shi (Fujitsu) wrote:
>>> Is it possible that the vacuum command didn't remove tuples and then the
>>> conflict was not triggered?
> 
>> The flush_wal table added by Andres should guarantee that the WAL is flushed, so
>> the only reason I can think about is indeed that the vacuum did not remove tuples (
>> but I don't get why/how that could be the case).
> 
> This test is broken on its face:
> 
>    CREATE TABLE conflict_test(x integer, y text);
>    DROP TABLE conflict_test;
>    VACUUM full pg_class;
> 
> There will be something VACUUM can remove only if there were no other
> transactions holding back global xmin --- and there's not even a delay
> here to give any such transaction a chance to finish.
> 
> Background autovacuum is the most likely suspect for breaking that,

Oh right, I did not think autovacuum could start during this test, but yeah, there
is no reason it could not.

> but I wouldn't be surprised if something in the logical replication
> mechanism itself could be running a transaction at the wrong instant.
> 
> Some of the other recovery tests set
> autovacuum = off
> to try to control such problems, but I'm not sure how much of
> a solution that really is.

One option I can think of is to:

1) set autovacuum = off (as it looks like the usual suspect).
2) trigger the vacuum in verbose mode (as suggested by Shi-san) and,
depending on its output, either run the "invalidation" test or re-launch the vacuum, re-check the output,
and so on (n times max). If n is reached, then skip this test.
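
To make the idea in 2) concrete, here is a rough sketch of the retry loop. It is
written in Python only for illustration (the real test is a Perl TAP script), and
everything in it is an assumption on my side: run_vacuum stands for whatever helper
would run the vacuum in verbose mode and return its output, and the regex assumes
the output contains a "found N removable" message, which would need to be checked
against the actual server output.

```python
import re
from typing import Callable, Optional

# Assumed shape of the verbose message; verify against the real server output.
REMOVABLE_RE = re.compile(r"found (\d+) removable")


def removed_row_versions(verbose_output: str) -> Optional[int]:
    """Return the removable row-version count, or None if no such line is found."""
    match = REMOVABLE_RE.search(verbose_output)
    return int(match.group(1)) if match else None


def vacuum_until_removed(run_vacuum: Callable[[], str], max_attempts: int = 5) -> bool:
    """Re-run the vacuum until it reports removed row versions.

    run_vacuum is a hypothetical callable that runs the vacuum in verbose
    mode and returns its textual output. Returns True as soon as one run
    reports at least one removed row version, and False once max_attempts
    runs have all removed nothing.
    """
    for _ in range(max_attempts):
        removed = removed_row_versions(run_vacuum())
        if removed is not None and removed > 0:
            return True
    return False
```

The caller would run the "invalidation" test when this returns True and skip it
otherwise.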

As this test is currently failing randomly (and it looks like there are more successes than failures, even without
autovacuum = off), the test should still validate that the invalidation works as expected for the large
majority of runs (and skipping the test should then be pretty rare).

Would that make sense?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


