Re: Why is subscription/t/031_column_list.pl failing so much? - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Why is subscription/t/031_column_list.pl failing so much?
Date
Msg-id a1010953-7d65-225a-0afa-f11de5d5e011@gmail.com
Whole thread Raw
In response to Re: Why is subscription/t/031_column_list.pl failing so much?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Why is subscription/t/031_column_list.pl failing so much?
List pgsql-hackers
Hello Tom and Noah,

03.02.2024 04:24, Tom Lane wrote:
> I'm still wondering how come the failure seems to have suddenly gotten
> way more common.  The only changes that are in vaguely-related places
> and fit the time frame are Amit's 732924043 and 776621a5e, but I sure
> don't see a connection.

I think the failure rate increased due to tamandua, calliphoridae,
flaviventris, and kestrel were switched from make to meson recently.
The last `make` builds for them:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-01-31%2016%3A51%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-01-31%2016%3A51%3A38
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-01-31%2016%3A52%3A37
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-01-31%2016%3A51%3A53

and since that switch the 031_column_list duration increased significantly,
e.g., on the last tamandua `make` run it executed for 7 seconds, but
successful `meson` runs give much longer duration:
280/283 postgresql:subscription / subscription/031_column_list                    OK 38.27s   36 subtests passed
280/283 postgresql:subscription / subscription/031_column_list                    OK 126.13s   36 subtests passed
280/283 postgresql:subscription / subscription/031_column_list                    OK 31.93s   36 subtests passed
279/283 postgresql:subscription / subscription/031_column_list                    OK 99.76s   36 subtests passed

So, looking at the tamandua's failure log, I see:
2024-02-02 19:41:19.750 UTC [1579219][postmaster][:0] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by 
gcc-12.3.0, 64-bit
...
2024-02-02 19:42:19.973 UTC [1629333][client backend][4/213:0] LOG: statement: ALTER SUBSCRIPTION sub1 SET PUBLICATION
pub7
2024-02-02 19:42:20.131 UTC [1625765][logical replication apply worker][3/122:0] LOG:  logical replication worker for 
subscription "sub1" will restart because of a parameter change
2024-02-02 19:42:20.137 UTC [1629333][client backend][:0] LOG: disconnection: session time: 0:00:00.212 user=bf 
database=postgres host=[local]
2024-02-02 19:42:20.191 UTC [1629535][logical replication apply worker][3/124:0] LOG:  logical replication apply worker

for subscription "sub1" has started
...
2024-02-02 19:42:20.445 UTC [1629535][logical replication apply worker][3/0:0] ERROR:  could not receive data from WAL

stream: ERROR:  publication "pub7" does not exist
     CONTEXT:  slot "sub1", output plugin "pgoutput", in the change callback, associated LSN 0/15C7698

(The interval between subscriber start and the error is ~ 4 * 15 seconds.)

Thus it still may be explained by bgwriter activity, though perhaps
autovacuum/checkpointer can add something as well.

Best regards,
Alexander



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Documentation: warn about two_phase when altering a subscription
Next
From: Hannu Krosing
Date:
Subject: Re: pgbench - adding pl/pgsql versions of tests